Word Frequency Counting in Java

I. Class Diagram and Flowchart

1. Class diagram

The main class is Article. It contains two helper classes, Word and WordComparator.

2. Flowchart

Create the main class
-> input the article content
-> save the article content
-> save the words into an array, counting the words at the same time
-> count the frequency of each distinct word
-> sort the word array and the frequency array in descending order of frequency
-> output the result in sorted order

II. Program Code

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Set;
    import java.util.TreeSet;

    public class Article { // main class
        String content;     // the article text
        String[] rawWords;  // every word of the article, in order of appearance
        String[] words;     // the distinct words
        int[] wordFreqs;    // the frequency of each entry in words

        // Set the article text.
        public Article() {
            content = "Trusted Computer System (Trusted Computer System) of the U.S. Department "
                + "of Defense is a concept first put forward in order to ensure the confidentiality of "
                + "computer systems, the U.S. Department of Defense in the 1980s, a set of access control "
                + "mechanisms to enhance the credibility of the system And the development of the Trusted "
                + "Computer System Evaluation criteria (TCSEC). TCSEC (from the Orange Book to the Rainbow series) "
                + "for the information systems of several key components: computer operating systems, databases, "
                + "computer network security are put forward a credible evaluation of the safety guidelines. "
                + "Norms, from the user log on to empower the management, Access control, audit trails, "
                + "hidden channels, the credibility of the computer-systems Road, the establishment of electronic "
                + "information systems, safety inspection, protection of the life cycle, text writing, "
                + "User's Guide have made regulatory requirements. And in accordance with the security policies "
                + "adopted by the system by With the safety features of the system is divided into A, B (B1, B2, B3), "
                + "C (C1, C2), D four of the seven-level security. These guidelines for the research-oriented, "
                + "standardized production, guiding the user selection of the inspection bodies Based on the evaluation, "
                + "all played a role in promoting the good. "
                + "But the main consideration of security issues in general is "
                + "also limited to the confidentiality of information, based on the security model: "
                + "Bell & LaPadula security model developed by the most important security (secrecy): "
                + "Strictly on reading, writing under the (no read up no write down) "
                + "is the main message for the request for confidentiality. 90 of the four countries "
                + "in Western Europe (Britain, France, Germany and the Netherlands) also made information technology "
                + "security evaluation criteria (ITSEC). ITSEC (White Paper on Europe) in addition to absorbing "
                + "the successful experiences of TCSEC, for the first time the information security of the confidentiality, "
                + "integrity and availability of the concept, the credibility of the concept of a computer "
                + "to the credibility of information technology onto a high degree of understanding. "
                + "Their work European information security has become the foundation of the program, and international "
                + "information security research, the implementation of a profound impact. In 1996 the international "
                + "community of the six countries (the United States, Canada, Britain, France, Germany and the Netherlands) "
                + "jointly put forward a common information technology security evaluation criteria "
                + "(CC). "
                + "CC is based on the European ITSEC, the United States, including the new TCSEC The federal evaluation "
                + "criteria, Canada's CTCPEC, as well as the International Organization for Standardization ISO: "
                + "SC27 WG3 security evaluation criteria.";
        }

        // Split the article on separators and store the result in rawWords.
        public void splitWord() {
            // While splitting, every listed symbol is replaced with a space.
            final char SPACE = ' ';
            // The first two characters below were lost in extraction;
            // apostrophe and comma are assumed.
            content = content.replace('\'', SPACE).replace(',', SPACE).replace('.', SPACE);
            content = content.replace('(', SPACE).replace(')', SPACE).replace('-', SPACE);
            // Anything separated by whitespace counts as a word.
            rawWords = content.split("\\s+");
        }

        // Count how often each distinct word occurs.
        public void countWordFreq() {
            // Put every string into a set so that each one appears only once.
            Set<String> set = new TreeSet<String>();
            for (String word : rawWords) {
                set.add(word);
            }
            Iterator<String> ite = set.iterator();
            List<String> wordsList = new ArrayList<String>();
            List<Integer> freqList = new ArrayList<Integer>();
            while (ite.hasNext()) {
                String word = ite.next();
                int count = 0; // number of occurrences of this string
                for (String str : rawWords) {
                    if (str.equals(word)) {
                        count++;
                    }
                }
                wordsList.add(word);
                freqList.add(count);
            }
            // Copy the lists into the arrays.
            words = wordsList.toArray(new String[0]);
            wordFreqs = new int[freqList.size()];
            for (int i = 0; i < freqList.size(); i++) {
                wordFreqs[i] = freqList.get(i);
            }
        }

        // Sort the word array and the frequency array in descending order of frequency.
        public void sort() {
            class Word { // local helper class: one word with its frequency
                private String word;
                private int freq;

                public Word(String word, int freq) {
                    this.word = word;
                    this.freq = freq;
                }
            }

            class WordComparator implements Comparator<Word> { // local helper class
                public int compare(Word word1, Word word2) {
                    if (word1.freq < word2.freq) {
                        return 1;
                    } else if (word1.freq > word2.freq) {
                        return -1;
                    }
                    // Equal frequency: fall back to alphabetical order. String.compareTo
                    // replaces the hand-written character loop, which never returned 0
                    // and so was not a consistent ordering.
                    return word1.word.compareTo(word2.word);
                }
            }

            List<Word> wordList = new ArrayList<Word>();
            for (int i = 0; i < words.length; i++) {
                wordList.add(new Word(words[i], wordFreqs[i]));
            }
            Collections.sort(wordList, new WordComparator());
            for (int i = 0; i < wordList.size(); i++) {
                Word wor = wordList.get(i);
                words[i] = wor.word;
                wordFreqs[i] = wor.freq;
            }
        }

        // Print the sorted result.
        public void printResult() {
            System.out
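
The counting step above rescans the whole word array once for every distinct word. As a rough alternative sketch, not the original author's code, the same split, count and sort flow can be written with a HashMap so that each word is visited only once. The class name WordFreqSketch, the method names countWords and sortByFreq, and the splitting rule (any run of characters that are not ASCII letters or digits is a separator) are assumptions introduced here for illustration; the original replaces a fixed list of symbols with spaces instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordFreqSketch {

    // Count how often each word occurs in the text (case-sensitive).
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> freq = new HashMap<>();
        // Assumption: every run of non-alphanumeric characters separates words.
        for (String token : text.split("[^A-Za-z0-9]+")) {
            if (!token.isEmpty()) { // split can yield an empty leading token
                freq.merge(token, 1, Integer::sum);
            }
        }
        return freq;
    }

    // Return the entries ordered by descending count; ties are alphabetical.
    static List<Map.Entry<String, Integer>> sortByFreq(Map<String, Integer> freq) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        String sample = "the cat and the hat and the bat";
        for (Map.Entry<String, Integer> e : sortByFreq(countWords(sample))) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

For the sample sentence in main, the word "the" comes first with a count of 3, then "and" with 2, then "bat", "cat" and "hat" with 1 each in alphabetical order. The explicit tie-break keeps the ordering deterministic, which the sort needs in order to behave predictably.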
