Words Analysis of Online Chinese News Headlines about Trending Events: A Complex Network Perspective

Huajiao Li, Wei Fang, Haizhong An, Xuan Huang
PLoS ONE, 2015, 10(3)

Abstract
Because the volume of information available online is growing at breakneck speed, keeping up with the meaning and information communicated by the media and netizens is a new challenge, both for scholars and for companies that must handle public relations crises. Most current theories and tools focus on a single website or a single piece of online news and do not attempt to develop a rapid understanding of all the websites and all the news covering one topic. This paper integrates statistics, word segmentation, complex networks and visualization to analyze headline keywords and word relationships in online Chinese news, using two samples: the 2011 Bohai Bay oil spill and the 2010 Gulf of Mexico oil spill. We gathered all the news headlines concerning the two trending events from the search results of Baidu, the most popular Chinese search engine. We used Simple Chinese Word Segmentation to segment all the headlines into words, then took words as nodes and adjacency relations as edges to construct word networks, both for the whole sample and at the monthly level.
Finally, we develop an integrated mechanism for analyzing the features of words networks built from news headlines; it can account for all the keywords in the news about a particular event and can therefore track the evolution of the news deeply and rapidly.

Introduction
With the development and popularization of information and network technology, the Internet has become the main medium through which people obtain information and news. Search engines help solve a serious information overload problem [1] and are recognized as one of the most useful and popular services on the web [2, 3]. Generally, the web (and a search engine) is the first source a person turns to for information or news [4]. People have grown accustomed to entering a few keywords into a search engine and then clicking on one or more headlines […] more people realize that online news plays an important role in the spread of public opinion; thus, it is of great importance to know what information different news sources present and how they present it. A headline is a significant component of the news: it not only presents or relates the main points of the news content but must also attract and hold the reader's attention [9]. Some scholars have provided evidence of connections among public relations, public awareness and news [10].

Method of headline word segmentation
We used the open-source word segmentation software Simple Chinese Word Segmentation, which is based on the scripting language PHP. Simple Chinese Word Segmentation employs a dictionary containing more than 260 thousand Chinese words. Its part-of-speech tagging follows the Peking University annotation scheme, which contains 47 parts of speech.
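Simple Chinese Word Segmentation itself is a PHP tool and is not called here. As a rough illustration of dictionary-based segmentation with part-of-speech tags, the minimal Python sketch below uses forward maximum matching, a classic dictionary-based technique swapped in for illustration; it is not necessarily what the real software does internally. The dictionary `TOY_DICT`, its PKU-style tags (ns = place name, n = noun, v = verb), the placeholder tag `unk` for out-of-dictionary characters, the function `segment` and the sample headlines are all hypothetical. Each output record carries the four output fields listed for this method: word serial number, word, part of speech and headline serial number.

```python
from typing import NamedTuple

# Hypothetical toy dictionary: word -> PKU-style POS tag
# (ns = place name, n = noun, v = verb). The real software's
# dictionary has more than 260 thousand entries.
TOY_DICT = {
    "墨西哥湾": "ns",
    "渤海湾": "ns",
    "漏油": "v",
    "事件": "n",
    "原油": "n",
    "污染": "v",
}
MAX_WORD_LEN = max(len(word) for word in TOY_DICT)
# Punctuation is eliminated rather than emitted as words.
PUNCTUATION = set(",。!?:;、“”()《》,.!?:;()\"' ")


class Token(NamedTuple):
    word_id: int      # serial number of the word in the output
    word: str
    pos: str          # POS tag; "unk" is a hypothetical placeholder tag
    headline_id: int  # serial number of the source headline


def segment(headlines):
    """Forward maximum matching over (headline_id, headline_text) pairs."""
    tokens = []
    for headline_id, text in headlines:
        i = 0
        while i < len(text):
            if text[i] in PUNCTUATION:
                i += 1
                continue
            # Take the longest dictionary word starting at i; fall back to
            # a single character when nothing in the dictionary matches.
            for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
                candidate = text[i:i + length]
                if candidate in TOY_DICT or length == 1:
                    pos = TOY_DICT.get(candidate, "unk")
                    tokens.append(Token(len(tokens) + 1, candidate, pos, headline_id))
                    i += length
                    break
    return tokens


for token in segment([(1, "墨西哥湾漏油事件"), (2, "渤海湾漏油,原油污染")]):
    print(token.word_id, token.word, token.pos, token.headline_id)
```

On the two made-up headlines, the sketch emits seven word records (墨西哥湾, 漏油, 事件, 渤海湾, 漏油, 原油, 污染) but only six distinct words; the distinct words are what later become the nodes of the word network.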
The input consists of the headlines and their serial numbers; the output consists of the serial number of each word, the word itself, its part of speech, and the serial number of the headline it belongs to.

Method of constructing words network
As described above, the main task in constructing the word network is to determine the nodes, the edges and the edge weights. Networks can be constructed in different ways, for example from equivalence relationships (complete graphs) [30] or affiliation relationships (bipartite graphs) [33, 42]. In this paper, to show the contextual relationships between words within a headline, we collected the segmented words from the news headlines according to the features of the study subject (theme). We then took each word as a node and connected nodes with directed edges following the word order in the headline: the former word is the start node and the word immediately following it is the end node. This process was repeated sequentially for all the words in each headline. Fig. 4 shows the resulting linear network for one headline. Next, the linear networks of the different headlines were superimposed, and the weight of an edge is the number of times the edge between its two nodes appears across the different linear networks. Let graph G = (V, E, W) represent the directed weighted network, in which V and E are the sets of nodes and edges, and W represents the […] after each occurred, and then faded away, being mentioned in the media only occasionally thereafter. Meanwhile, there is one notable difference between the news about the two trending events: the media first reported the 2010 Gulf of Mexico oil spill immediately after it occurred, but first reported the 2011 Bohai Bay oil spill one month after it had occurred.

Results and Analysis
The topological features of the whole-sample words network
The visualization of the whole-sample words network.
After applying the Simple Chinese Word Segmentation software and eliminating punctuation, we obtained 5,661 words regarding the 2010 Gulf of Mexico oil spill and 6,821 words regarding the 2011 Bohai Bay oil spill. After removing duplicate words, there were 1,288 distinct words in all the online Chinese news headlines regarding the 2010 Gulf of Mexico oil spill and 1,572 distinct words in those regarding the 2011 Bohai Bay oil spill; that is, the whole-sample words network for the Gulf of Mexico has 1,288 nodes and the one for Bohai Bay has 1,572 nodes. Fig. 5 presents the visualizations of the two whole-sample words networks for the Gulf of Mexico and Bohai Bay (the color of each node is determined by the ID of the community to which it belongs).

Discussion and Conclusion
Complex network methods have been widely used in different empirical areas [44-48]. In this paper, we studied an infrequently considered but important method for developing a rapid and deep understanding of all the websites and all the news regarding one topic. It integrates statistics, word segmentation, complex network theory and visualization to analyze the keywords of all the online news headlines, and their evolution, for two trending events: the 2010 Gulf of Mexico oil spill and the 2011 Bohai Bay oil spill.
We presented an integrated method to analyze both the whole-sample words network and the monthly words networks built from the online news headlines about the two trending events. We found that, as with most empirical complex networks, the words networks of the online news headlines about the two events have scale-free characteristics and small-world properties, and the degree assortativity coefficients of the two whole-sample words networks are very low.
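The construction described in the Methods (one linear chain per headline, superimposed so that each edge weight counts how often an adjacent word pair occurs) and the degree assortativity reported above can be sketched in a few lines of Python. This excerpt does not say whether the assortativity was computed on the directed or the weighted network, so the sketch assumes Newman's degree assortativity coefficient on the undirected, unweighted projection with self-loops dropped. The function names `build_word_network` and `degree_assortativity` and the pre-segmented toy headlines are hypothetical.

```python
from collections import Counter


def build_word_network(tokenized_headlines):
    """Superimpose one linear chain per headline into a directed weighted graph.

    Returns a Counter mapping (former_word, next_word) -> weight, where the
    weight is the number of times that adjacent pair occurs across headlines.
    """
    edges = Counter()
    for words in tokenized_headlines:
        # The former word is the start node, the word right after it the end node.
        for former, latter in zip(words, words[1:]):
            edges[(former, latter)] += 1
    return edges


def degree_assortativity(edges):
    """Newman's degree assortativity r on the undirected, unweighted projection.

    Assumption: edge direction and weight are ignored and self-loops dropped.
    Returns nan when r is undefined (no edges, or all edge ends share one degree).
    """
    undirected = {frozenset(pair) for pair in edges if pair[0] != pair[1]}
    m = len(undirected)
    if m == 0:
        return float("nan")
    degree = Counter(node for edge in undirected for node in edge)
    # Degrees (j, k) at the two ends of every undirected edge.
    ends = [tuple(degree[node] for node in edge) for edge in undirected]
    mean_product = sum(j * k for j, k in ends) / m
    mean_end = sum((j + k) / 2 for j, k in ends) / m
    mean_end_sq = sum((j * j + k * k) / 2 for j, k in ends) / m
    denominator = mean_end_sq - mean_end ** 2
    if denominator == 0:
        return float("nan")
    return (mean_product - mean_end ** 2) / denominator


# Hypothetical toy headlines, already segmented into words.
headlines = [
    ["墨西哥湾", "漏油", "事件"],
    ["渤海湾", "漏油", "事件"],
    ["渤海湾", "漏油", "污染"],
]
network = build_word_network(headlines)
nodes = {word for pair in network for word in pair}
print(len(nodes), len(network), network[("漏油", "事件")])  # 5 4 2
print(degree_assortativity(network))                        # -1.0
```

The toy network is a star centred on 漏油, so it sits at the fully disassortative extreme r = -1.0; real headline networks are far larger, and the paper reports only that its coefficients are very low.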
By calculating the topological features of the nodes, we obtained the keywords of both the whole-sample words network and the monthly words networks, as well as the inner relationships between the words and their evolution. Compared with the 2010 Gulf of Mexico […] search engines. If we want to gather information for the word networks more precisely, we must explore more methods of searching for news. Therefore, in future work, we could extend the data-collection methods and try to construct word networks of the headlines that better reflect reality. Admittedly, some headlines are sensationalized or misleading and do not reflect the real meaning of the news content; as a next step, we could therefore develop a new method to judge the degree of correlation between headlines and news content.

References
1. Chen DB, Wang GN, Zeng A, Fu Y, Zhang YC. Optimizing Online Social Networks for Information Propagation. PLoS ONE 2014; 9: e96614. doi: 10.1371/journal.pone.0096614 PMID: 24816894
2. Bharat K, Broder A. A technique for measuring the relative size and overlap of public web search engines. Computer Networks and ISDN Systems 1998; 30: 379-388.
3. Risvik KM, Michelsen R. Search engines and web dynamics. Computer Networks 2002; 39: 289-302.
4. Morris MR, Teevan J, Panovich K. A Comparison of Information Seeking Using Search Engines and Social Networks. ICWSM 2010; 10: 23-26.
5. Qiu T, Zhang ZK, Chen G. Information filtering via a scaling-based function. PLoS ONE 2013; 8: e63531. doi: 10.1371/journal.pone.0063531 PMID: 23696829
6. Medo M, Zhang YC, Zhou T. Adaptive model for recommendation of news. EPL (Europhysics Letters) 2009; 88: 38005.
7. Zhang ZK, Liu C. Hybrid recommendation algorithm based on two roles of social tags. International Journal of Bifurcation and Chaos 2012; 22: 1250166.
8. Chen D, Zeng A, Cimini G, Zhang YC. Adaptive social recommendation in a multiple category landscape. arXiv preprint 2012; arXiv:1210.1441.
9. Shie JS. Conceptual metaphor as a news-story promoter: The cases of ENL and EIL headlines. Intercultural Pragmatics 2012; 9: 1-21.
10. Kleinnijenhuis J, Schultz F, Utz S, Oegema D. The mediating role of the news in the BP oil spill crisis 2010: How US news is influenced by public relations and in turn influences public awareness, foreign news, and the share price. Communication Research 2013; 0093650213510940.
11. Utz S, Schultz F, Glocka S. Crisis communication online: How medium, crisis type and emotions affected public reactions in the Fukushima Daiichi nuclear disaster. Public Relations Review 2013; 39: 40-46.
12. Mahgoub H, Rosner D, Ismail N, Torkey F. A Text Mining Technique Using Association Rules Extraction. International Journal of Computational Intelligence 2008; 4: 21-28.
13. Tanabe L, Scherf U, Smith LH, Lee JK, Hunter L, Weinstein JN. MedMiner: an Internet text-mining tool for biomedical information, with application to gene expression profiling. Biotechniques 1999; 27: 1210-4. PMID: 10631500
14. Choi Y, Jung Y, Myaeng SH. Identifying controversial issues and their sub-topics in news articles. In: Intelligence and Security Informatics. Springer Berlin Heidelberg 2010; 140-153.
15. Balahur A, Steinberger R. Rethinking Sentiment Analysis in the News: from Theory to Practice and back. Proceedings of WOMSA 2009; 9.
16. Bhowmick PK. Reader perspective emotion analysis in text through ensemble based multi-label classification framework. Computer and Information Science 2009; 2: 64-74.
17. Yoon J. Detecting weak signals for long-term business opportunities using text mining of web news. Expert Systems with Applications 2012; 39: 12543-12550.
18. Huang CJ, Liao JJ, Yang DX, Chang TY, Luo YC. Realization of a news dissemination agent based on weighted association rules and text mining techniques. Expert Systems with Applications 2010; 37: 6409-6413.
19. Tanasa D, Trousse B. Advanced data preprocessing for intersites web usage mining. Intelligent Systems, IEEE 2004; 19: 59-65.
20. Li N, Wu DD. Using text mining and sentiment analysis for online forums hotspot detection and forecast. Decision Support Systems 2010; 48: 354-368.
21. Lin C, Xie R, Guan X, Li L, Li T. Personalized news recommendation via implicit social experts. Information Sciences 2014; 254: 1-18.
22. Wagner H, Dlotko P, Mrozek M. Computational topology in text mining. In: Computational Topology in Image Context. Springer Berlin Heidelberg 2012; 68-78.
23. Afzal S, Maciejewski R, Jang Y, Elmqvist N, Ebert DS. Spatial text visualization using automatic typographic maps. IEEE Transactions on Visualization and Computer Graphics 2012; 18: 2556-2564. PMID: 24783264
24. Gurkan A, Iandoli L, Klein M, Zollo G. Mediating debate through on-line large-scale argumentation: Evidence from the field. Information Sciences 2010; 180: 3686-3702.
25. Chen RC, Hsieh CH. Web page classification based on a support vector machine using a weighted vote schema. Expert Systems with Applications 2006; 31: 427-435.
26. Magerman T, Looy BV, Song X. Exploring the feasibility and accuracy of Latent Semantic Analysis based text mining techniques to detect similarity between patent documents and scientific publications. Scientometrics 2010; 82: 289-306.
27. Dodds PS, Watts DJ, Sabel CF. Information exchange and the robustness of organizational networks. Proceedings of the National Academy of Sciences 2003; 100: 12516-12521. PMID: 14528009
28. Barabasi AL, Jeong H, Neda Z, Ravasz E, Schubert A, Vicsek T. Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications 2002; 311: 590-614.
29. Hanaki N, Peterhansl A, Dodds PS, Watts DJ. Cooperation in evolving social networks. Management Science 2007; 53: 1036-1050.
30. Li HJ, An HZ, Huang JC, Gao XY, Sh YL. Correlation of the holding behaviour of the holding-based network of Chinese fund management companies based on the node topological characteristics. Acta Phys. Sin. 2014; 63: 048901.
31. Gao X, An H, Zhong W. Features of the Correlation Structure of Price Indices. PLoS ONE 2013; 8: e61091. doi: 10.1371/journal.pone.0061091 PMID: 23593399
32. Serrano MA, Boguna M. Topology of the world trade web. Physical Review E 2003; 68: 015101. PMID: 12935184
33. Zhang CJ, Zeng A. Behavior patterns of online users and the effect on information filtering. Physica A: Statistical Mechanics and its Applications 2012; 391: 1822-1830.
34. Hu H, Wang X. Evolution of a large online social network. Physics Letters A 2009; 373: 1105-1110.
35. Piraveenan M, Prokopenko M, Zomaya A. Assortative mixing in directed biological networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 2012; 9: 66-78.
36. Newman MEJ. The structure and function of complex networks. SIAM Review 2003; 45: 167-256.
37. Li H, An H, Gao X, Huang J, Xu Q. On the topological properties of the cross-shareholding networks of listed companies in China: Taking shareholders' cross-shareholding relationships into account. Physica A: Statistical Mechanics and its Applications 2014; 406: 80-88.
38. Brandes U. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology 2001; 25: 163-177.
39. Ebel H, Mielsch LI, Bornholdt S. Scale-free topology of e-mail networks. Physical Review E 2002; 66: 035103. PMID: 12366171
40. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008; 10: 10008.
41. Palla G, Barabasi AL, Vicsek T. Quantifying social group evolution. Nature 2007; 446: 664-667. PMID: 17410175
42. Li H, Fang W, An H, Yan L. The shareholding similarity of the shareholders of the worldwide listed energy companies based on a two-mode primitive network and a one-mode derivative holding-based network. Physica A: Statistical Mechanics and its Applications 2014; 415: 525-532.
43. Newman MEJ. Assortative mixing in networks. Physical Review Letters 2002; 89: 208701. PMID: 12443515
44. Qi H, An H, Hao X, Zhong W, Zhang Y. Analyzing the International Exergy Flow Network of Ferrous Metal Ores. PLoS ONE 2014; 9: e106617. doi: 10.1371/journal.pone.0106617 PMID: 25188407
45. Hao X, An H, Liu X, Gao X, Cong L. Analysis on main mineral products in international trade. Resources & Industries 2013; 15: 35-43.
46. An H, Gao X, Fang W, Huang X, Ding Y. The role of fluctuating modes of autocorrelation in crude oil prices. Physica A: Statistical Mechanics and its Applications 2014; 393: 382-90.
47. An H, Zhong W, Chen Y, Li H, Gao X. Features and evolution of international crude oil trade relationships: A trading-based network analysis. Energy 2014; 74: 254-259.
48. An J, An H, Yang G. Relation of financial institutions and listed mining entities in equity financing based on complex network. Resources & Industries 2014; 16: 124-1
