2013美赛特等奖原版论文集12218_第1页
2013美赛特等奖原版论文集12218_第2页
2013美赛特等奖原版论文集12218_第3页
2013美赛特等奖原版论文集12218_第4页
2013美赛特等奖原版论文集12218_第5页
已阅读5页,还剩14页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、For office use only T1 _ T2 _ T3 _ T4 _ Team Control Number 12218 Problem Chosen C For office use only F1 _ F2 _ F3 _ F4 _ 2012 Mathematical Contest in Modeling (MCM) Summary Sheet Type a summary of your results on this page. Do not include the name of your school, advisor, or team members on this p

2、age. Social Network Analysis in Crime Busting Social network analysis (SNA) has been winning more and more attention in the last twenty years. The method provides an insight into various networks. In this paper, we use SNA and related techniques to analyze crime data and try to get potential crimina

3、l gang. After first investigating the hidden features of a conspiratorial group, we consider ones closeness to the criminal gang as the main reason in deciding whether a staff member should be regarded as a conspirator. The introduction of cooperation distance metric (CD-metric) combined with the an

4、alysis on cooperation factor makes up our models major base. Within these definitions, potential conspirators can be worked out by choosing those on the top of the ascending CD-metric list. Highest values of CD-metric can be recognized as being the farthest away from the conspiracy, in other words,

5、they are the most innocent ones. A 12 members criminal group is dug out. Before identifying the leading criminals, it is essential for us to quantify the ability of peoples leadership. Centrality analysis suggests a good way in determining the focal point(s) in a network. We make some amendments of

6、the centrality measures so that it can be applied to a directed graph. Whats more, we combine centrality measures(degree, closeness and betweenness) and figure out the fusion values. We can form a ranking list of peoples ability of leadership then. The leaders in the criminal group are sure to come

7、to surface after comparing with the priority list derived before. In the refinement of the network model, we appeal to the theories of semantic network analysis and text analysis. On the purpose of making a deeper exploration of identifying the intrinsic character of one topic, we reapply the centra

8、lity analysis to data computing right after the construction of a semantic web. Based on text analysis approach, we build up a vector space model and compare the final outputs with ones obtained above, the results are robust. This models application in other areas such as the biomedical domain has b

9、een discussed at the end of this paper, excellent performance happens when accessed to enough amounts of data. With a well-organized structure and a wide range of application, this methodology is sure to have a bright future. Keywords: Social Network Analysis, Centrality Measures, Semantic Web, Text

10、 Analysis Team # 12218Page 1 of 18 Contents 1Introduction2 2Investigation Scenario Based on Crime Data Mining Model2 2.1General Assumptions . . . . . . . . . . . . . . . . . . . . . . . . .2 2.2 Defi nitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 2.3Explanation of the Model .

11、 . . . . . . . . . . . . . . . . . . . . .3 2.4 Tests and Verifi cations . . . . . . . . . . . . . . . . . . . . . . . .4 3Analysis on the Current Case6 3.1Priority Lists of Potential Conspirators . . . . . . . . . . . . . . .6 3.2Performance with More Accurate Information . . . . . . . . . . .7 4Se

12、arching for Leaders7 4.1Centrality Analysis . . . . . . . . . . . . . . . . . . . . . . . . . .8 4.2Interpretation of Results . . . . . . . . . . . . . . . . . . . . . . .10 5Model Optimizing11 5.1Semantic Network Analysis . . . . . . . . . . . . . . . . . . . . .11 5.1.1Construction of Semantic Net

13、work . . . . . . . . . . . . .12 5.1.2Topic Selection . . . . . . . . . . . . . . . . . . . . . . . .12 5.2Text Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13 5.2.1Vector Space Model. . . . . . . . . . . . . . . . . . . . .13 5.2.4Corroboration of Model Reasonableness . . . . .

14、 . . . . .14 6Applications15 7Conclusion15 1 Team # 12218Page 2 of 18 1Introduction A crime is consist of a wide range of activities ranging from traffi c violations to organized fraud in fi nancial fi eld. The major challenge lying ahead of all the authorities is how to accurately and effi ciently

15、analyze tremendous amount of the crime data 1. With great advance in the information technology, the fi nancial crime hap- pens more often than ever. Due to a German report published not long before, the identifi ed fraud cases related to card payments have risen 345% just from 2007 to 2009 2. A com

16、mon analytical approach to this issue during recen- t years is called “Social Network Analysis”(SNA), which can be viewed as an innovation on mathematical perspective combined with sociology and crimi- nology.This method can expose the hidden structural patterns in criminal networks 3, which can be

17、off ered to the anti-crime departments in directing them to necessary surveillance or interrogation scenarios, after several steps of data processing. According to Menas book, SNA is a data mining technique that reveals the structure and content of a body of information by representing it as a set o

18、f interconnected, linked objects or entities 4. To further present the application of the SNA approach, we arrange paper as followed: i. In section 2, we propose a way to construct a model in identifying potential conspirators with the help from SNA in the fi rst place; ii. In section 3, verifi ed b

19、y the results of in small network, our model is applied to a bigger one and we rank all the people according to their possibility of being involved in the conspiracy; iii. In section 4, based on the centrality analysis theory, we explore a way in determining all the criminal leaders; iv. In section

20、5, after referring to the powerful techniquessemantic network analysis and text analysis, we make some amendments of our model by considering the content and context of the message traffi c; v. At last, we discuss the models further developments when provided with enormous amount of data and talk ab

21、out the future application such as in the biomedical area. 2Investigation Scenario Based on Crime Data Mining Model 2.1General Assumptions There is only one criminal gang in the given network; Messages can not be classifi ed emotionally from appearance; Occurrences of messages are irrelevant to time

22、; Eff ects form data errors are negligible. 2 Team # 12218Page 3 of 18 2.2 Defi nitions Before proceeding to the real problem, we fi rst make some defi nitions: The multi-graph G = (V,E) is assumed to be a social network, where V denotes the set of nodes and E denotes the set of edges. A person in t

23、he network can be denoted by a node in the graph while edges stand for messages with certain topics passing through diff erent persons. Thus we can separate these persons into several cliques, judging from the topics of communication. Let G0= (V 0,E0) be a supergraph corresponding to multi-graph G,

24、where V 0 is the set of supernodes and E0is the set of superedges 5. Each su- pernode P0 i represents topic i with associated people in multi-graph G and there exists a superedge when supernodes connected by it possess common nodes. The weight of one superedge is the number of the common nodes. Figu

25、re 1: Multi-graphFigure 2: Supergraph Figure 1 and 2 presented above display the transition from multi-graph to the corresponding supergraph. Nodes in the multi-graph can be recognized as persons and supernodes in the supergraph can be recognized as topics involved with relevant people. So the crime

26、 busting in a criminal network can be translated into: “Pro- vided with a multi-graph G = (V,E) and a query list Q = q1,q2,q3,.,qi, we aim to fi nd out all the possible subsets of nodes associated with ones in Q and form a list ordered by the measurements of association.” 2.3Explanation of the Model

27、 The researched crime network is connected by messages delivery, through information transmitting, the whole conspiratorial network can function effi - ciently. An common staff member may have no idea in knowing whom the people really are, when having a conversation with them whereas the intrinsic c

28、haracter of topic can determine whether a person is involved in the conspiracy or not. A piece of message is consist of countless information in many aspects. It is impossible and unnecessary to deal with them all at the same time, let alone composing these messages to a conclusion. In this case, we

29、 only focus our attention on three main aspects: Topics, Direction and People to whom they talked. However, the biggest challenge that we have confronted with is how to combine these three kinds of information appropriately. 3 Team # 12218Page 4 of 18 We proposed a structure of network here in which

30、 the nodes denote topics of messages and edges denote people involved in them. Regarding the defi nition listed above, these characters are all possessed by the supergraph. In the general view, when a person switch his or her attention from one topic to another, we can regard it as person “fl ows” a

31、mong related topics, therefore the whole network become alive. Before the further exploration of this method, we should fi rst analyze the “similarity” among collected topics. We use “Cooperation Factor (CF)” 5 to study the latent interrelationships. Defi nition 2.3.1 (Cooperation Factor) In the sup

32、ergraph G0= (V 0,E0), the Cooperation Factor between two supernodes P0 i and P0 j is: CF(S0,V 0 i,V 0 j) = 2 Nc(P0 i,P 0 j) Nt(P0 i,P 0 j) ,CF 0,1;(1) Where: Ncdenotes the number of common nodes between P0 i and P0 j; Ntdenotes the number of total nodes contained in the P0 i and P0 j. It is obvious

33、that not all conspirators can contact directly but they must be very close for safety concern. To utilize this collective feature, it is essential to defi ne a Cooperation Distance (CD) 5: Defi nition 2.3.2 (Cooperation Distance Metric) In the supergraph G0= (V 0,E0) with a list of i person denoted

34、by a query Q = q1,q2,q3,.,qi. Let vbe a particular node in the multi-graph G, and V be the set of nodes in the multi-graph G. Then, the Cooperation Distance of v is then defi ned as: CD(S0,v,V ) = |Q| P i=1 ShortestPath(S0,v,qi) |Q| ;(2) Where: the ShortestPath(S0,v,qi) is 0 when both qiand vbelong

35、to a same supernode, otherwise is the value derived from shortest path algorithm of vand qiwith the weight of edges measured by CF. The value of CF evaluates the level of correlation between two diff erent topics, then 1CF indicates the level of uncorrelation in the opposite. We pick out the assumed

36、 conspiratorial topics and compare other unidentifi ed topics with it by accordingly assign diff erent weights. Hence, we can apply the shortest path algorithm to this theory and fi gure out the corresponding CDs of each unidentifi ed staff member. Those people with lowest values can be considered a

37、s having the most intimated communication with the criminal gang, which means they are more likely to be the real members in a criminal gang. 2.4 Tests and Verifi cations After computing all the data we can obtain from the EZ (an investigation scenario provided by supervisor) case, concerning the po

38、ints stated above, we can make a list of the CDs value in the ascending order, which indicates the possibility of a person being a conspirators in the opposite. 4 Team # 12218Page 5 of 18 NameCD metricRank Dave0.00001 George0.00001 Ellen0.00001 Carol0.33334 Fred0.33334 Harry0.33334 Bob0.71436 Inez0.

39、71436 Anne1.04769 Jaye1.428610 Table 1: Priority List for EZ Case The top three in the list are the staff members with the highest possibilities of being involved in the conspiracy derived from our analysis, where there are already two are sure to be guilty. Moreover, Anne along with Jaye are both i

40、dentifi ed to be the “farthest away” from the conspiracy. But it is sad to fi nd that Carol may still be misjudged as a conspirator and we may probably leave Bob behind as well. Figure 3: Discriminating Line for EZ Case In order to determine a discriminating line in separating innocents from conspir

41、ators, we calculate the diff erence among pairs of neighbors in the list and pick out the biggest one through comparison. The hugest gap measured by curvature between two staff members can be deemed as a boundary that distinguishes two groupsthe innocent one and the conspiratorial one for the conspi

42、ratorial messages is sure to be restricted in a small group. Although there still exists several demerits considering the results, the ex- cellent structure of the investigation scenario encourages us to apply this method 5 Team # 12218Page 6 of 18 to a larger criminal network with much more data. 3

43、Analysis on the Current Case For now, the new network has 83 people involved, 400 collected messages, over 21,000 words of messages traffi c, 15 topics, 7 known conspirators, and 8 known non-conspirators. There are also several members with same names in investigation which we just diff erentiate th

44、em by the denoted numbers. Based on the SNA method we proposed before, we calculate each staff members CD metric and arrange them in an ascending order list. Considering the list size, we only list the top 40 to perform the results of our analysis. 3.1Priority Lists of Potential Conspirators We fi r

45、st work out the CF values of each adjacent vertices in order to quantify the level of correlations among edges. Because there are three topics considered to be involved in the conspiracy, we make a combination of these three initial values into ones by setting equal weights and make an addition. The

46、refore CD metrics can be determined by making the use of shortest path algorithm and priority list may come to surface then. The output of our analysis on the 3 topics which deemed to be conspiratorial and 7 known conspirators are listed below: NameCD metricNameCD metric Elsie0.0000Paige0.2859 Jean0

47、.0000Neal0.2859 Alex0.0000Priscilla0.2859 Elsie0.0000Kristina0.3641 Paul0.0000Sherri0.3641 Harvey0.0000Franklin0.3641 Ulf0.0000Marcia0.3641 Cha0.0000Jerome0.4224 Sheng0.0000Louis0.4224 Darol0.0000Beth0.4420 Yao0.0000Douglas0.4420 Stephanie0.2265Crystal0.4480 Kim0.2265Claire0.4480 Beth0.2265Jerome0.4

48、480 Seeni0.2265Darlene0.4480 Dolores0.2340Patricia0.4525 Marion0.2340Jia0.4525 Dwight0.2340Neal0.4584 William0.2340Melia0.4584 Lars0.2340Francis0.4605 Table 2: Partial Priority List for Requirement 1 We note that the 11 persons on the top have the same values of CD metric, on the other hand, indicat

49、es that they can be regarded as having the same 6 Team # 12218Page 7 of 18 possibilities of being a conspirator. The 7 known conspirators being involved in suspected group also demonstrates the reasonableness of our method. Furthermore, we plot the discrimination line in the same way we displayed in

50、 the small network by comparing the separation among adjacent persons in the priority list. Thus, the possible conspirators are: Elsie, Jean, Alex, Elsie, Figure 4: Discriminating Line for Requirement 1 Paul, Harvey, Ulf, Cha, Sheng, Darol and Yao. 3.2Performance with More Accurate Information The p

51、romising results obtained above motivate us to test our models per- formance with more accurate information in the same network. When Chris was identifi ed to be a conspirator and Topic 1 was regarded as being associated with conspiracy, we revaluate the level of correlations among total 15 topics.

52、After setting diff erent similarities weights to particular edges accordingly, we recompute these data and work out with a amended list and discriminating line. The results are displayed in Table 3 and Figure 5. 4Searching for Leaders Its hard to uncover all the leaders in a well-organized crime net

53、works due to its high complexity. But we can build a calibration system based on a theory called Centrality Analysis 6,7 to identify some suspects who are more likely to be the chief ones. 7 Team # 12218Page 8 of 18 NameCD metricNameCD metric Chris0.0000Beth0.2877 Elsie0.0000Paige0.3362 Jean0.0000Ne

54、al0.3362 Alex0.0000Priscilla0.3362 Elsie0.0000Crystal0.3511 Paul0.0000Lois0.3511 Harvey0.0000Ellin0.3511 Ulf0.0000Han0.3511 Cha0.0000Kristina0.3525 Sheng0.0000Sherri0.3525 Darol0.0000Franklin0.3525 Yao0.0000Marcia0.3525 Dolores0.2869Gretchen0.3525 Marion0.2869Douglas0.3525 Dwight0.2869Claire0.4294 W

55、illiam0.2869Jerome0.4294 Lars0.2869Louis0.4294 Seeni0.2869Darlene0.4294 Stephanie0.2877Jerome0.4299 Kim0.2877Beth0.4549 Table 3: Partial Priority List for Requirement 2 Figure 5: Discriminating Line for Requirement 2 4.1Centrality Analysis There are three measures of centrality that have been widely

56、 used in social network analysis: Betweenness Betweenness of one node is the number of geodesics (shortest 8 Team # 12218Page 9 of 18 paths between any other two nodes in the network) passing through it: CB(i) = gjk(i) gjk (3) Where: gjkrepresents the number of shortest paths between any two nodes a

57、nd gjk(i) represents the number of shortest paths running through node i; Closeness Closeness is the sum of geodesics between the particular node and every other node in the network: CC(i) = ? N X j d(i,j) ?1 (4) Where: d(i,j) denotes the binary shortest distance between any two nodes in the network

58、; Degree The original concept 6 of one nodes degree is its number of links as: ki= CX D(i) = N X j xij(5) Where: i denotes the focal node, j denotes other nodes in the network; N is the nodes number and xij= 0,1 are the entries in the adjacency matrix. However, some pioneers proposed a method which

59、considering the weights of each edge in a weighted network 810 by introducing a new measure called strength: si= CW D(i) = N X j wij(6) Where: i denotes the focal node, j denotes other nodes in the network; N is the nodes number and wijare the entries in the adjacency matrix whose value is positive when i is connected to j (so that it can refl ect the weight), otherwis

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论