




Design of Network Behavior Analysis System Based on Association Rules Mining

ZHANG Xin, LI Kun-lun
College of Science and Technology, Nanchang University

With the rapid development of computer network technology and the continued spread of Internet applications, people rely more and more on network applications and services in their work and daily life. At the same time, problems in network use have emerged, mainly in two respects. The first is excessive network entertainment: these applications consume large amounts of network resources and interfere with normal network services. The second is harmful behavior by network users, which often causes serious network security problems. This is especially true of campus networks, where the users are mainly students. They are more likely to indulge in online chat, online games, and other entertainment; they often cause security problems through inappropriate network behavior; and some actively engage in harmful network behavior out of interest or curiosity. Studying and analyzing users' network behavior, especially the Internet behavior of student groups, and limiting harmful network behavior is therefore of practical significance for managing campus network resources effectively and strengthening campus network security.

Using association rule mining, this paper studies and designs a network behavior analysis system that analyzes users' network access behavior to obtain user behavior patterns, judge behavioral tendencies, and detect abnormal behavior.

1 Overview of network behavior analysis

Behavior analysis was originally a concept from psychology research. Because it offers good guidance for many kinds of social activity, it has been applied in many fields.
With the emergence and spread of computer networks, scholars at home and abroad began to study the characteristics and regularities of network user behavior. Network users have different interests and habits, so their network access behavior inevitably carries individual characteristics. Analyzing and exploring these characteristics is the focus of network behavior analysis research. The main research method is to analyze user access logs on the server side and mine the behavioral characteristics of users when they access external networks. Network behavior analysis is mainly applied to Internet usage preference, network security auditing, and intrusion detection analysis, where it helps to optimize website design, safeguard network security, and guide and manage abnormal network behavior.

Most current research on network behavior analysis is based on data mining: the analysis process is in fact a data mining process that extracts valuable information from large volumes of network behavior data. The methods differ, however, in their emphasis and in the objects they analyze. They mainly include user feature analysis, association analysis, classification and prediction, anomaly analysis, TopN analysis, IP address analysis, hit-rate analysis, and Web log analysis.
In the network behavior analysis system, this paper uses an association rule mining algorithm for association analysis, adds the mined behavior patterns to a behavior pattern library, and uses a similarity-based method to compare behavior patterns and identify abnormal behavior.

2 Network behavior analysis system design

The system is mainly intended for the LAN environment of a campus network. A client Agent collects the raw network behavior data of all users, formats it into the user behavior description format, and sends it to the server, where it forms a training data set suitable for data mining. An association rule mining algorithm then analyzes the associations among the behavior pattern features, extracts user behavior pattern information, and builds the user behavior pattern library. For a particular user group, network behavior data can be collected in real time, formatted into a test data set, and compared with the normal behavior patterns in the user behavior pattern library; if abnormal behavior is found, the computer operations of the specific user can be managed. The system consists of three modules: the data acquisition and formatting module, the behavior pattern generation module, and the behavior pattern comparison module.

2.1 Data acquisition and formatting

1) Data sources generally fall into two kinds: log data and network data. Because the campus network has many nodes, it contains a wide variety of log data sources, and the logs generated by visits to external sites are not available to local administrators. The system therefore installs an Agent program on each client; the Agent collects network behavior data and sends it to the server. This approach obtains users' network behavior data directly and completely and collects a large amount of training data effectively. In addition, campus networks are generally under unified management, which ensures that the client Agent program runs normally.
The client Agent runs in the background on the client as a system service. The program has three main threads: a network data capture thread, a network data queue maintenance thread, and a network data sending thread.

2) To obtain users' network behavior patterns, network behavior must be described effectively; that is, it must be decided which feature attributes, and which relationships among them, describe a user's network access behavior. The system describes a user's network behavior with a quadruple (T, W, S, F): the time at which the user uses the network (Time), the website visited (Website), the network service used (Service), and the traffic generated (Flow). Network data collected on the client is formatted and then sent to the server to form the training data set.

2.2 Association rule mining algorithm

Association rules reveal interesting associations or correlations among sets of items in the large volumes of data held in a transaction database, and thereby support decision making. The association rule mining problem can be divided into two subproblems: finding all frequent itemsets, and generating association rules from the frequent itemsets. The performance of the first subproblem determines the overall performance of association rule mining. Apriori is a classical association rule mining algorithm. It mines frequent itemsets iteratively, and each iteration has two steps: the join step and the prune step. In the join step, candidate k-itemsets are generated by joining pairs of frequent (k-1)-itemsets. In the prune step, any candidate k-itemset that has a (k-1)-subset which is not frequent is deleted. The database is then scanned to compute the support count of each remaining candidate k-itemset.
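The join and prune steps of classical Apriori described above can be sketched in Python as follows. This is an illustrative sketch rather than the system's implementation: the function names `generate_candidates` and `support_count` and the example item labels are assumptions, and itemsets are assumed to be stored as sorted tuples.

```python
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = sorted(tuple(sorted(s)) for s in frequent_prev)
    prev_set = set(prev)
    candidates = []
    for a, b in combinations(prev, 2):
        # Join step: merge two (k-1)-itemsets that share their first k-2 items.
        if a[:k - 2] == b[:k - 2]:
            cand = tuple(sorted(set(a) | set(b)))
            # Prune step: keep the candidate only if every (k-1)-subset is frequent.
            if all(sub in prev_set for sub in combinations(cand, k - 1)):
                candidates.append(cand)
    return candidates


def support_count(itemset, transactions):
    """Count the transactions (sets of items) that contain the whole itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


if __name__ == "__main__":
    # Hypothetical frequent 2-itemsets over items A..D.
    frequent_2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
    # (B, C, D) is joined but pruned because (C, D) is not frequent.
    print(generate_candidates(frequent_2, 3))
```

Storing itemsets as sorted tuples keeps the join test to a simple prefix comparison and lets the prune step look up subsets in a set.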
Repeat this process until no new candidate itemsets are generated.

The system uses AprioriBV, an improved Apriori algorithm based on vector computation. Through inner-product and addition operations between transaction vectors and itemset vectors, the algorithm reduces the number of candidate itemsets, makes pruning more efficient, and computes candidate support counts quickly. The algorithm first finds all frequent itemsets and then generates association rules from them. The first step, finding all frequent itemsets, is the core problem of association rule mining. AprioriBV generates all frequent itemsets as follows:

1) Generate the frequent 1-itemsets. Scan the database $D$ and express each transaction $T$ as a transaction vector $TV_r$ ($r = 1, 2, \ldots, n$). At the same time, record how often each item $i_j$ ($j = 1, 2, \ldots, m$) occurs across all transactions, that is, the support count $\mathrm{sup}(i_j)$ of the itemset $\{i_j\}$. Given a minimum support count threshold $MinSup$, if $\mathrm{sup}(i_j) \ge MinSup$ then $\{i_j\} \in L_1$, which yields the frequent 1-itemsets.

2) Generate the frequent 2-itemsets. Join $L_1$ with itself to obtain $C_2$. Express each itemset $\{i_p, i_q\}$ in $C_2$ as a 2-itemset vector $IV^{2}_{i_p i_q}$; its support count $\mathrm{sup}(i_p, i_q)$ is computed with vector inner-product operations. If $\mathrm{sup}(i_p, i_q) \ge MinSup$ then $\{i_p, i_q\} \in L_2$, which yields the frequent 2-itemsets.

3) Generate the frequent k-itemsets from the frequent (k-1)-itemsets. By the ordering rule defined in the algorithm, any (k-1)-itemset $\{i_p, \ldots, i_q\}$ in $L_{k-1}$ must be joined with an item that follows $i_q$ to form a k-itemset $\{i_p, \ldots, i_q, i_j\}$ ($j > q$).

Then scan $L_{k-1}$ once and generate the itemset accumulation vectors $SV^{k-1}_{i_j}$ ($j = k, k+1, \ldots, m$); for the last item $i_j$ of any k-itemset in $C_k$, $j \ge k$ always holds. Next, for each frequent (k-1)-itemset $\{i_p, \ldots, i_q\}$ in $L_{k-1}$, add its vector $IV^{k-1}_{i_p \ldots i_q}$ to each $SV^{k-1}_{i_j}$ ($j = q+1, q+2, \ldots, m$) to obtain $S = IV^{k-1}_{i_p \ldots i_q} + SV^{k-1}_{i_j}$.

Finally, compute the support count $\mathrm{sup}(i_{p_1}, i_{p_2}, \ldots, i_{p_k})$ of every k-itemset $\{i_{p_1}, i_{p_2}, \ldots, i_{p_k}\}$ in the resulting $C_k$. If $\mathrm{sup}(i_{p_1}, i_{p_2}, \ldots, i_{p_k}) \ge MinSup$ then $\{i_{p_1}, i_{p_2}, \ldots, i_{p_k}\} \in L_k$, which yields the frequent k-itemsets.

Repeat step 3) until $C_k$ or $L_k$ is empty, which finally gives the set of all frequent itemsets.

2.3 Network behavior modeling

The system builds the user behavior pattern library mainly by dynamic modeling. In the initial period of use, the client Agent collects users' normal access behavior over a period of time; after formatting, the data is stored centrally to form a training data set of a certain size. The AprioriBV algorithm then mines association rules from the training data set, extracts users' normal behavior patterns, and builds the user behavior pattern library. While the system is in operation, the administrator can direct the collection of new normal access behavior and add it to the training data set, so the system needs to re-mine the training data set periodically and update the user behavior pattern library.

Network behavior modeling is the process of extracting association rules among the four attributes of the network behavior description format. AprioriBV is a mining algorithm for Boolean data, whereas the attributes of the description format are all multi-valued, so they must be converted into Boolean data. Take the Flow attribute, whose values are Small, Middle, Big, and Huge: a network behavior whose traffic is Big corresponds to the Boolean data 0,0,1,0.
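As a minimal sketch of this conversion, and of how a support count might be obtained from inner products on the resulting Boolean vectors, the Python below one-hot encodes attribute values and counts a transaction as containing an itemset when the inner product of the two vectors equals the number of items in the itemset. That containment test is an assumption made for illustration, as are the `TIME_LEVELS` values and all function names; only the Flow levels and the 0,0,1,0 encoding of Big come from the text.

```python
FLOW_LEVELS = ["Small", "Middle", "Big", "Huge"]    # values given in the text
TIME_LEVELS = ["Morning", "Afternoon", "Evening"]   # hypothetical discretization


def one_hot(value, levels):
    """Encode one multi-valued attribute as a list of 0/1 flags."""
    if value not in levels:
        raise ValueError(f"unknown attribute value: {value!r}")
    return [1 if value == level else 0 for level in levels]


def encode_record(time, flow):
    """Concatenate the Boolean encodings into one transaction vector."""
    return one_hot(time, TIME_LEVELS) + one_hot(flow, FLOW_LEVELS)


def itemset_vector(indices, length):
    """Boolean vector with a 1 at each item position of the itemset."""
    vec = [0] * length
    for i in indices:
        vec[i] = 1
    return vec


def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def support_by_inner_product(transaction_vectors, item_vec):
    """Assumed rule: a transaction contains the itemset when the inner
    product equals the number of items in the itemset."""
    size = sum(item_vec)
    return sum(1 for tv in transaction_vectors if inner(tv, item_vec) == size)


if __name__ == "__main__":
    print(one_hot("Big", FLOW_LEVELS))  # [0, 0, 1, 0]
    records = [("Evening", "Big"), ("Evening", "Big"), ("Morning", "Small")]
    vectors = [encode_record(t, f) for t, f in records]
    # Itemset {Time=Evening, Flow=Big}: positions 2 and 5 of the 7-element vector.
    print(support_by_inner_product(vectors, itemset_vector([2, 5], 7)))
```

For brevity the sketch encodes only two of the four attributes; Website and Service would be appended to the transaction vector in the same way.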
On this basis, association rules that are easier to understand can be mined more efficiently.

2.4 Behavior pattern comparison

Behavior pattern comparison builds on the user behavior pattern library: the user's current behavior, once collected, is compared with the behavior patterns in the library, and the degree of difference between the current behavior pattern and the normal behavior patterns is analyzed to determine whether the current behavior is abnormal.

Network access time (Time) serves as the base attribute for comparison. Using the user's current behavior pattern, the library is searched for all normal user behavior patterns whose Time attribute matches that of the current behavior, and the similarity between the current behavior and the resulting set is computed. The higher the similarity, the better the user's behavior pattern matches the normal behavior patterns and the less likely the behavior is abnormal.

The client Agent supports responding to the comparison results. When abnormal behavior occurs, the Agent acts according to the abnormality level sent by the behavior pattern comparison module; it can display a message box, temporarily lock the mouse and keyboard, suspend network applications, or take other measures to guide and manage the user's network behavior.

3 Conclusion

This paper introduces the basic concepts of network behavior analysis and association rule mining, studies network behavior analysis technology based on association rule mining, and designs a system model. The system makes full use of the strength of data mining in extracting knowledge from large-scale data to mine and refine user behavior patterns, providing a basis for network behavior monitoring and management.
The system's active monitoring and management of abnormal behavior provide a new and effective tool for network resource management and network security maintenance.