[Graduation Project Selection] Data Mining Problems in Medicine: Translated Material

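Data Mining Problems in Medicine

Abstract

Any retrospective investigation based on patient data rests on searching for patients by problem or sign rather than by name. With a properly computerized database, archived and encoded by problem, the data mining process would be easy: one would only need to enter a request to receive the appropriate data within a short time. Medical archives, however, are frequently based on paper records alone, with the patient's name as the entry key. Finding the right records in such an archive calls for a detective's strategy. The process then continues with collecting the usually enormous amount of paper, finding the appropriate records within it, and finally encoding them and arranging them in a table. The whole process can be divided into patient mining, paper mining and data mining. Because they are so slow, these phases can be the most time-consuming part of an investigation based on medical data. The author describes his data mining experience.

Keywords: data mining, medicine, coronary artery disease, data bank

1. Introduction

Any retrospective investigation based on patients' medical data has four main phases: planning the study, data mining, data processing and interpretation of the results. For an investigation to succeed, each of these phases is equally important. The data mining process starts with defining the pool in which a sufficient number of patients fulfilling the selection criteria is expected to be found. It continues with identifying the planned number of such patients, collecting their records, verifying the relevance of each patient and his record, extracting the appropriate data, encoding the data (qualitative and quantitative) and arranging them in a table. The data are then ready for processing.

If a computerized medical data bank of suitable patients exists, the main steps of this process are easy. One only needs to enter requests concerning the patients' diagnosis, age, gender or the results of completed investigations, and the computer returns a list of roughly suitable data, depending on the level of data encoding. If there is only a paper archive to work with, the data mining process is more difficult.

2. The medical data bank

In general, any patient's medical process consists of diagnosis and treatment. The process takes place in an office, at the bedside in hospital, or in diagnostic or intervention facilities. The majority of diagnostic results are images. All running remarks and the majority of diagnostic, therapeutic and final results are described qualitatively. The usual final report is a review in which the problem, findings, interventions and further suggestions are briefly described. For the needs of epidemiologists, the final diagnosis is encoded.

There are probably several ways to computerize all these data in order to build a data bank. All descriptive records, numerical results, digital image records or their descriptions should be collected in a computer database with the patient's name or diagnosis as the opening key. A more advanced approach is to encode every investigation or intervention and every result, which opens the possibility of reaching the data by any key.

3. The medical data network system

Any advanced diagnostic or therapeutic process involves several diagnostic and intervention departments, and sometimes several institutions. At each such point, all the medical data already collected are necessary or at least useful. With a proper database and network system between medical institutions, all appropriate data or images are easily reachable. They also serve as teaching cases, an important part of medical education.

Parallel to the medical network system is the medical insurance network. All public medical institutions are connected to a computer databank network, which can at any given moment provide a range of information regarding medical insurance.

4. From theory to reality

The above description of the medical databank and network system depicts the theory; reality is different. As already mentioned, a particular patient's entire diagnostic and therapeutic process takes place in different departments of one hospital or in different hospitals. The original record of any recorded medical examination or intervention is kept in the archive of the department where it was performed. Each of these departments has its own list of patients, organized on a similar principle in all departments. The register consists of paper cards bearing the patients' names; the cards are sorted alphabetically, each card carries a specific number, and the same number appears on the patient's record in the archive. The key for retrieving a particular record is the number on the patient's name card. Apart from the patient's name, this patient list has nothing in common with the hospital's general list or the insurance list.

This principle probably dates back to Sinuhe the Egyptian, but for clinical work, where the patient's name is known, it works perfectly. The only problem is time: obtaining a particular archive record can take a few hours or days. (The good side of this system is that a certain number of potentially unemployed staff have a job!)

5. The problem

The archive system described, which in our opinion is still mostly the current one, is sufficient for clinical work but completely unsuitable for research purposes. As already mentioned, the patient's name is the only retrieval key, whereas in research any sign may serve as the retrieval key except the name. To illustrate the problem, we describe the data mining experience of our research group.

In 1996 and 1997, in a large university hospital in Slovenia, we carried out a retrospective study investigating the impact of machine learning on the diagnostic process for coronary artery disease. For the study protocol, we had to collect the medical records of several hundred patients who had undergone the diagnostic process for confirming or excluding the presence of coronary artery disease (CAD). This process consists of a series of examinations (modalities): history and clinical examination, exercise ECG testing, stress myocardial perfusion scintigraphy and coronary angiography. The diagnostic problem of the individual patient determines the number and sequence of these procedures.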
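The contrast drawn here between an archive keyed only by name and a data bank reachable by any key can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields, the sample patients and the helper `build_index` are invented for the example and do not come from the study.

```python
from collections import defaultdict

# Hypothetical encoded records; all names and values are invented for illustration.
records = [
    {"name": "Novak Ana", "diagnosis": "CAD", "age": 61,
     "tests": {"ecg", "scintigraphy", "angiography"}},
    {"name": "Kovac Ivan", "diagnosis": "CAD", "age": 54,
     "tests": {"ecg"}},
    {"name": "Zupan Peter", "diagnosis": "hypertension", "age": 47,
     "tests": {"ecg", "scintigraphy"}},
]


def build_index(rows, field):
    """Map each value of `field` to the rows carrying it (one index per retrieval key)."""
    index = defaultdict(list)
    for row in rows:
        index[row[field]].append(row)
    return index


# The paper archive offers only this key.
by_name = build_index(records, "name")
# Research needs keys like this one.
by_diagnosis = build_index(records, "diagnosis")

# Patients with the diagnosis of interest who completed the whole test sequence.
required = {"ecg", "scintigraphy", "angiography"}
full_workup = [r["name"] for r in by_diagnosis["CAD"] if required <= r["tests"]]
print(full_workup)
```

With records encoded this way, selecting the small group that completed every examination is a single lookup and filter, whereas a name-only index cannot answer the question at all.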

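All these procedures were performed in different, specialized departments of the hospital. Each of these departments performs from a few hundred to more than a thousand of its particular examinations a year. The majority of patients undergo one or two procedures; the whole sequence of investigations is performed in only a minority. For our research we needed to identify this small group. So we were dealing with patients defined by problem but without a name, and had to search archives that use the name, not the problem, as the entry key. We began a big task: mining the patients.

In fact, in our experience, the mining process in medicine can be divided into three categories: patient mining, paper mining and data mining. It is hard to say which one is the most strenuous.

5.1. The patient mining process

First we had to construct a patient detection strategy. Because scintigraphy was the least frequently performed of the named investigations, we started there. Nuclear medicine performs a number of investigations, and myocardial scintigraphy is one of them.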

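The department's patient list does not show the type of investigation. Instead, for each particular investigation a notebook is kept with the handwritten names of the patients undergoing the test, sorted by date. At that time approximately 500 myocardial scintigraphies were performed per year. To find the overlap with the other investigations of interest, we created a list: we typed all the names (handwritten in the original record!) into the computer and sorted them alphabetically.

Luckily, the department performing coronary angiography kept (besides a similar record) a paper volume with a handwritten patient list of around 1200 names in semi-alphabetical order (all names beginning with the same letter were written in the same compartment, without any further sorting). Unfortunately, 70 names beginning with K in one group without any further sorting is not much help. We then compared the two lists visually and found approximately 120 matching names.

The list of 1000 patients undergoing exercise ECG was sorted on the same principle as for scintigraphy: by date only. What the patients with matching scintigraphy and exercise ECG had in common was the date, since both investigations were performed on the same day. So for these 120 patients we had to go back to the handwritten notebook, find the date of scintigraphy for each patient, and check the handwritten exercise ECG book for a matching date. In the end we collected 100 matching names for one particular year. As our study needed approximately 400 patients, the whole story was repeated four times.

5.2. Collecting the archive records: paper mining

The investigation records were kept in three separate archives, sorted by sequence number, which differed from archive to archive. To collect the records, we had to create three lists of archive record numbers. For scintigraphy, we had to find this number on the patient's alphabetical card. The number for exercise ECG was written next to the name in the book mentioned above.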
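The name-and-date matching described for patient mining was done by hand and by eye in the study. As a rough sketch of the same logic, assuming the three lists had already been typed into a computer, it could look like the Python below. The function names, the row layout and the sample rows are assumptions made for illustration, not the authors' procedure.

```python
from datetime import date


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so typed variants of a name compare equal."""
    return " ".join(name.lower().split())


def match_patients(scinti, angio_names, ecg):
    """Return the normalized names found in all three lists.

    scinti, ecg: lists of (name, date) rows typed from the date-sorted notebooks.
    angio_names: names from the semi-alphabetical angiography list.
    A scintigraphy row matches an ECG row only if name and date are both equal,
    since both investigations were performed on the same day.
    """
    angio = {normalize(n) for n in angio_names}
    ecg_rows = {(normalize(n), d) for n, d in ecg}
    matched = set()
    for name, day in scinti:
        key = normalize(name)
        if key in angio and (key, day) in ecg_rows:
            matched.add(key)
    return sorted(matched)


# Invented sample rows, for illustration only.
scinti = [
    ("Novak  Ana", date(1996, 3, 4)),
    ("Kovac Ivan", date(1996, 3, 4)),
    ("Horvat Maja", date(1996, 5, 9)),
]
angio_names = ["kovac ivan", "Novak Ana", "Zupan Peter"]
ecg = [
    ("Novak Ana", date(1996, 3, 4)),
    ("Kovac Ivan", date(1996, 3, 5)),
]

matches = match_patients(scinti, angio_names, ecg)
print(matches)
```

Set lookups replace the visual comparison of the two name lists, and the (name, date) pair stands in for the manual check of the exercise ECG book. Real handwritten names would need more forgiving normalization than this sketch applies.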

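For coronary angiography, all patients were hospitalized and the angiography record was written into the final report. The key to this report was the archive record number on the handwritten patient list, sorted by hospitalization date. We then looked up the date of the relevant hospitalization in the coronary angiography list (the majority of patients had been hospitalized several times). With the three lists of archive numbers, we then searched the three different archives to collect the records.

We were able to obtain complete records for approximately 90% of the patients; the rest were unreachable for various reasons. The final volume of paper collected for 400 patients was roughly 2.5 cubic metres. According to the study protocol we had to collect around 70 parameters for each patient, altogether around 28 000 data items from the records mentioned above.

5.3. Data collection problems

In collecting the data we encountered three main problems:

1. CAD is a chronic disease. A large number of our patients had already had some or all of the observed investigations in the past, sometimes several times. Their status changed with interventions or health incidents. According to the protocol, all the observed investigations had to depict the same health situation. So we sometimes had to go through a stack of paper 10 cm high to find the appropriate results.

2. Scintigraphy is a functional imaging test whose result is an image. As the written description in the archive was mostly insufficient for our needs, we had to reconstruct and review the majority of the archived digital records (for at least 327 patients!).

3. Sometimes the qualitative description of results was not precise enough to convert into quantitative terms, so when encoding we had to guess its meaning and improvise.

6. The situation in 2002

In 1998 we presented the situation described above in a paper [1]. In recent years we have noticed a generally better understanding of the problem and agreement on the need for a proper data network system and computerized archive.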

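There has also been an attempt: an internal network system and a parallel digital archive based on disc records in one of the departments mentioned. But the only entry key to the archived data is still the patient's name. There has also been some discussion of connecting the diagnostic departments of the hospital. In reality, in 2002 the only truly new thing in each of the departments mentioned is a new notebook for this year's patients! We fear that staff walking slowly between the archives will remain a reality far into the future.

7. The limits of medical archives

Building a digital archive in clinical medicine has its limits. During any diagnostic and therapeutic process a great many documents accumulate: image records, X-ray records, bedside notes and so on. A perfect archive would scan and store all of this material digitally. The problem is that clinical work relies not only on current data but on the lifelong medical record. A patient's state is sometimes hard to describe, and even harder to encode. We therefore believe that even if digital archives are built, both kinds of archive will coexist far into the future.

To build an advanced archive whose data can be reached with the problem as the key, it is necessary to encode at least the patient's main documents. In theory this is easy: as part of reporting the test result, those responsible for the investigation would fill in an appropriate form on a computer. The only problem is that this would mean a great deal of extra work and extra cost.

8. Conclusions

1. In any hospital, computerized patient data archives, and network systems connecting these archives with other medical institutions, are undoubtedly only a matter of time. Wherever possible, we must insist on advanced archiving systems.

2. A miner's work is known to be heavy. Sometimes a data miner's work is heavier still.

Data Mining Problems in Medicine

C. Groselj
University Medical Center,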
Nuclear Medicine Department, Ljubljana, Slovenia
ciril.groselj@kclj.si

Abstract

The principle of any retrospective investigation based on patient data is searching for patients by problem or sign, but not by name. With a proper computer-archived database encoded by problem, the data mining process would be easy: one would only need to input the request to get the proper data in a short time.

Medical archives are frequently based on paper records only, with the patient's name as the entry key. To find the proper record in such an archive, a detective strategy is needed. The process continues with collecting the usually enormous amount of paper, finding the appropriate records in it, and finally encoding them and arranging them in a table. The whole process can be separated into patient, paper and data mining. Because they are so slow, these phases can be the most time-consuming part of an investigation based on medical data. The author describes his data mining experience.

Key words: data mining, medicine, coronary artery disease, data bank

1. Introduction

Any retrospective investigation based on patients' medical data has four main phases: planning the study, data mining, processing the data and interpreting the results. For an investigation to be accomplished well, each of these phases is equally important. The data mining process starts with defining the pool in which a sufficient number of patients fulfilling the selected criteria is expected to be found. It continues with identifying a planned number of such patients, collecting their records, verifying the relevance of each patient and his record, capturing the proper data, encoding the data, both qualitative and quantitative, and arranging the data in a table. The data are now ready for processing.

If a computerized medical data bank of suitable patients exists, the main steps of this process pass easily. One only needs to put requests regarding the patients' diagnosis, age, gender or results of completed investigations into the computer, and one receives a list of roughly suitable data, depending on the level of data encoding.

If we operate only with a paper archive, the data mining process can be more difficult.

2. The medical data bank

In general, any patient's medical process consists of diagnosis and treatment. The process goes on in an office, at the bedside in hospital, or in diagnostic or intervention facilities. The majority of diagnostic results are images. All running remarks and the majority of diagnostic, therapeutic and final results are described qualitatively. The usual final report is a review in which the problem, findings, interventions and further suggestions are briefly described. For the needs of epidemiologists the final diagnosis is encoded.

There are probably a few possible ways to computerize all these data in order to create a data bank. All descriptive records, numerical results, digital image records or their descriptions should be collected in a computer database with the patient's name or diagnosis as the opening key.

An advanced approach is to encode any investigation or intervention and any result, opening the possibility of reaching the data by any key.

3. The medical data network system

Any advanced diagnostic or therapeutic process involves a number of diagnostic and intervention departments, sometimes even several institutions. At any such point all the medical data already collected are necessary or at least useful. With a proper database and network system between medical institutions, all appropriate data or images are easily reachable, including as teaching cases, an important part of medical education.

Parallel to the medical network system is the medical insurance network. All public medical institutions are connected to the computer databank network, which is at any given moment capable of serving a range of information regarding medical insurance.

4. From theory to reality

The description above of the medical databank and the network system depicts the theory; reality can be different.

As mentioned already, a particular patient's whole diagnostic and therapeutic process occurs in different departments of the hospital or in different hospitals. The original record of any recorded medical examination or intervention is kept in an archive of the department where the process was performed. Each of the named departments has its own list of patients, on a similar principle in all departments. The register consists of a paper card with the patient's name; cards are sorted alphabetically, each card has a specific number, and this number is also carried by the patient's record in the archive. The key for retrieving a particular record is the number on the patient's name card. Besides the patient's name, this patient list has nothing in common with the hospital's general list or the insurance list.

This principle probably originates from the time of Sinuhe the Egyptian, but for clinical work, with a known patient name, it works perfectly. The only problem is time. Obtaining a suitable archive record can take a few hours or days. (The good side of this system is that a certain number of potentially unemployed employees have a job!)

5. The problem

The archive system described, which in our opinion is mostly still current, is sufficient for clinical work but completely unsuitable for research purposes. As mentioned already, the patient's name is the only key for retrieval, but in research any sign can be the retrieval key except the name. To illustrate this problem, the data mining experience of our research group will be described.

In the years 1996-97, in a big university hospital in Slovenia, we performed a retrospective study investigating the impact of machine learning on the diagnostic process for coronary artery disease. For the study protocol, we had to collect the medical records of a few hundred patients who underwent the diagnostic process for confirming or excluding the presence of coronary artery disease (CAD). This process consists of a series of examinations (modalities): history/clinical examination, exercise ECG testing, stress myocardial perfusion scintigraphy and coronary angiography. The diagnostic problem of an individual patient determines the number and sequence of the above-mentioned procedures. All the procedures were performed in different, specialized departments of the hospital. Each of these departments performs from a few hundred to more than a thousand of its particular examinations yearly. The majority of patients undergo one or two procedures; only in a minority was the whole sequence of investigations performed. For our research we needed to identify this small group. So we dealt with patients defined by problem but not by name, and had to search archives with the name, but not the problem, as the entry key. We started a big task: mining the patients.

Actually, in our experience, the mining process in medicine can be separated into three categories: patient mining, paper mining and data mining. It is hard to say which one is more strenuous.

5.1. The patient mining process

First we had to construct the patient detection strategy. Because scintigraphy was the least frequently performed of the named investigations, we started in that area. Nuclear medicine performs a number of investigations; myocardial scintigraphy is one of them. The department's patient list does not show the type of investigation. Instead, for each particular investigation a notebook record is kept with the handwritten names of the patients undergoing the test, sorted by date. At that time approximately 500 myocardial scintigraphies per year were performed. To find the coincidence with the other observed investigations, we created a list. We typed all the names (handwritten in the original record!) into the computer and sorted them alphabetically.

Luckily, the department performing coronary angiography kept (besides a similar record) a paper volume with a handwritten patient list of around 1200 names sorted in semi-alphabetical order (all the names beginning with the same letter were written in the same compartment, without any further sorting). Unfortunately, 70 names beginning with K in a group without any further sorting does not help much. Then we visually compared both lists and found approximately 120 matching names.

The sorting principle of the list of 1000 patients undergoing exercise ECG was the same as for scintigraphy: by date only. The common feature of the group of patients with matching scintigraphy and exercise ECG was the date, as both investigations were performed on the same day. So, for these 120 patients, we had to return to the handwritten notebook already mentioned, find for each patient the date on which scintigraphy was performed, and check the handwritten exercise ECG patient book for a matching date. Finally we collected 100 matching names in a particular year. As our study needed approximately 400 patients, the story repeated four times.

5.2. Collecting the archive records: paper mining

The investigation records were kept in three particular archives, sorted by sequence number, different in each archive. In order to collect the records, we had to create three lists of archive record numbers. For scintigraphy, we had to find this number on the patient's alphabetical card. The number for exercise ECG was written beside the name in the book mentioned. For coronary angiography, all patients were hospitalized and the angiography record was written in the final report. The key for this report was the archive record number on the handwritten patient list, sorted by hospitalization date. Then we found the date of the actual hospitalization (the majority of the patients were hospitalized a few times) in the coronary angiography list. With the three lists of archive numbers, we then searched the three different archives to collect the records.

We could obtain the complete records for approximately 90% of patients; the rest were for different reasons not reachable. The final volume of collected paper for 400 patients was roughly 2.5 m³.

According to the protocol of the study we had to collect around 70 parameters for each patient, altogether around 28 000 data items from the above-mentioned records.

5.3. Data collecting problems

Collecting the data, we encountered three main problems:

1. CAD is a chronic disease. A great number of patients in our group had already had some or all of the observed investigations performed in the past, sometimes several times. During interventions or health incidents their status changed. According to the protocol, all the observed investigations had to depict the same health situation. So sometimes we had to go through a paper volume 10 cm high to find the proper results.

2. Scintigraphy is a functional imaging test, with an image as its result. As the archive paper description was mainly insufficient for our needs, we had to reconstruct and review the majority of the archived digital records (for at least 327 patients!).

3. Sometimes the description of qualitative results was not precise enough for transformation into quantitative ones, so while encoding we had to guess its meaning and improvise.

6. The situation in 2002

In 1998 we already presented the situation described above in a paper [1]. In recent years we have in general noticed a better understanding of the problem and an agreement on the need for a proper data network system and computerized archive. There has also been an attempt: an internal network system and a parallel digital archive based on disc records in one of the mentioned departments. But the only entrance k
