UCI Database User Guide

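UCI Database Usage Notes

Usage notes for the UCI datasets in the machine learning field

This directory contains datasets and related domain theories (the latter are annotated as such in the brief listing further on) that have been or can be used to evaluate learning algorithms. Each data file (*.data) contains records of many individual samples, described as attribute-value pairs. The corresponding *.info file contains extensive documentation. (Some files _generate_ databases; they do not include *.data files.) To complement the datasets and domain theories, the utilities directory contains material that is useful when working with these datasets.

The address is /mlearn/MLRepository.html; the UCI datasets there can be viewed and remotely copied over the web. Alternatively, the data can be obtained by anonymous ftp, in the pub/machine-learning-databases directory.

Note: UCI is always looking for new data to add, which can be written to the incoming subdirectory. Please contribute your data together with the corresponding documentation. Thanks. The DOC-REQUIREMENTS file describes the contribution procedure. At present, most of the data uses the following format: one instance per line, no spaces, attribute values separated by commas ",", and missing values denoted by a question mark "?". Please also notify the site librarian after making your contribution.

The IRIS dataset in UCI serves as the example in what follows.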
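The line format described above (one instance per line, comma-separated attribute values, "?" for a missing value) can be read with the Python standard library. The following is a minimal sketch and not part of the original documentation; the function name `parse_uci_lines` and the sample text, including its missing value, are assumptions made for illustration.

```python
import csv
import io

def parse_uci_lines(text, missing="?"):
    """Parse UCI-style text: one instance per line, comma-separated values.

    Fields equal to the missing marker become None; all other fields are
    kept as strings so the caller can decide how to convert them.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:  # skip blank lines, e.g. a trailing empty line
            continue
        rows.append([None if f == missing else f for f in fields])
    return rows

# Hypothetical sample: the second row has its second attribute missing.
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,?,1.4,0.2,Iris-setosa\n\n"
records = parse_uci_lines(sample)
print(records)
```

Keeping the values as strings is a deliberate choice here: files in this format mix numeric and symbolic attributes, so any type conversion is best done per dataset.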

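The iris directory of the UCI data (ucidatairis) contains three files.

Index is the directory listing; it lists all the files in this folder. For example, the content of Index in iris is as follows:

Index of iris
18 Mar 1996    105 Index
08 Mar 1993   4551 iris.data
30 M[...]

iris.data is the iris data file. Its content looks like this:

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica

As shown above, the attribute values are separated directly by commas with no spaces in between (5.1,3.5,1.4,0.2,), and the last column is the value that the attributes on that line correspond to, i.e. the decision attribute (the class).

The third file describes the iris data: the title, the sources, past usage, recent information, the number of instances, the attributes of an instance, and so on. An excerpt:

7. Attribute Information:
   1. sepal length in cm
   2. sepal width in cm
   3. petal length in cm
   4. petal width in cm
   5. class:
      - Iris Setosa
      - Iris Versicolour
      - Iris Virginica
9. Class Distribution: 33.3% for each of 3 classes.

For examples of how this data has been used, see published papers or the later material on this site.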
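To make the iris layout concrete, here is a small Python sketch (an illustration, not taken from the original document) that attaches the four measurement attributes and the class column to the nine sample rows quoted from iris.data and tallies the classes. The column names such as `sepal_length` and the helper `load_iris_rows` are naming choices made for this example only. The documentation states that each class makes up 33.3% of the full file; the nine-row excerpt happens to contain three rows per class as well.

```python
from collections import Counter

# Column order follows the attribute list: four measurements in cm, then class.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]

# The nine sample rows quoted from iris.data.
IRIS_SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
"""

def load_iris_rows(text):
    """Turn comma-separated iris lines into dicts keyed by COLUMNS."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The last field is the class label; the rest are numeric.
        *measurements, label = line.split(",")
        values = [float(m) for m in measurements]
        rows.append(dict(zip(COLUMNS, values + [label])))
    return rows

rows = load_iris_rows(IRIS_SAMPLE)
counts = Counter(row["class"] for row in rows)
shares = {label: n / len(rows) for label, n in counts.items()}
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```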

This is the UCI Repository Of Machine Learning Databases and Domain Theories

pub/machine-learning-databases
/mlearn/MLRepository.html
Librarian: Patrick M. Murphy
111 databases and domain theories (36MB)

This directory contains data sets and domain theories (the latter have been annotated as such in the following brief listing) that have been or can be used to evaluate learning algorithms. Each data file (*.data) contains individual records described in terms of attribute-value pairs. The corresponding *.info file contains voluminous documentation. (Some files _generate_ databases; they do not have *.data files.) In addition to data sets and domain theories, the utilities/ directory contains utilities that you may find useful when using data sets in this repository.

The contents of this repository can be viewed and remotely copied over the web. The address is /mlearn/MLRepository.html. Alternatively, the contents of this repository can be remotely copied via ftp.

Enter anonymous for user id, and e-mail address (user@host) for password. These databases can be found by executing cd pub/machine-learning-databases.

Notes:

1. We're always looking for additional databases, which can be written to the sub-directory named /incoming. Please send yours, with documentation. Thanks - See DOC-REQUIREMENTS for suggested documentation procedures. Presently, most databases have the following format: 1 instance per line, no spaces, commas separate attribute values, and missing values are denoted by "?". Also, please notify the site librarian after making a donation.

2. Ivan Bratko requested that the databases he donated from the Ljubljana Oncology Institute (e.g., breast-cancer, lymphography, and primary-tumor) have restricted access. We are allowed to share them with academic institutions upon request. These databases (like several others) require that proper citations be made in published articles that use them. Citation requirements are in each database's corresponding *.doc file. To access any of these databases, [...]. To aid you in deciding if you want any of these databases, the documentation files are available.

3. An archive server may now be used to receive via e-mail files in this repository. Installed on ics, it provides email access to files in our anonymous ftp/uucp area (ftp). If people have no other access to our archives, then they can send mail to: [...]. Commands to the server may be given in the body. Some commands are:
   help
   send
   find
   The help command replies with a useful help message.

If you publish material based on databases obtained from this repository, then, in your acknowledgements, please note the assistance you received by using this repository. Thanks - this will help others to obtain the same data sets and replicate your experiments. We suggest the following pseudo-APA reference format for referring to this repository (LaTeX'd):

Murphy, P. M., & Aha, D. W. (1994). {\it UCI Repository of machine learning databases} /mlearn/MLRepository.html. Irvine, CA: University of California, Department of Information and Computer Science.

Patrick M. Murphy (Repository Librarian)

Brief Overview of Databases and Domain Theories:

Quick Listing:

1. annealing (David Sterling and Wray Buntine)
2. Artificial Characters Database & DT (donated by Attilio Giordana)
3-4. audiology (Ray Bareiss and Bruce Porter, used in Protos)
   1. Original Version
   2. Standardized-Attribute Version of the Original
5. auto-mpg (from CMU StatLib library)
6. autos (Jeff Schlimmer)
7. badges (Haym Hirsh)
8. balance-scale (Tim Hume)
9. balloons (Michael Pazzani)
10. breast-cancer (Ljubljana Institute of Oncology, restricted access)
11. breast-cancer-wisconsin (Wisconsin Breast Cancer Dbase, Olvi Mangasarian)
   1. Original version
   2. Diagnostic dataset
   3. Prognostic dataset
12. bridges (Yoram Reich)
13-21. chess
   1. Partial generator of Quinlan's chess-end-game data (kr-vs-kn) (Schlimmer)
   2. Shapiro's endgame database (kr-vs-kp) (Rob Holte)
   3. king-rook-vs-king (Michael Bain, Arthur van Hoff)
   4-9. Six domain theories (Nick Flann)
22. Bach Chorales (time-series) database (Darrell Conklin)
23. Connect-4 Database (John Tromp)
24-25. Credit Screening Database
   1. Japanese Credit Screening Data and domain theory (Chiharu Sano)
   2. Credit Card Application Approval Database (Ross Quinlan)
26. Ein-Dor and Feldmesser's cpu-performance database (David Aha)
27. Diabetes Data (Serdar Uckun, AIM-94)
28. dgp-2 data generation program (Powell Benedict)
29. Document Understanding (Donato Malerba)
30. Nine small EBL domain theories and examples in sub-directory ebl
31. Evlin Kinney's echocardiogram database (Steven Salzberg)
32. flags (Richard Forsyth)
33. function-finding (Cullen Schafer's 352 case studies)
34. glass (Vina Spiehler)
35. hayes-roth (from Hayes-Roth^2's paper)
36-39. heart-disease (Robert Detrano)
40. hepatitis (G. Gong)
41. horse colic database (Mary McLeish & Matt Cecile)
42. (Boston) Housing database (from CMU StatLib library)
43. ICU data (Serdar Uckun, AIM-94)
44. Image segmentation database (Carla Brodley)
45. ionosphere information (Vince Sigillito)
46. iris (R.A. Fisher, 1936)
47. isolet (Ron Cole and Mark Fanty's database donated by Tom Dietterich)
48. kinship (J. Ross Quinlan)
49. labor-negotiations (Stan Matwin)
50-51. led-display-creator (from the CART book)
52. lenses (Cendrowska's database donated by Benoit Julien)
53. letter-recognition database (created and donated by David Slate)
54. liver-disorders (BUPA Medical's database donated by Richard Forsyth)
55. logic-theorist (Paul O'Rorke)
56. lung cancer (Stefan Aeberhard)
57. lymphography (Ljubljana Institute of Oncology, restricted access)
58-59. mechanical-analysis (Francesco Bergadano)
   1. Original Mechanical Analysis Data Set
   2. PUMPS DATA SET
60. mobile robots (donated by Klingspor, Morik and Rieger)
61-64. molecular-biology
   1. promoter sequences (Towell, Shavlik, & Noordewier, domain theory also)
   2. splice-junction sequences (Towell, Noordewier, & Shavlik, domain theory also)
   3. protein secondary structure database (Qian and Sejnowski)
   4. protein secondary structure domain theory (Jude Shavlik & Rich Maclin)
65. MONK's Problems (donated by Sebastian Thrun)
66. Moral Reasoner Database (donated by James Wogulis)
67. mushroom (Jeff Schlimmer)
68. MUSK databases (2) (donated by Tom Dietterich)
69. othello domain theory (Tom Fawcett)
70. Page Blocks Classification (Donato Malerba)
71. Pima Indians diabetes diagnoses (Vince Sigillito)
72. Postoperative Patient data (Jerzy W. Grzymala-Busse)
73. Primary Tumor (Ljubljana Institute of Oncology, restricted access)
74. Qualitative Structure Activity Relationships (QSARs) (Ross King)
75. Quadraped Animals (John H. Gennari)
76. Servo data (Ross Quinlan)
77. shuttle-landing-control (Bojan Cestnik)
78. solar flare (Gary Bradshaw)
79-80. soybean (from Ryszard Michalski's groups)
81. space shuttle databases (David Draper)
82. spectrometer (Infra-Red Astronomy Satellite Project Database, John Stutz)
83. Sponge Database (Iosune Uriz and Marta Domingo)
84. Statlog Project databases (7) (from Ross King, [...])
85. Student Loan relational database (from Micha[...]
