Foreign-language material: Searching Single Nucleotide Polymorphism Markers.PDF


Searching Single Nucleotide Polymorphism Markers to Complex Diseases Using Genetic Algorithm Framework and a Boost Mode Support Vector Machine

Khantharat Anekboon, Suphakant Phimoltares, and Chidchanok Lursinsap
AVIC, Department of Mathematics, Chulalongkorn University, Bangkok, Thailand
khantharat.a@student.chula.ac.th, suphakant.p@chula.ac.th, and lchidcha@chula.ac.th

Sissades Tongsima
Genome Institute, National Center for Genetic Engineering and Biotechnology, Pathumtani, Thailand
sissades@biotec.or.th

Suthat Fucharoen
Thalassemia Research Center, Institute of Molecular Biosciences, Mahidol University, Salaya Campus, Nakhon Pathom, Thailand
grsfc@mahidol.ac.th

Abstract

With the advent of large-scale, high-density single nucleotide polymorphism (SNP) arrays, case-control association studies have been performed to identify predisposing genetic factors that influence many common complex diseases. These genotyping platforms provide very dense SNP coverage on a single chip. Much research has focused on multivariate genetic models to identify genes that can predict disease status. However, increasing the number of SNPs generates a large number of combined genetic outcomes to be tested. This work presents a new mathematical algorithm for SNP analysis, called IFGA, that uses a "boost mode" support vector machine (SVM) to select the best set of SNP markers that can predict the state of complex diseases. The proposed algorithm has been applied to association studies of two diseases, namely Crohn's disease and the severity spectrum of β0/HbE thalassemia. The results revealed that the predicted SNPs best classify the two diseases at 71.57% and 71.06% accuracy, respectively, using 10-fold cross-validation, in comparison with the optimum random forest (ORF) and classification and regression trees (CART) techniques.

Keywords: single nucleotide polymorphism; support vector machine; genetic algorithm

978-1-4244-4713-8/10/$25.00 ©2010 IEEE

I. INTRODUCTION

Scientists have long been interested in identifying genetic factors that influence the occurrence of complex diseases. With the advent of parallel genotyping technology, the cost and time of finding SNPs are no longer out of reach. Large case-control cohorts generated from very dense SNP arrays (a DNA chip contains a dense array of SNPs) challenge researchers to search for SNPs that are associated with the diseases. In contrast to single-gene disorders, the state of a complex disease can be triggered by multiple genes when exposed to certain environmental factors [1], [2]. However, searching for multiple-marker interactions in a large pool of SNPs imposes high computational and memory complexity. A technique for selecting a subset of relevant features, named feature selection [3], has been widely used in almost all fields, including bioinformatics. This technique provides a more effective way to improve learning accuracy and to understand the importance of the features by removing irrelevant or redundant ones.

II. THE PROPOSED IFGA METHOD

In this section, we introduce a new encoding method called IFGA. Fig. 1 summarizes the IFGA method. The first population is constructed by our proposed integer encoding approach. In the genetic algorithm (GA) context, the data in a chromosome are represented by a set of selected features. After the population is generated, each chromosome is evaluated by a fitness score. This score is obtained by using the boost mode SVM approach. Then the IFGA regenerates the next population by IFGA selection, IFGA crossover, and IFGA mutation until a termination criterion is satisfied.

Figure 1. The overall IFGA flowchart.

A. The Integer Encoding Method

Using a GA to perform feature selection can be done by converting the input data with binary encoding [4]. The length of a chromosome equals the number of all features, so the size of the encoded chromosome corresponds directly to the number of input features. This, however, presents a problem for two reasons. First, the running time depends heavily on the length of the chromosome. Second, a general binary encoding does not fix the number of selected features; it fixes only the length of the chromosome. The IFGA integer encoding method is proposed to solve these problems.

Assume that the case-control data used in this study have $m$ genotypes. Let $q_i$ be the $i$-th chromosome processed in the algorithm. The length of $q_i$, denoted by $|q_i|$, is set to a constant less than or equal to $m$. Then $|q_i|$ random numbers represent the locations from which the corresponding genotypes are selected in a given feature sequence. During the IFGA, the lengths of the chromosomes are not necessarily identical. For example, suppose $m = 7$, the chromosome size $|q_i|$ is set to 3, and the randomly selected locations are 1, 5, and 6. Then the chromosome is $q_i = \{1, 5, 6\}$.

B. IFGA Selection

Each individual chromosome is selected into a mating pool, based on its fitness score, by the stochastic universal sampling (SUS) method [5]. The IFGA also uses an elitism technique [6], in which a next-generation chromosome derives from the best chromosome in the current generation.

C. IFGA Crossover

The crossover function of a traditional GA randomly selects a recombination point and swaps the parts of the two chromosomes flanking this point. Crossover from the original GA, however, cannot be applied to the IFGA approach, because traditional crossover requires all chromosomes to have the same size, and features from the same locus cannot appear on the same chromosome. We must therefore devise an IFGA crossover technique to overcome this problem. Assume that parent1 and parent2 are the parental chromosomes, where each locus is the position of a selected feature. At least one of parent1 and parent2 must have more than one locus, and both parents must have at least one locus. The outputs of this algorithm are offspring1 and offspring2.

    X ← ∅
    Y ← ∅
    tmp1 ← parent1
    for i = 0 to |parent1| do
        v ← |tmp1|
        sel ← random(1, 2, …, v)
        X ← X ∪ sel
        tmp1 ← tmp1 − parent1_sel
    end for
    tmp2 ← parent2
    for i = 0 to |parent2| do
        v ← |tmp2|
        sel ← random(1, 2, …, v)
        Y ← Y ∪ sel
        tmp2 ← tmp2 − parent2_sel
    end for
    c ← random(1, min(|parent1|, |parent2|) − 1)
    offspring1 ← {x_1, x_2, …, x_c, y_{c+1}, …, y_{|parent2|}}
    offspring2 ← {y_1, y_2, …, y_c, x_{c+1}, …, x_{|parent1|}}

D. IFGA Mutation

The mutation function alters the value of a specified locus. It occurs rarely compared with the crossover process. The IFGA mutation is presented here. Let $m$ denote the length of a given genotype sequence, input_chrom the chromosome that will be mutated, and output_chrom the mutated chromosome. Each element in a chromosome is a selected feature.

    pos_out ← random(1, |input_chrom|)
    pos_in ← random(1, m)
    for i = 1 to |input_chrom| do
        if i = pos_out then
            output_chrom_i ← pos_in
        else
            output_chrom_i ← input_chrom_i
        end if
    end for

E. Generating a Population

There are two kinds of population: the initial population and the next-generation population. To generate the initial population with $p$ chromosomes, where $p$ is a user-defined number of chromosomes in the population, the algorithm repeatedly generates chromosomes by the integer encoding method and adds them to the population until the number of chromosomes in the population equals $p$.

The population in the next generation, on the other hand, consists of the chromosome B, which has the best fitness score in the current generation; E, the groups of features obtained from evolution (crossover and mutation); and R, groups of newly reselected features. After B and E are added to the next generation, those chromosomes are checked for redundancy. Each chromosome must be unique in the next generation, so duplicated chromosomes are removed. If the number of chromosomes in the next generation is less than the number of chromosomes in the current generation, new subsets of features, R, are randomly created and added to the next generation.

F. Termination

The IFGA algorithm consists of a set of repeated steps: generating the population, evaluation by a boost mode SVM, IFGA selection, IFGA crossover, and IFGA mutation. These steps are executed until the best result remains constant over the next 300 iterations.

III. THE PROPOSED BOOST MODE SVM METHOD

The goal of an SVM [7] is to find a maximal separating hyperplane for either the linearly separable case (1) or the nonlinearly separable case (2). Note that $w^{T}$ is the transpose of the weight vector, $x_i$ is an input vector, $\Phi$ is a mapping function, and $b$ is a bias value.

$$ y_i = \operatorname{sign}(w^{T} \cdot x_i + b) \tag{1} $$

$$ y_i = \operatorname{sign}(w^{T} \cdot \Phi(x_i) + b) \tag{2} $$

Both equations face the same problem when the input data are imbalanced. The separating hyperplane learned from an imbalanced data set may shift too far in the direction of the smaller group compared with the true separating hyperplane [8]. To solve this problem, the decision hyperplane should be adjusted. It can be seen from (1) and (2) that the parameter $w$ affects the classification output. Modifying $w$ will therefore adjust the decision hyperplane, which may improve the classifier.

A. Boost Mode SVM

A new oversampling technique for nominal features is proposed to improve the performance of the SVM. The boost mode SVM (Fig. 1) generates two SVMs, namely SVM1 and SVM2. SVM1 is constructed to generate the scores of the training data set, whereas SVM2 is the final SVM model for classifying the test set. First, only the training set is used to construct SVM1 and to find the boost mode. This boost mode is the indicator vector of the minority data set. It is then tested with SVM1. Two scoring methods, unbiased scoring (US) and bias scoring (BS), are proposed to find the scoring value. The US method is performed when SVM1 correctly classifies the boost mode; otherwise the BS method is performed. After that, a scoring oversampling (SOS) approach adds artificial data to the minority group by sampling the data of the minority group until the two groups contain equal numbers of data. The minority group in this paper means the group of data having fewer elements. The new SVM2 is constructed for classification from the previous training data set and the new set of data from the SOS technique. Finally, the test set is run through SVM2 for evaluation. The error rate on the test set is the fitness score value used in the IFGA section above.

B. Finding the Boost Mode

To balance the size of the data from both classes, some additional data in the minority group must be generated. The generating method selected (either US or BS) depends upon a boost mode vector. The following procedure describes how to compute the boost mode vector. Let $n_{minor}$ be the number of data in the minority group. Bootstrap sampling with replacement is applied to the minority group to generate $T$ data sets, i.e. {boostgroup_1, …, boostgroup_T}. Each boostgroup_i contains $n_{minor}$ data.

    for i = 1 to T do
        allmode_i ← mode(boostgroup_i)
    end for
    boostmode ← mode({allmode_i})

C. The Unbiased Scoring Method

This technique is applied when SVM1 classifies the boost mode correctly. All data points have equal chances (equal scoring values) of being selected for the oversampling technique. The following algorithm describes the process of finding the scoring value by the US technique. The output of this algorithm is scoreval.

    for i = 1 to n_minor do
        scoreval_i ← 1 / n_minor
    end for

D. The Bias Scoring Method

The BS technique is run when SVM1 incorrectly classifies the boost mode. The scoring value is calculated from the distance of each point to the decision hyperplane, by (3) for linear separability or (4) for nonlinear separability.

$$ distance_i = w^{T} \cdot x_i + b \tag{3} $$

$$ distance_i = w^{T} \cdot \Phi(x_i) + b \tag{4} $$

A data point that is correctly classified has a smaller chance (a lower scoring value) of being selected for the oversampling process than one that is wrongly classified. Therefore, a sample that is classified incorrectly more often has a higher chance of being chosen, and vice versa. The scoring value for the BS method is described by the following algorithm. Let distance be the set of distances of all data points in the minority group. The output of this algorithm is a set of scoreval.

    sumSV
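The integer encoding, crossover, and mutation steps of Section II can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`encode_chromosome`, `ifga_crossover`, `ifga_mutation`), the seeded `random.Random` generator, the removal of repeated locations in an offspring, and the requirement that both parents have at least two loci are all assumptions made so that the example runs.

```python
import random


def encode_chromosome(m, size, rng):
    """Integer encoding: choose `size` distinct feature locations from 1..m."""
    return sorted(rng.sample(range(1, m + 1), size))


def _dedupe(locations):
    """Drop repeated locations, keeping first-seen order (an assumption here)."""
    return list(dict.fromkeys(locations))


def ifga_crossover(parent1, parent2, rng):
    """Shuffle both parents, cut at a random point c, and exchange the tails."""
    if min(len(parent1), len(parent2)) < 2:
        raise ValueError("this sketch assumes each parent has at least 2 loci")
    x = rng.sample(parent1, len(parent1))  # random permutation of parent1
    y = rng.sample(parent2, len(parent2))  # random permutation of parent2
    c = rng.randint(1, min(len(x), len(y)) - 1)  # both bounds inclusive
    offspring1 = _dedupe(x[:c] + y[c:])
    offspring2 = _dedupe(y[:c] + x[c:])
    return offspring1, offspring2


def ifga_mutation(chromosome, m, rng):
    """Overwrite one random position with a random location in 1..m."""
    pos_out = rng.randrange(len(chromosome))  # 0-based position to overwrite
    pos_in = rng.randint(1, m)                # replacement feature location
    mutated = list(chromosome)
    mutated[pos_out] = pos_in
    return mutated


rng = random.Random(0)  # fixed seed so the run is repeatable
q1 = encode_chromosome(m=7, size=3, rng=rng)
q2 = encode_chromosome(m=7, size=4, rng=rng)
child1, child2 = ifga_crossover(q1, q2, rng)
mutant = ifga_mutation(q1, m=7, rng=rng)
print(q1, q2, child1, child2, mutant)
```

Because the two parents may share locations, an offspring can end up shorter than either parent once repeats are dropped, which is consistent with the remark that chromosome lengths are not necessarily identical. The mutation step follows the pseudocode literally and may therefore reintroduce a location already present in the chromosome.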

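The boost mode procedure and the unbiased scoring step of Section III can be sketched in the same way. The sketch below is hedged: it assumes that the "mode" of a group of genotype vectors is taken feature by feature, that ties go to the first value seen, and that the scoring values act as sampling weights in the oversampling step. The helper names (`column_mode`, `find_boost_mode`, `unbiased_scores`, `scoring_oversample`) and the toy genotype data are hypothetical. The bias scoring algorithm is not sketched because its description is incomplete in the text.

```python
import random
from collections import Counter


def column_mode(rows):
    """Per-feature mode of equal-length vectors (ties go to the first seen)."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*rows)]


def find_boost_mode(minority, t, rng):
    """Bootstrap t groups of size n_minor, then take the mode of their modes."""
    n_minor = len(minority)
    all_modes = []
    for _ in range(t):
        # Sampling with replacement from the minority group.
        boost_group = [rng.choice(minority) for _ in range(n_minor)]
        all_modes.append(column_mode(boost_group))
    return column_mode(all_modes)


def unbiased_scores(n_minor):
    """US method: every minority point receives the same scoring value."""
    return [1.0 / n_minor] * n_minor


def scoring_oversample(minority, scores, n_extra, rng):
    """Draw n_extra minority points, using the scores as sampling weights."""
    return rng.choices(minority, weights=scores, k=n_extra)


rng = random.Random(1)  # fixed seed so the run is repeatable
# Toy minority group: 4 samples, 3 SNPs coded 0/1/2 (illustrative data only).
minority = [[0, 1, 2], [0, 1, 1], [0, 2, 2], [1, 1, 2]]
boost_mode = find_boost_mode(minority, t=25, rng=rng)
scores = unbiased_scores(len(minority))
extra = scoring_oversample(minority, scores, n_extra=3, rng=rng)
print(boost_mode, scores, extra)
```

In a full pipeline the boost mode vector would be passed to SVM1, and the choice between unbiased and bias scoring would depend on whether SVM1 assigns it to the minority class.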