Foreign-language material: Searching Single Nucleotide Polymorphism Markers.PDF

Searching Single Nucleotide Polymorphism Markers to Complex Diseases Using Genetic Algorithm Framework and a Boost Mode Support Vector Machine

Khantharat Anekboon, Suphakant Phimoltares, and Chidchanok Lursinsap
AVIC, Department of Mathematics, Chulalongkorn University, Bangkok, Thailand
khantharat.a@student.chula.ac.th, suphakant.p@chula.ac.th, and lchidcha@chula.ac.th

Sissades Tongsima
Genome Institute, National Center for Genetic Engineering and Biotechnology, Pathumtani, Thailand
sissades@biotec.or.th

Suthat Fucharoen
Thalassemia Research Center, Institute of Molecular Biosciences, Mahidol University, Salaya Campus, Nakhonpathom, Thailand
grsfc@mahidol.ac.th

Abstract: With the advent of large-scale, high-density single nucleotide polymorphism (SNP) arrays, case-control association studies have been performed to identify predisposing genetic factors that influence many common complex diseases. These genotyping platforms provide very dense SNP coverage per chip. Much research has focused on multivariate genetic models to identify genes that can predict disease status. However, increasing the number of SNPs generates a large number of combined genetic outcomes to be tested. This work presents a new mathematical algorithm for SNP analysis, called IFGA. It uses a "boost mode" support vector machine (SVM) to select the best set of SNP markers for predicting the state of a complex disease. The proposed algorithm has been applied to association studies of two diseases, namely Crohn's disease and the severity spectrum of β0/HbE thalassemia. Using 10-fold cross-validation, our predicted SNPs classify the two diseases with the best accuracies, 71.57% and 71.06% respectively, when compared with the optimum random forest (RF) and classification and regression tree (CART) techniques.

Keywords: single nucleotide polymorphism; support vector machine; genetic algorithm

I. INTRODUCTION

Scientists have long been interested in identifying genetic factors that influence the occurrence of complex diseases. With the advent of parallel genotyping technology, the cost and time of finding SNPs are no longer out of reach. Large case-control cohorts genotyped on very dense SNP arrays (DNA chips containing dense arrays of SNPs) challenge researchers to search for SNPs that are associated with diseases. Single-gene disorders have one causal gene. In contrast, a complex disease can be triggered by multiple genes when a person is exposed to certain environmental factors [1], [2]. However, searching for multiple-marker interactions in a large pool of SNPs imposes high computational and memory complexity. Feature selection [3] is a technique for selecting a subset of relevant features, and it has been widely used in almost all fields, including bioinformatics. By removing irrelevant or redundant features, it offers an effective way to improve learning accuracy and to understand the importance of the features.

II. THE PROPOSED IFGA METHOD

In this section, we introduce a new encoding method called IFGA. Fig. 1 summarizes the IFGA method. The first population is constructed by our proposed integer encoding approach. In the genetic algorithm (GA) context, each chromosome represents a set of selected features. After the population is generated, each chromosome is evaluated by a fitness score obtained with the boost mode SVM approach. The IFGA then generates the next population through IFGA selection, IFGA crossover, and IFGA mutation until a termination criterion is satisfied.

A. The Integer Encoding Method

Feature selection with a GA can be done by converting the input data with binary encoding [4]. The length of a chromosome then equals the number of all features, so the size of an encoded chromosome corresponds directly to the number of input features. This presents a problem for two reasons. First, the running time depends heavily on the chromosome length. Second, a general binary encoding does not fix the number of selected features; it fixes only the length of the chromosome.

Figure 1. The overall IFGA flowchart.
978-1-4244-4713-8/10/$25.00 ©2010 IEEE

The IFGA integer encoding method is proposed to solve these problems. Assume that the case-control data used in this study have m genotypes. Let Q_i be the ith chromosome processed in the algorithm. The length of Q_i, denoted by |Q_i|, is set to a constant less than or equal to m. Then |Q_i| random numbers are drawn. They give the locations at which the corresponding genotypes are selected from a given feature sequence. During the IFGA, chromosomes do not necessarily all have the same length. For example, suppose m = 7, the chromosome size |Q_i| is set to 3, and the randomly selected locations are 1, 5, and 6. Then the chromosome is Q_i = {1, 5, 6}.

B. IFGA Selection

Each chromosome is selected into a mating pool based on its fitness score by stochastic universal sampling (SUS) [5]. The IFGA also uses an elitism technique [6], in which the best chromosome of the current generation is carried over to the next generation.

C. IFGA Crossover

The crossover function of a traditional GA randomly selects a recombination point and swaps the segments of the two chromosomes flanking this point. This crossover cannot be applied to the IFGA approach, for two reasons. It assumes that all chromosomes have the same size, whereas IFGA chromosomes may differ in length. Also, features from the same locus cannot appear twice on the same chromosome. We therefore devise an IFGA crossover technique to overcome this problem. Assume that parent1 and parent2 are the parental chromosomes, where each locus is the position of a selected feature. At least one of parent1 and parent2 must have more than one locus, and both must have at least one locus. The outputs of this algorithm are offspring1 and offspring2.

X ← ∅
Y ← ∅
tmp1 ← parent1
for i ← 1 to |parent1| do
    v ← |tmp1|
    sel ← random(1, 2, …, v)
    X ← X ∪ {tmp1(sel)}
    tmp1 ← tmp1 − {tmp1(sel)}
end for
tmp2 ← parent2
for i ← 1 to |parent2| do
    v ← |tmp2|
    sel ← random(1, 2, …, v)
    Y ← Y ∪ {tmp2(sel)}
    tmp2 ← tmp2 − {tmp2(sel)}
end for
c ← random(1, min(|parent1|, |parent2|) − 1)
offspring1 ← {x_1, x_2, …, x_c, y_(c+1), …, y_|parent2|}
offspring2 ← {y_1, y_2, …, y_c, x_(c+1), …, x_|parent1|}

D. IFGA Mutation

The mutation function alters the value of a specified locus. Mutation occurs far less often than crossover. The IFGA mutation is presented here. Let m denote the length of a given genotype sequence. Let input_chrom be the chromosome to be mutated and output_chrom the mutated chromosome. Each element of a chromosome is a selected feature.

pos_out ← random(1, |input_chrom|)
pos_in ← random(1, m)
for i ← 1 to |input_chrom| do
    if i = pos_out then
        output_chrom(i) ← pos_in
    else
        output_chrom(i) ← input_chrom(i)
    end if
end for

E. Generating a Population

There are two kinds of population: the initial population and the next-generation population. To build an initial population of P chromosomes, the algorithm repeatedly generates chromosomes by the integer encoding method. Here P is a user-defined population size. Each new chromosome is added to the population until it contains P chromosomes.

The next-generation population, on the other hand, consists of three parts:
- the chromosome B with the best fitness score in the current generation;
- E groups of features produced by evolution (crossover and mutation);
- R groups of newly reselected features.

After B and E are added to the next generation, the chromosomes are checked for redundancy. Each chromosome in the next generation must be unique, so duplicated chromosomes are removed. If the next generation then contains fewer chromosomes than the current generation, new random subsets of features, R, are created and added to it.

F. Termination

The IFGA consists of a set of repeated steps:
- generating the population;
- evaluation by the boost mode SVM;
- IFGA selection, crossover, and mutation.

These steps are executed until the best result remains unchanged for 300 consecutive iterations.

III. THE PROPOSED BOOST MODE SVM METHOD

The goal of an SVM [7] is to find a maximum-margin separating hyperplane, either for the linearly separable case (1) or for the nonlinearly separable case (2). Here, w^T is the transpose of the weight vector, x_i is an input vector, Φ is a mapping function, and b is a bias value.

$$ y_i = \operatorname{sign}\left(\mathbf{w}^{T} \cdot \mathbf{x}_i + b\right) \tag{1} $$
$$ y_i = \operatorname{sign}\left(\mathbf{w}^{T} \cdot \Phi(\mathbf{x}_i) + b\right) \tag{2} $$

Both equations face the same problem when the input data are imbalanced. Compared with the true separating hyperplane, the hyperplane learned from an imbalanced data set may shift too far toward the smaller group [8]. To solve this problem, the decision hyperplane should be adjusted. It can be seen from (1) and (2) that the parameter w affects the classification output. Modifying w therefore adjusts the decision hyperplane, which may improve the classifier.

A. Boost Mode SVM

A new oversampling technique for nominal features is proposed to improve the performance of the SVM. The boost mode SVM (Fig. 1) builds two SVMs, SVM1 and SVM2. SVM1 generates the scores of the training data set, whereas SVM2 is the final model used to classify the test set. The procedure runs in the following steps.

1. Only the training set is used to construct SVM1 and to find the boost mode, which is the indicator vector of the minority data set.
2. The boost mode is tested with SVM1. Two scoring methods are proposed to find the scoring values: unbiased scoring (US) and bias scoring (BS). US is performed when SVM1 correctly classifies the boost mode; otherwise BS is performed.
3. A scoring oversampling (SOS) approach adds artificial data to the minority group by sampling from it until both groups contain the same number of data points. In this paper, the minority group means the group with fewer elements.
4. SVM2 is constructed from the original training data set together with the new data from the SOS technique.
5. The test set is run through SVM2 for evaluation. The error rate on the test set is the fitness score used in the IFGA (Section II).

B. Finding the Boost Mode

To balance the sizes of the two classes, additional data must be generated for the minority group. Whether the US or the BS method is used depends on the boost mode vector. The following procedure computes the boost mode vector. Let n_minor be the number of data points in the minority group. Bootstrap sampling with replacement is applied to the minority group to generate t data sets, i.e., {boostgroup_1, …, boostgroup_t}. Each boostgroup_i contains n_minor data points.

for i ← 1 to t do
    allmode(i) ← mode(boostgroup_i)
end for
boostmode ← mode(allmode)

C. The Unbiased Scoring Method

This technique is applied when SVM1 classifies the boost mode correctly. All data points then have an equal chance (equal scoring values) of being selected for oversampling. The following algorithm computes the scoring values for the US technique; its output is score_val.

for i ← 1 to n_minor do
    score_val(i) ← 1/n_minor
end for

D. The Bias Scoring Method

The BS technique is applied when SVM1 incorrectly classifies the boost mode. Each point's scoring value is calculated from its distance to the decision hyperplane. The distance is given by (3) in the linearly separable case and by (4) in the nonlinearly separable case.

$$ \mathrm{distance}_i = \mathbf{w}^{T} \cdot \mathbf{x}_i + b \tag{3} $$
$$ \mathrm{distance}_i = \mathbf{w}^{T} \cdot \Phi(\mathbf{x}_i) + b \tag{4} $$

A correctly classified data point has a lower chance (a smaller scoring value) of being selected for oversampling than a misclassified one. Therefore, misclassified points are more likely to be chosen, and correctly classified points are less likely. The scoring values for the BS method are computed by the following algorithm. Let distance be the set of distances of all data points in the minority group. The output of this algorithm is a set of score_val.

sumSV
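The integer encoding of Section II-A can be illustrated with a minimal Python sketch. It assumes 1-based SNP locations, as in the paper's example (m = 7, Q_i = {1, 5, 6}). Locations are drawn without replacement so that no locus repeats. The helper names `random_chromosome` and `decode` are hypothetical, and the genotype row in the usage lines is made-up illustrative input.

```python
# Sketch only: helper names are hypothetical, not taken from the paper.
from __future__ import annotations

import random


def random_chromosome(m: int, length: int, rng: random.Random) -> list[int]:
    """Integer encoding: draw `length` distinct 1-based SNP locations out of m."""
    if not 1 <= length <= m:
        raise ValueError("chromosome length must satisfy 1 <= |Q_i| <= m")
    # random.sample draws without replacement, so a locus never repeats.
    return rng.sample(range(1, m + 1), length)


def decode(chromosome: list[int], genotypes: list[str]) -> list[str]:
    """Return the genotypes at the chromosome's 1-based locations."""
    return [genotypes[loc - 1] for loc in chromosome]


# Usage with the paper's example: m = 7 and Q_i = {1, 5, 6}.
row = ["AA", "AG", "GG", "AA", "AG", "GG", "AA"]  # one made-up sample
print(decode([1, 5, 6], row))  # ['AA', 'AG', 'GG']
print(random_chromosome(7, 3, random.Random(0)))
```

The chromosome length stays fixed, independently of m. This is the point of the encoding: run time no longer scales with the full feature count.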
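The selection step in Section II-B can be sketched as stochastic universal sampling (SUS) plus elitism. SUS needs weights where larger is better. The IFGA fitness score, however, is a test error rate, so this sketch assumes a weight of 1 minus the error rate. That conversion, the fallback to uniform weights, and the names `sus` and `select_mating_pool` are assumptions rather than details from the paper.

```python
# Sketch only: the error-to-weight conversion and all names are assumptions.
from __future__ import annotations

import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def sus(population: Sequence[T], weights: Sequence[float], n: int,
        rng: random.Random) -> list[T]:
    """Stochastic universal sampling: n equally spaced pointers, one random offset."""
    total = sum(weights)
    if total <= 0:  # assumption: all-zero weights fall back to uniform
        weights = [1.0] * len(population)
        total = float(len(population))
    step = total / n
    start = rng.uniform(0, step)
    chosen: list[T] = []
    cumulative, idx = 0.0, 0
    for k in range(n):
        pointer = start + k * step
        # Advance to the individual whose cumulative-weight interval holds the pointer.
        while idx < len(weights) - 1 and cumulative + weights[idx] <= pointer:
            cumulative += weights[idx]
            idx += 1
        chosen.append(population[idx])
    return chosen


def select_mating_pool(population: Sequence[T], error_rates: Sequence[float],
                       rng: random.Random) -> tuple[T, list[T]]:
    """Return the elite (lowest error rate) and a mating pool of the same size."""
    weights = [1.0 - e for e in error_rates]  # assumption: weight = accuracy
    elite = population[min(range(len(population)), key=lambda i: error_rates[i])]
    return elite, sus(population, weights, len(population), rng)
```

SUS is used rather than plain roulette-wheel sampling because its evenly spaced pointers keep selection counts close to each chromosome's expected share.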
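This is a Python sketch of the IFGA crossover from Section II-C. The helper `random_order` mirrors the two pseudocode loops: it picks and removes a random locus until the parent is exhausted, which yields a random permutation. Two behaviors are assumptions added here and are marked in the comments. First, the upper bound for c is clamped to 1 when a parent has a single locus. Second, repeated loci are dropped from each offspring, since a locus may not appear twice on one chromosome. The function names are hypothetical.

```python
# Sketch only: function names and the two marked assumptions are not from the paper.
from __future__ import annotations

import random


def random_order(parent: list[int], rng: random.Random) -> list[int]:
    """Pick and remove a random locus until none are left (a random permutation)."""
    tmp = list(parent)
    out: list[int] = []
    while tmp:
        sel = rng.randrange(len(tmp))  # 0-based counterpart of random(1, ..., v)
        out.append(tmp.pop(sel))
    return out


def ifga_crossover(parent1: list[int], parent2: list[int],
                   rng: random.Random) -> tuple[list[int], list[int]]:
    """One-point crossover for integer-encoded chromosomes of unequal length."""
    if min(len(parent1), len(parent2)) < 1 or max(len(parent1), len(parent2)) < 2:
        raise ValueError("both parents need one locus and one parent needs more")
    x = random_order(parent1, rng)
    y = random_order(parent2, rng)
    # Assumption: clamp the upper bound so c is defined when a parent has one locus.
    c = rng.randint(1, max(1, min(len(x), len(y)) - 1))
    offspring1 = x[:c] + y[c:]  # {x_1..x_c, y_(c+1)..y_|parent2|}
    offspring2 = y[:c] + x[c:]  # {y_1..y_c, x_(c+1)..x_|parent1|}
    # Assumption: drop repeated loci, keeping the first occurrence.
    return list(dict.fromkeys(offspring1)), list(dict.fromkeys(offspring2))


o1, o2 = ifga_crossover([1, 5, 6], [2, 3, 4, 7], random.Random(42))
print(o1, o2)
```

Before de-duplication, offspring1 takes the length of parent2 and offspring2 takes the length of parent1. This is how the scheme copes with parents of unequal size.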
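The IFGA mutation of Section II-D replaces the locus at one random position with a random location in 1..m. The paper's pseudocode draws pos_in from all of 1..m. This sketch instead draws the new location only from loci not already on the chromosome, an assumption added so the mutated chromosome keeps distinct loci. The function name `ifga_mutation` is hypothetical.

```python
# Sketch only: the name and the "unused loci only" rule are assumptions.
from __future__ import annotations

import random


def ifga_mutation(input_chrom: list[int], m: int, rng: random.Random) -> list[int]:
    """Replace the locus at one random position with a new 1-based location."""
    pos_out = rng.randrange(len(input_chrom))  # 0-based counterpart of random(1, |input_chrom|)
    # Assumption: only draw locations that are not already on the chromosome.
    present = set(input_chrom)
    candidates = [loc for loc in range(1, m + 1) if loc not in present]
    if not candidates:  # the chromosome already holds every locus
        return list(input_chrom)
    pos_in = rng.choice(candidates)
    output_chrom = list(input_chrom)  # every other position is copied unchanged
    output_chrom[pos_out] = pos_in
    return output_chrom


print(ifga_mutation([1, 5, 6], 7, random.Random(3)))
```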
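The next-generation rule in Section II-E could be sketched as follows. It keeps B, adds the evolved chromosomes E, removes duplicates, then refills with random subsets R. The sketch assumes that two chromosomes are duplicates when they hold the same loci in any order. It also assumes that refilled chromosomes use a fixed length. `next_generation` and its parameters are hypothetical.

```python
# Sketch only: names, the duplicate test, and the fixed refill length are assumptions.
from __future__ import annotations

import math
import random


def next_generation(best: list[int], evolved: list[list[int]], size: int,
                    m: int, length: int, rng: random.Random) -> list[list[int]]:
    """Keep B, add the evolved chromosomes E, drop duplicates, refill with R."""
    population: list[list[int]] = []
    seen: set[frozenset[int]] = set()
    for chrom in [best, *evolved]:  # B first, so the elite always survives
        key = frozenset(chrom)  # assumption: same loci in any order = duplicate
        if key not in seen:
            seen.add(key)
            population.append(list(chrom))
    # Guard against asking for more distinct fixed-length subsets than exist.
    unused = math.comb(m, length) - sum(1 for k in seen if len(k) == length)
    if unused < size - len(population):
        raise ValueError("not enough distinct chromosomes of this length")
    while len(population) < size:  # R: new random subsets of features
        chrom = rng.sample(range(1, m + 1), length)
        key = frozenset(chrom)
        if key not in seen:
            seen.add(key)
            population.append(chrom)
    return population
```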
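The termination rule of Section II-F can be tracked with a small counter. Here the rule is read as "stop once the best fitness score has not improved for 300 consecutive iterations". Treating a lower error rate as an improvement is an assumption, and `StallCounter` is a hypothetical name.

```python
# Sketch only: the class name and the "lower error = improvement" rule are assumptions.
from typing import Optional


class StallCounter:
    """Signal termination once the best error rate stops improving."""

    def __init__(self, patience: int = 300) -> None:
        self.patience = patience
        self.best: Optional[float] = None
        self.stalled = 0  # consecutive iterations without improvement

    def update(self, best_error: float) -> bool:
        """Record this iteration's best error rate; return True when IFGA should stop."""
        if self.best is None or best_error < self.best:
            self.best, self.stalled = best_error, 0
        else:
            self.stalled += 1
        return self.stalled >= self.patience
```

In a main loop, `update` would be called once per generation with that generation's lowest error rate. The loop stops when `update` returns True.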
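This is a sketch of the boost mode procedure in Section III-B. The paper does not spell out how mode() acts on a group of genotype vectors. This sketch assumes a per-feature (column-wise) mode, with ties going to the value seen first. Samples are assumed to be tuples of nominal genotype codes. `column_mode` and `boost_mode` are hypothetical names.

```python
# Sketch only: the column-wise mode and the function names are assumptions.
from __future__ import annotations

import random
from collections import Counter
from typing import Sequence


def column_mode(rows: Sequence[tuple]) -> tuple:
    """Per-feature mode of nominal vectors; ties go to the value seen first."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*rows))


def boost_mode(minority: Sequence[tuple], t: int, rng: random.Random) -> tuple:
    """Mode of the modes of t bootstrap groups drawn from the minority group."""
    n_minor = len(minority)
    all_mode = []
    for _ in range(t):
        # Bootstrap sampling: n_minor draws with replacement.
        boost_group = rng.choices(minority, k=n_minor)
        all_mode.append(column_mode(boost_group))
    return column_mode(all_mode)


minority = [("AA", "AG"), ("AA", "GG"), ("AG", "AG")]  # made-up genotypes
print(boost_mode(minority, t=50, rng=random.Random(0)))
```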
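The unbiased scores of Section III-C and the scoring oversampling (SOS) step of Section III-A can be sketched together. SOS draws minority samples with probability proportional to their scores until both groups have the same size. Using the scores as sampling weights in `random.Random.choices`, with replacement, is an assumption about how SOS consumes them. The function names are hypothetical.

```python
# Sketch only: function names and the weighted-draw interpretation are assumptions.
from __future__ import annotations

import random
from typing import Sequence


def unbiased_scores(n_minor: int) -> list[float]:
    """US method: every minority sample gets score 1/n_minor."""
    return [1.0 / n_minor] * n_minor


def scoring_oversample(minority: Sequence, n_major: int, scores: Sequence[float],
                       rng: random.Random) -> list:
    """SOS: draw score-weighted minority samples until both groups are equal in size."""
    n_extra = n_major - len(minority)
    if n_extra <= 0:
        return []
    # Assumption: scores act as sampling weights, drawn with replacement.
    return rng.choices(minority, weights=scores, k=n_extra)


minority = [("AA",), ("AG",), ("GG",)]  # made-up genotypes
extra = scoring_oversample(minority, 10, unbiased_scores(3), random.Random(0))
print(len(minority) + len(extra))  # 10
```

The training set for SVM2 would then be the original training data plus `extra`.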
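Equation (3) gives the signed decision value of each minority sample under a linear SVM. Combined with a label y_i in {-1, +1}, it also shows whether SVM1 classifies that sample correctly: by (1), the sample is correct when y_i · distance_i > 0. The BS algorithm that turns these distances into score_val is not included in the text above, so this sketch stops at the distances. It assumes w and b come from an already trained linear SVM1 and that genotypes have been encoded numerically. The function names are hypothetical.

```python
# Sketch only: assumes w and b from a trained linear SVM1 and numeric genotypes.
from __future__ import annotations

from typing import Sequence


def linear_decision_values(w: Sequence[float], b: float,
                           X: Sequence[Sequence[float]]) -> list[float]:
    """Equation (3): distance_i = w^T . x_i + b for each sample x_i."""
    return [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]


def misclassified(distances: Sequence[float], labels: Sequence[int]) -> list[int]:
    """Indices i where sign(distance_i) differs from y_i in {-1, +1}."""
    return [i for i, (d, y) in enumerate(zip(distances, labels)) if d * y <= 0]


d = linear_decision_values([1.0, -2.0], 0.5, [[1.0, 1.0], [0.0, 0.0]])
print(d)  # [-0.5, 0.5]
print(misclassified(d, [1, 1]))  # [0]
```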
ID: 201311201910477495    Type: shared resource    Size: 291.06 KB    Format: PDF    Uploaded: 2013-11-20
  
Source: http://www.renrendoc.com/p-107495.html