CADViST: Visualization Tool for BLAST Alignment of Dengue Virus Sequences

Boonyarat Viriyasaksathian, Yodchanan Wongsawat
Department of Biomedical Engineering, Mahidol University
Nakornpathom, Thailand
g5137363@student.mahidol.ac.th and egyws@mahidol.ac.th

Prapat Suriyaphol
Bioinformatics and Data Management for Research Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University
Bangkok, Thailand
sipur@mucc.mahidol.ac.th

Abstract: Exploring a search engine that can simultaneously visualize genomic sequences is a challenging problem. In this paper, we propose software called CADViST. The Unit X graphical representation previously proposed by the authors is employed as an alternative tool to visualize the results obtained from the Basic Local Alignment Search Tool (BLAST). The proposed software helps users and experts interpret the results easily, especially in Dengue virus sequence analysis, where different serotypes or subtypes need to be distinguished.

Keywords: BLAST, Dengue Virus, Visualization, Bioinformatics.

I. INTRODUCTION

In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is one of the most widely used tools for sequence similarity search because of its speed and reasonable search accuracy. However, the BLAST program still lacks a user-friendly graphical representation. Hence, in this paper, we aim to develop a visualization tool capable of displaying the text output produced by BLAST.

There are many existing tools for visualizing and analyzing genomic sequences. Each tool is developed for specific tasks, and the tools can be categorized into four approaches: the base vector, sequential, Fourier transform (FT), and Z-Curve approaches.

1) Base vector approach: Hamori, E. and Ruskin, J. (1983) represented DNA sequences as a three-dimensional curve, the H-Curve [1]. Gates, M. A. (1985) proposed that a graphical representation of a DNA sequence in two-dimensional space was better than the H-Curve. Gates' graphical representation shows the four nucleotide bases, i.e., adenine (A), thymine (T), cytosine (C), and guanine (G). The unit vectors of these bases lie on the Cartesian coordinate system: base A is on the negative y axis, base T is on the positive y axis, base G is on the positive
x axis, and base C is on the negative x axis [2]. About eleven years later, Nandy, A. (1996) proposed a graphical representation to distinguish the features of intron and exon segments of eukaryotic sequences [3]. This representation was similar to Gates' method: the A, G, C, and T nucleotides were plotted on an ACGT axis system, and the slope of the plot indicated clusters of intron and exon sequences. However, both Nandy's and Gates' methods have high degeneracy: sequences such as AGTC, AGTCA, and AGTCAG lead to the same graphical representation [4]. Stephen S.-T. Yau et al. (2003) modified Gates' method. The four nucleotides are classified in a pyrimidine/purine graph on two quadrants of the Cartesian coordinate system. The first quadrant represents the pyrimidines T and C, and the fourth quadrant represents the purines A and G [4]. Recently, the authors proposed a graphical representation designed especially for Dengue virus sequence analysis, based on the cumulative amount of amino and keto bases, called Unit X [5].

2) Sequential approach: Altschul et al. (1990) developed the Basic Local Alignment Search Tool (BLAST) program. This program is one of the most popular tools for genomic sequence analysis and can perform a fast similarity search. The program compares any two sequences and displays the differences between them on a base-by-base basis [6].

3) Fourier transform (FT) approach: Anastassiou, D. proposed color spectrograms of biomolecular sequences, a tool for visualizing biomolecular sequence analysis [7], [8]. Spectrograms, which represent the magnitude of the short-time Fourier transform (STFT), are implemented via the discrete Fourier transform (DFT). Analysis of a genomic sequence in the frequency domain via the FT uses the period-3 property of DNA coding sequences. The color spectrogram is defined using the colors red, green, and blue. Even though this method yields an impressive graphical representation, its computational complexity is fairly high.

4) Z-Curve approach: Zhang, C. T. et al. (1994) suggested a practical visualization tool called the Z-Curve [8]-[12]. J. J. Cai et al. implemented this tool in a package called MBEToolbox [13]. Because the Z-Curve is built from cumulative components of the genomic sequence, its features can be interpreted quickly, such as the distribution along the sequence of purine/pyrimidine bases, amino/keto bases, and strong/weak hydrogen-bond bases. Since the Z-Curve algorithm is simple, it can be applied to all genomic sequences regardless of their length. A similar approach, called the 3DD-Curve, was presented by Zhang, Y. and Tan, M. (2008). This approach can be viewed as a weighted version of the Z-Curve [14].

978-1-4244-4713-8/10/$25.00 ©2010 IEEE

The choice of graphical representation can vary with the characteristics of the genomic sequences of interest. Therefore, in this first version of the proposed software, Dengue virus nucleotide sequences are employed to verify its merit. The software is called CADViST, which stands for Classification and Analysis of Dengue Virus Serotype by Visualization Tool. By employing Unit X as the visualization tool, the proposed software is suitable for interpreting Dengue virus sequences. However, positioning partial Dengue sequences on the Dengue genome with the Unit X representation requires a high computational load. BLAST is well known as an efficient search tool, but the visualization of its results needs improvement. Therefore, in this paper, we propose software that combines the merits of both BLAST and Unit X. The proposed software can efficiently search an unknown portion of a Dengue virus sequence and simultaneously illustrate graphical representations of the resulting sequences.

This paper is organized as follows. Section II introduces the proposed visualization tool, CADViST. The software architecture of CADViST is described in Section III. Section IV shows the simulation results of the proposed software. Finally, Section V concludes the paper.

II. CADVIST: THE PROPOSED VISUALIZATION TOOL

Classification and Analysis of Dengue Virus Serotype by Visualization Tool, or CADViST, is a visualization tool proposed especially for analyzing Dengue virus sequences. The components of CADViST are described in detail as follows.

A. Basic Local Alignment Search Tool (BLAST)

The BLAST program was developed by Stephen F. Altschul and his coworkers at the National Center for Biotechnology Information (NCBI). It is widely used for calculating sequence similarity. BLAST uses a heuristic algorithm to find the best possible results. It finds homologous sequences by locating short matches between two sequences, which makes the search fast. The similarity measurement of BLAST uses statistical theory to assign a scoring matrix for all possible pairs of residues and to produce the Expect value (E-value) for each alignment pair.

The standalone BLAST programs are provided as a compressed package. The package, distributed as archives whose names begin with "BLAST" for a variety of computer platforms, is available on the BLAST FTP site, ftp://ftp.ncbi.nih.gov/blast/executables/release/. In this paper, we employed standalone BLAST version 2.2.22 to generate the BLAST output used as input to the proposed software, CADViST.

B. Unit X Graphical Representation

The Unit X graphical representation can efficiently reveal the distribution of amino/keto bases along the sequence on two quadrants of the Cartesian coordinate system. The first quadrant represents the amount of amino bases (C and A), while the fourth quadrant represents the amount of keto bases (T and G). The unit vectors representing the four nucleotides, i.e., adenine (A), guanine (G), thymine (T), and cytosine (C), are shown in Fig. 1.

Figure 1. The Unit X vectors representing the four nucleotides A, G, C, and T.

By counting the occurrences of bases A, C, G, and T in the sequence, the coordinates (x, y) of the projection onto the X and Y axes with the Unit X representation are given as follows:

[equation missing in source]

C. Idea of CADViST

In this paper, we employ BLAST in standalone mode to find the similarity scores between the query sequence and the Dengue virus nucleotide database. The search results obtained from BLAST are displayed graphically via the Unit X representation.

D. Creating a nucleotide BLAST database

The main advantage of the standalone BLAST program is the ability to create your own database. To create a nucleotide BLAST database, we need a source file of sequences in FASTA format. This file is processed by the formatdb program contained in the standalone BLAST package to build the index files of the database. After the formatdb command is executed, three files are produced from the source FASTA file. For nucleotide databases, the extensions are nhr, nin, and nsq [15]. The formatdb command is as follows:

    formatdb -p F -i DatabaseName.fasta
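The database-building step above can be sketched in Python as follows. This is only an illustration, not part of CADViST: the helper names formatdb_command and expected_index_files are hypothetical, and the sketch assumes that formatdb names its index files by appending the extensions to the input file name. Only the flags -p F and -i and the extensions nhr, nin, and nsq are taken from the text.

```python
import shlex


def formatdb_command(fasta_path):
    """Build the argument list for creating a nucleotide BLAST database.

    -p F marks the input as nucleotide (not protein) data;
    -i names the source FASTA file.
    """
    return ["formatdb", "-p", "F", "-i", fasta_path]


def expected_index_files(fasta_path):
    """List the three index files the text says are produced.

    Assumption: the files are named <input file>.<extension>.
    """
    return [f"{fasta_path}.{ext}" for ext in ("nhr", "nin", "nsq")]


if __name__ == "__main__":
    cmd = formatdb_command("DatabaseName.fasta")
    print(shlex.join(cmd))
    print(expected_index_files("DatabaseName.fasta"))
    # To actually build the database (requires the legacy standalone
    # BLAST package on PATH), one could run:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Keeping the command as an argument list rather than a single string avoids shell-quoting problems if the FASTA path contains spaces.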
The source FASTA file has the form

    >First sequence description
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    >Second sequence description
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    >Last sequence description
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

where the Xs are the nucleotide codes A, T, G, or C. In this paper, the database of the proposed software is obtained from NCBI with the keyword "Dengue virus complete genome". The 2,184 nucleotide sequences cover the four serotypes of Dengue virus, which contain 952, 737, 405, and 90 nucleotide sequences, respectively.

E. Standalone executable BLAST

The standalone executable BLAST and the NCBI web-based BLAST program provide easy ways for users to perform a BLAST search via the command line or a website. There are many advantages to running the BLAST search program on your own machine, e.g., the database can be easily edited. In this paper, we employ the standalone BLAST program to generate BLAST output. A BLAST search can be executed via the blastall command as follows:

    blastall -p blastn -d Databasename.fasta -i QuerySequence.fasta -m 9 -F F > Result.txt

F. Graphical Representation via Unit X

Instead of displaying the search results as letters (Figs. 4(b) and 4(c)) as BLAST does, CADViST extracts the information from the BLAST output and represents the results graphically via the Unit X representation described in Section II-B. Furthermore, users who only need to explore the nature of Dengue virus sequences can use the graphical feature (Unit X) of CADViST on its own.

III. SOFTWARE ARCHITECTURE OF CADVIST

To provide a user-friendly GUI, the proposed CADViST software is written in C#. The GUI of CADViST is shown in Fig. 3. The input for the query sequence can be either 1) a text file in FASTA format or 2) text pasted directly into the blank space in Fig. 3. Once the input is inserted, the process inside CADViST can be summarized as follows (Fig. 2):

Step 1: Call the standalone BLAST program to generate BLAST output.
Step 2: Extract the sequence accession number and the coordinates of each matched sequence from the BLAST output.
Step 3: Provide the matching regions between the query and the matched sequences identified by the BLAST program and send the results to the display unit, i.e., the Unit X representation.

The results are shown in Figs. 4(d)-(e). In addition, other options of CADViST are copy, save, print, and show point values in the graph
of the Unit X vector. An option can be selected by right-clicking on the graph.

IV. SIMULATION RESULTS

As an example, we verify the merit of CADViST by finding the similarities between FN429899 (Dengue virus serotype 3, base positions 2040-7143) and our Dengue virus sequence database. Traditionally, the results obtained from the standalone BLAST program consist of two major parts: 1) the one-line descriptions of each database sequence found to match the query sequence (Fig. 4(a)), and 2) the alignments between the query sequence and the matched database sequences (Figs. 4(b)-(c)) [16]. Figs. 4(b) and (c) illustrate the first and second highest-scoring matched sequences, respectively. Using the information obtained from BLAST, Figs. 4(d)-(e) show the proposed graphical representation via CADViST.

The result of the proposed software consists of two main parts, each displaying a graphical representation via Unit X. The first part shows the whole genome of the query sequence (Fig. 4(d)). The second part displays the matched regions between the query sequence and the matched sequences identified by BLAST (Fig. 4(e)). In Fig. 4(e), for convenience, only the first (FN429899) and second (AY858038) highest-scoring matched sequences are shown. Both sequences are from the same serotype as our input sequence. As expected, the highest-scoring matched sequence is the sequence from which we copied a portion as our query sequence.

Furthermore, from Figs. 4(a)-(c), we can observe that the output of BLAST still lacks a user-friendly graphical representation. CADViST can therefore serve as an alternative way to visualize the resulting sequences obtained from BLAST, as shown in Figs. 4(d)-(e). In Fig. 4(e), the regions of mismatched base pairs can be clearly observed. The result of the proposed software can also be displayed in a graph-overlay format together with the Unit X representation of the sequences (Fig. 4(d)).

Figure 2. Flowchart of the proposed software.

Figure 3. Screenshots of the proposed software.

V. CONCLUSIONS

In this paper, we have developed software called CADViST. The proposed software can be used to visually analyze the matched regions identified by BLAST between the query sequences and the Dengue virus database. The graphical representation is implemented via Unit X, which is especially suitable for analyzing different serotypes of Dengue virus nucleotide sequences. Many options in CADViST can also benefit bioinformatics experts, e.g., save, print, and show the raw numeric values on the graph. Because it is built on the .NET Framework with C#, CADViST can be easily modified to include more open-source or in-house developed mathematical models while maintaining the user-friendly GUI.

REFERENCES

[1] E. Hamori and J. Ruskin, "H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences," The Journal of Biological Chemistry, vol. 258(2), 1983, pp. 1318-1327.
[2] M. A. Gates, "Simple DNA sequence representations," Nature, vol. 316, 1985.
[3] A. Nandy, "Two-dimensional graphical representation of DNA sequences and intron-exon discrimination in intron-rich sequences," Bioinformatics, vol. 12(1), 1996, pp. 55-62.
[4] S. T. Yau, J. Wang, A. Niknejad, C. Lu, N. Jin and Y-K. Ho, "DNA sequence representation without degeneracy," Nucleic Acids Research, vol. 31(12), 2003, pp. 3078-3080.
[5] B. Viriyasaksathian, Y. Wongsawat and P. Suriyaphol, "Unit X: Dengue virus sequence graphical representation for serotypes classification," ISBME 2009, Bangkok, Thailand.
[6] S. F. Altschul, W. Miller, E. Myers and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215(3), 1990, pp. 403-410.
[7] J. Ye, S. McGinnis and T. L. Madden, "BLAST: improvements for better sequence analysis," Nucleic Acids Research, vol. 34, 2006, pp. W6-W9.
[8] D. Anastassiou, "Genomic signal processing," Signal Processing Magazine, IEEE, vol. 18(4), 2001, pp. 8-20.
[9] N-F. Law, K-O. Cheng and W-C. Siu, "On relationship of Z-curve and Fourier approaches for DNA coding sequence classification," Bioinformation, vol. 1(7), 2006, pp. 242-246.
[10] R. Zhang and C-T. Zhang, "Z curves, an intuitive tool for visualizing and analyzing the DNA sequences," Journal of Biomolecular Structure & Dynamics, vol. 11, 1994, pp. 767-782.
[11] C-T. Zhang, J. Wang and R. Zhang, "A novel method to calculate the G+C content of genomic DNA sequences," Journal of Biomolecular Structure & Dynamics, vol. 19(2), 2001, pp. 333-341.
[12] C-T. Zhang, R. Zhang and H-Y. Ou, "The Z curve database: a graphic representation of genome sequences," Bioinformatics, vol. 19(5), 2003, pp. 593-599.
[13] J. J. Cai, D-K. Smith, X. Xia and K-Y. Yuen, "MBEToolbox: a Matlab toolbox for sequence data analysis in molecular biology and evolution," BMC Bioinformatics, vol. 6(44), 2005.
[14] Y. Zhang and M. Tan, "Visualization of DNA sequences based on 3DD-Curves," Journal of Mathematical Chemistry, vol. 44(1), 2008, pp. 206-216.
[15] D. Wheeler, "NCBI News: How to Search Custom Subsets of GenBank Using Standalone BLAST," National Center for Biotechnology Information, pp. 6, Winter 1999.
[16] T. Madden, "The BLAST sequence analysis tool," The NCBI Handbook, available from http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=handbook&part=ch16.

Figure 4. (a) One-line description in the BLAST report. (b)-(c) A pairwise sequence alignment obtained from the BLAST report. (d)-(e) Graphical display of CADViST.
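Step 2 of the process in Section III (extracting the accession and the coordinates of each matched sequence) can be sketched as below. This is a minimal Python illustration, not the authors' C# implementation. It assumes the tab-separated report selected by -m 9, in which comment lines start with "#" and each hit line carries twelve fields (query id, subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score). The names Hit and parse_tabular are hypothetical, and the SAMPLE text is invented purely for illustration; it is not output from the paper's experiment.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One matched region taken from a tabular BLAST report line."""
    subject_id: str
    identity: float
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float


def parse_tabular(text: str) -> List[Hit]:
    """Collect hits from tab-separated BLAST output, skipping '#' comments."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or header comment
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete hit line
        hits.append(Hit(
            subject_id=fields[1],
            identity=float(fields[2]),
            q_start=int(fields[6]),
            q_end=int(fields[7]),
            s_start=int(fields[8]),
            s_end=int(fields[9]),
            evalue=float(fields[10]),
        ))
    return hits


# Invented sample report (placeholder ids and numbers, for illustration only).
SAMPLE = (
    "# BLASTN 2.2.22\n"
    "# Fields: Query id, Subject id, % identity, alignment length, "
    "mismatches, gap openings, q. start, q. end, s. start, s. end, "
    "e-value, bit score\n"
    "query1\tsubjectA\t100.00\t5104\t0\t0\t1\t5104\t2040\t7143\t0.0\t10120\n"
    "query1\tsubjectB\t98.51\t5104\t76\t0\t1\t5104\t2040\t7143\t0.0\t9500\n"
)

if __name__ == "__main__":
    for hit in parse_tabular(SAMPLE):
        print(hit.subject_id, hit.s_start, hit.s_end, hit.identity)
```

The subject start and end positions returned for each hit are the coordinates a display component would need in order to draw the matched region on the subject sequence.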
