CADViST: Visualization Tool for BLAST Alignment of Dengue Virus Sequences

Boonyarat Viriyasaksathian, Yodchanan Wongsawat
Department of Biomedical Engineering, Mahidol University, Nakornpathom, Thailand
g5137363@student.mahidol.ac.th and egyws@mahidol.ac.th

Prapat Suriyaphol
Bioinformatics and Data Management for Research Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
sipur@mucc.mahidol.ac.th

Abstract - Developing a search engine that can simultaneously visualize genomic sequences is a challenging problem. In this paper, we propose software called CADViST. The UnitX graphical representation (previously proposed by the authors) is employed as an alternative tool to visualize the results obtained from the Basic Local Alignment Search Tool (BLAST). The proposed software helps users and experts interpret the results easily, especially in Dengue virus sequence analysis, where different serotypes or subtypes need to be distinguished.

Keywords - BLAST, Dengue Virus, Visualization, Bioinformatics.

I. INTRODUCTION

In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is one of the most widely used tools for sequence similarity search because of its speed and reasonable search accuracy. However, the BLAST program still lacks a user-friendly graphical representation. Hence, in this paper, we aim to develop a visualization tool capable of displaying the text output produced by BLAST.

There are many existing tools for visualizing and analyzing genomic sequences. Each tool is developed for specific tasks, and the tools can be categorized into four approaches: base vector, sequential, Fourier transform (FT), and Z-Curve.

(1) Base vector approach: Hamori, E. and Ruskin, J. (1983) represented DNA sequences as a three-dimensional curve (H-Curve) [1]. Gates, M. A. (1985) proposed that a graphical representation of a DNA sequence in two-dimensional space was better than the H-Curve. Gates's graphical representation shows the four nucleotide bases, i.e., adenine (A), thymine (T), cytosine (C), and guanine (G). The unit vectors of these bases lie on the Cartesian coordinate system: base A is on the negative y-axis, base T on the positive y-axis, base G on the positive x-axis, and base C on the negative x-axis [2]. About eleven years later, Nandy A. (1996) proposed a graphical representation to distinguish the features of intron and exon segments of eukaryotic sequences [3]. This representation was similar to Gates's method. The A, G, C, and T nucleotides were plotted on an ACGT-axis system, and the slope of the plot indicated clusters of intron and exon sequences. However, both Nandy's and Gates's methods have high degeneracy, such that sequences such as AGTC, AGTCA, and AGTCAG lead to the same graphical representation [4]. Stephen S.-T. Yau et al. (2003) modified Gates's method. The four nucleotides are classified into a pyrimidine/purine graph on two quadrants of the Cartesian coordinate system. The first quadrant represents the pyrimidines (T and C), and the fourth quadrant represents the purines (A and G) [4]. Recently, the authors proposed a graphical representation specifically for Dengue virus sequence analysis, based on the cumulative amounts of amino and keto bases, called UnitX [5].

(2) Sequential approach: Altschul et al. (1990) developed the Basic Local Alignment Search Tool (BLAST) program. This program is one of the most popular tools for genomic sequence analysis. It performs a fast similarity search: the program compares any two sequences and displays the differences between them on a base-by-base basis [6].

(3) Fourier transform (FT) approach: Anastassiou D. proposed color spectrograms of biomolecular sequences as a tool for visualizing biomolecular sequence analysis [7], [8]. Spectrograms, which represent the magnitude of the short-time Fourier transform (STFT), are implemented via the discrete Fourier transform (DFT). Analysis of a genomic sequence in the frequency domain via the FT uses the 3-periodicity property of DNA coding sequences. The color spectrogram is defined using the colors red, green, and blue. Even though this method yields an impressive graphical representation, its computational complexity is fairly high.

(4) Z-Curve approach: Zhang C. T. et al. (1994) suggested a practical visualization tool called Z-Curve [8]-[12]. James J. et al. implemented this tool in a package called MBEToolbox [13]. Because Z-Curve is based on the cumulative components of the genomic sequence, its features can be quickly interpreted, such as the distribution along the sequence of purine/pyrimidine bases, amino/keto bases, and strong/weak H-bond bases. Since the Z-Curve algorithm is simple, it can be applied to genomic sequences of any length. A similar approach, called 3DD-Curve, was presented by Zhang Y. and Tan M. (2008); it can be viewed as a weighted version of Z-Curve [14].

978-1-4244-4713-8/10/$25.00 ©2010 IEEE

The choice of graphical representation can vary with the characteristics of the genomic sequences of interest. Therefore, in this first version of the proposed software, Dengue virus sequences (nucleotide sequences) are employed to verify its merit. The software is called CADViST, which stands for Classification and Analysis of Dengue Virus Serotype by Visualization Tool. By employing UnitX as the visualization tool, the proposed software is suitable for interpreting Dengue virus sequences. However, positioning partial Dengue sequences on the Dengue genome with the UnitX representation requires a high computational load. BLAST is well known as an efficient search tool, but the visualization of its results needs improvement. Therefore, in this paper, we propose software that combines the merits of both BLAST and UnitX. The proposed software can efficiently search for an unknown portion of a Dengue virus sequence and simultaneously display graphical representations of the resulting sequences.

This paper is organized as follows. Section II introduces the proposed visualization tool, called CADViST. The software architecture of CADViST is described in Section III. Section IV shows the simulation results of the proposed software. Finally, Section V concludes the paper.

II. CADVIST: THE PROPOSED VISUALIZATION TOOL

Classification and Analysis of Dengue Virus Serotype by Visualization Tool, or CADViST, is a visualization tool proposed specifically for analyzing Dengue virus sequences. The components of CADViST are described as follows.

A. Basic Local Alignment Search Tool (BLAST)

The BLAST program was developed by Stephen F. Altschul and his coworkers at the National Center for Biotechnology Information (NCBI). It is widely used for calculating sequence similarity. BLAST uses a heuristic algorithm to find the best possible results: it finds homologous sequences by locating short matches between two sequences, which makes the search fast. The similarity measurement in BLAST uses statistical theory to assign a scoring matrix for all possible pairs of residues and to produce an Expect value (E-value) for each aligned pair. The stand-alone BLAST programs are provided as a compressed package. The package, distributed as archives whose names begin with BLAST for a variety of computer platforms, is available on the BLAST ftp site: /blast/executables/release/. In this paper, we employed stand-alone BLAST version 2.2.22 to generate the BLAST output that serves as the input of the proposed software (CADViST).

B. UnitX Graphical Representation

The UnitX graphical representation efficiently reveals the distribution of amino/keto bases along the sequence on two quadrants of the Cartesian coordinate system. The first quadrant represents the amount of amino bases (C and A), while the fourth quadrant represents the amount of keto bases (T and G). The unit vectors representing the four nucleotides, i.e., adenine (A), guanine (G), thymine (T), and cytosine (C), are shown in Fig. 1.

Figure 1. The UnitX vectors represent four nucleotides A, G, C and T.

From the numbers of occurrences of bases A, C, G, and T in the sequence, the coordinate (x, y) of the projection onto the X and Y axes with the UnitX representation is given by:

[Equation lost in text extraction; not recoverable from the source.]

C. Idea of CADViST

In this paper, we employ BLAST in "stand-alone" mode to compute the similarity scores between the query sequence and the Dengue virus nucleotide database. The search results obtained from BLAST are displayed graphically via the UnitX representation.

D. Creating a nucleotide BLAST database

The main advantage of the stand-alone BLAST program is the ability to create your own database. To create a nucleotide BLAST database, we need a source file of sequences in FASTA format. This file is processed by the formatdb program contained in the stand-alone BLAST package to build the index files of the database. After the formatdb command is executed, three files are produced from the source FASTA file. For nucleotide databases, the extensions are .nhr, .nin, and .nsq [15]. The formatdb command is as follows:

    formatdb -p F -i DatabaseName.fasta

The source FASTA file has the form:

    >First sequence description
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    >Second sequence description
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    >Last sequence description
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

where the Xs are nucleotide codes (A, T, G, or C). In this paper, the database of the proposed software is obtained from NCBI with the keyword "Dengue virus complete genome". The database comprises 2,184 nucleotide sequences covering the four serotypes of Dengue virus (952, 737, 405, and 90 nucleotide sequences per serotype, respectively).

E. Stand-alone executable BLAST

The stand-alone executable BLAST and the NCBI web-based BLAST program provide easy ways for users to perform a BLAST search via the command line or a website, respectively. Running the BLAST search program on your own machine has many advantages, e.g., the database can be easily edited. In this paper, we employ the stand-alone BLAST program to generate the BLAST output. A BLAST search is executed via the blastall command as follows:

    blastall -p blastn -d Databasename.fasta -i QuerySequence.fasta -m 9 -F F > Result.txt

F. Graphical Representation via UnitX

Instead of displaying the search results as text (Figs. 4(b) and 4(c)) as BLAST does, CADViST extracts the information from the BLAST output and represents the results graphically via the UnitX representation described in Section II-B. Furthermore, users who only need to explore the nature of Dengue virus sequences can use the graphical feature (UnitX) of CADViST on its own.

III. SOFTWARE ARCHITECTURE OF CADVIST

To provide a user-friendly GUI, the proposed CADViST software is written in the C# programming language. The GUI of CADViST is shown in Fig. 3. The query sequence can be supplied either as (1) a text file in FASTA format or (2) sequence text pasted directly into the blank field shown in Fig. 3. Once the input is entered, the process inside CADViST can be summarized as follows (Fig. 2):

Step 1: Call the stand-alone BLAST program to generate the BLAST output.
Step 2: Extract the sequence accession number and the coordinates of each matched sequence from the BLAST output.
Step 3: Determine the matching regions between the query and each matched sequence identified by the BLAST program, and send the results to the display unit, i.e., the UnitX representation.

The results are shown in Figs. 4(d)-(e). In addition, CADViST offers options to copy, save, print, and show point values in the graph of the UnitX vector. These options are selected by right-clicking on the graph.

IV. SIMULATION RESULTS

As an example, we verify the merit of CADViST by finding the similarities between a portion of FN429899, Dengue virus serotype 3 (base positions 2040-7143), and our Dengue virus sequence database. Traditionally, the results obtained from the stand-alone BLAST program consist of two major parts: (1) the one-line descriptions of each database sequence found to match the query sequence (Fig. 4(a)), and (2) the alignments between the query sequence and the matched database sequences (Figs. 4(b)-(c)) [16]. Figs. 4(b) and (c) illustrate the matched sequences with the highest and second-highest scores, respectively.

Using the information obtained from BLAST, Figs. 4(d)-(e) show the proposed graphical representation via CADViST. The result of the proposed software consists of two main parts, each of which displays a graphical representation via UnitX. The first part shows the whole genome of the query sequence (Fig. 4(d)). The second part displays the matched regions between the query and the matched sequences identified by BLAST (Fig. 4(e)). In Fig. 4(e), for convenience, only the matched sequences with the highest (FN429899) and second-highest (AY858038) scores are shown. Both sequences are from the same serotype as our input sequence. As expected, the highest-scoring matched sequence is the sequence from which our query was copied.

Furthermore, according to Figs. 4(a)-(c), the output of BLAST still lacks a user-friendly graphical representation. CADViST therefore offers an alternative way to visualize the resulting sequences obtained from BLAST, as shown in Figs. 4(d)-(e). In Fig. 4(e), the regions of mismatched base pairs can be clearly observed. The result of the proposed software can be displayed in a graph-overlay format together with the UnitX representation of the sequences (Fig. 4(d)).

Figure 2. Flowchart of the proposed software

Figure 3. Screenshots of the proposed software

V. CONCLUSIONS

In this paper, we have developed software called CADViST. The proposed software can be used to visually analyze the matched regions identified by BLAST between the query sequences and the Dengue virus database. The graphical representation is implemented via UnitX, which is especially suitable for analyzing different serotypes of Dengue virus nucleotide sequences. Many options in CADViST can also benefit bioinformatics experts, e.g., save, print, and show the raw numeric values on the graph. With the framework of C#, CADViST can be easily modified to include more open-source or in-house developed mathematical modeling while maintaining the user-friendly GUI.

REFERENCES

[1] E. Hamori and J. Ruskin, “H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences”, The Journal of Biological Chemistry, vol. 258(2), 1983, pp. 1318-1327.
[2] M. A. Gates, “Simple DNA sequence representations”, Nature, vol. 316, 1985.
[3] A. Nandy, “Two-dimensional graphical representation of DNA sequences and intron-exon discrimination in intron-rich sequences”, Bioinformatics, vol. 12(1), 1996, pp. 55-62.
[4] S.-T. Yau, J. Wang, A. Niknejad, C. Lu, N. Jin and Y-K. Ho, “DNA sequence representation without degeneracy”, Nucleic Acids Research, vol. 31(12), 2003, pp. 3078-3080.
[5] B. Viriyasaksathian, Y. Wongsawat and P. Suriyaphol, “UnitX: Dengue virus sequence graphical representation for serotypes classification”, ISBME 2009, Bangkok, Thailand.
[6] S. F. Altschul, W. Miller, E. Myers and D. J. Lipman, “Basic local alignment search tool”, Journal of Molecular Biology, vol. 215(3), 1990, pp. 403-410.
[7] J. Ye, S. McGinnis and T. L. M
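
Illustrative sketch (not part of the original paper). Step 2 of Section III extracts the accession number and match coordinates of each hit from the BLAST output produced with the -m 9 option (Section II-E). A minimal Python sketch of that parsing step is given below. It assumes the standard twelve-column tabular layout of legacy blastall (query id, subject id, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score), with comment lines beginning with "#". CADViST itself is written in C#, so the names Hit, parse_tabular, and SAMPLE are hypothetical, and the numeric values in SAMPLE are invented purely for illustration.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One row of tabular BLAST output (hypothetical container)."""
    subject_id: str   # accession of the matched database sequence
    identity: float   # percent identity
    q_start: int      # match start on the query
    q_end: int        # match end on the query
    s_start: int      # match start on the database sequence
    s_end: int        # match end on the database sequence
    e_value: float
    bit_score: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated hit lines, skipping '#' comments and blanks."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete 12-column hit line
        hits.append(Hit(
            subject_id=fields[1],
            identity=float(fields[2]),
            q_start=int(fields[6]),
            q_end=int(fields[7]),
            s_start=int(fields[8]),
            s_end=int(fields[9]),
            e_value=float(fields[10]),
            bit_score=float(fields[11]),
        ))
    return hits


# Invented sample rows; only the two accession numbers come from Section IV.
SAMPLE = (
    "# BLASTN 2.2.22\n"
    "# Fields: Query id, Subject id, % identity, alignment length, "
    "mismatches, gap openings, q. start, q. end, s. start, s. end, "
    "e-value, bit score\n"
    "query\tFN429899\t100.00\t5104\t0\t0\t1\t5104\t2040\t7143\t0.0\t1.0e+04\n"
    "query\tAY858038\t99.00\t5104\t51\t0\t1\t5104\t2040\t7143\t0.0\t9.8e+03\n"
)

if __name__ == "__main__":
    for hit in parse_tabular(SAMPLE.splitlines()):
        print(hit.subject_id, hit.s_start, hit.s_end, hit.identity)
```

In this sketch, the subject start and end coordinates are what a display stage such as the UnitX plot would need in order to locate each matched region on the corresponding database sequence.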
