CADViST: Visualization Tool for BLAST Alignment of Dengue Virus Sequences

Boonyarat Viriyasaksathian, Yodchanan Wongsawat
Department of Biomedical Engineering, Mahidol University, Nakornpathom, Thailand
g5137363@student.mahidol.ac.th and egyws@mahidol.ac.th

Prapat Suriyaphol
Bioinformatics and Data Management for Research Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
sipur@mucc.mahidol.ac.th

Abstract: Building a search engine that can simultaneously visualize genomic sequences is a challenging problem. In this paper, we propose software called CADViST. The Unit-X graphical representation previously proposed by the authors is employed as an alternative tool to visualize the results obtained from the Basic Local Alignment Search Tool (BLAST). The proposed software helps users and experts interpret the results, especially in Dengue virus sequence analysis, where different serotypes or subtypes need to be distinguished.

Keywords: BLAST, Dengue Virus, Visualization, Bioinformatics.

I. INTRODUCTION

In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is one of the most widely used tools for sequence similarity search because of its speed and reasonable search accuracy. However, the BLAST program still lacks a user-friendly graphical representation. Hence, in this paper, we aim to develop a visualization tool that is capable of displaying the text output produced by BLAST.

There are many existing tools for visualizing and analyzing genomic sequences. Each tool was developed for specific tasks, and the tools can be categorized into four approaches: the base-vector, sequential, Fourier transform (FT), and Z-Curve approaches.

1) Base-vector approach: Hamori, E. and Ruskin, J. (1983) represented DNA sequences as a three-dimensional curve, the H-Curve [1]. Gates, M. A. (1985) proposed that a graphical representation of a DNA sequence in two-dimensional space was better than the H-Curve. Gates' graphical representation shows the four nucleotide bases, i.e. adenine (A), thymine (T), cytosine (C), and guanine (G). The unit vectors representing these bases lie on the Cartesian coordinate system: base A is on the negative y-axis, base T is on the positive y-axis, base G is on the positive
x-axis, and base C is on the negative x-axis [2]. About eleven years later, Nandy A. (1996) proposed a graphical representation to distinguish the features of intron and exon segments of eukaryotic sequences [3]. This graphical representation was similar to Gates' method. The A, G, C and T nucleotides were plotted on an ACGT axis system. The slope of this plot indicated clusters of intron and exon sequences. However, both Nandy's and Gates' methods have high degeneracy: sequences such as AGTC, AGTCA, and AGTCAG lead to the same graphical representation [4]. Stephen S.-T. Yau et al. (2003) modified Gates' method. The four nucleotides are classified into a pyrimidine/purine graph on two quadrants of the Cartesian coordinate system. The first quadrant represents the pyrimidines (T and C), and the fourth quadrant represents the purines (A and G) [4]. Recently, the authors proposed a graphical representation, called Unit-X, especially for Dengue virus sequence analysis, based on the cumulative amount of amino and keto bases [5].

2) Sequential approach: Altschul et al. (1990) developed the Basic Local Alignment Search Tool (BLAST) program. This program is one of the most popular tools for genomic sequence analysis and can perform a fast similarity search. The program compares any two sequences and displays the differences between them on a base-by-base basis [6].

3) Fourier Transform (FT) approach: Anastassiou D. proposed color spectrograms of biomolecular sequences as a tool for visualizing biomolecular sequence analysis [7], [8]. Spectrograms, which represent the magnitude of the short-time Fourier transform (STFT), are implemented via the discrete Fourier transform (DFT). Analysis of a genomic sequence in the frequency domain via the Fourier transform (FT) uses the 3-periodicity property of DNA coding sequences. The color spectrogram is defined using the colors red, green and blue. Even though this method yields an impressive graphical representation, its computational complexity is fairly high.

4) Z-Curve approach: Zhang C. T. et al. (1994) suggested a practical visualization tool called the Z-Curve [8]-[12]. Cai J. J. et al. implemented this tool in the package called MBEToolbox [13]. Because the Z-Curve is built from cumulative components of the genomic sequence, its features can be quickly interpreted, such as the distribution along the sequence of purine/pyrimidine bases, amino/keto bases, and strong H-bond/weak H-bond bases. Since the Z-Curve algorithm is simple, it can be applied to all genomic sequences regardless of their length. A similar approach to the Z-Curve, called the 3DD-Curve, was presented by Zhang Y. and Tan M. (2008). This approach can be viewed as a weighted version of the Z-Curve [14].

978-1-4244-4713-8/10/$25.00 ©2010 IEEE

The choice of graphical representation can vary with the characteristics of the genomic sequences of interest. Therefore, in this first version of the proposed software, Dengue virus nucleotide sequences are employed to verify its merit. The software is called CADViST, which stands for Classification and Analysis of Dengue Virus Serotype by Visualization Tool. By employing Unit-X as the visualization tool, the proposed software is suitable for interpreting Dengue virus sequences. However, positioning partial Dengue sequences on the Dengue genome with the Unit-X representation requires a high computational load. BLAST is well known as an efficient search tool, but the visualization of its results needs improvement. Therefore, in this paper, we propose software that combines the merits of both BLAST and Unit-X. The proposed software can efficiently search an unknown portion of a Dengue virus sequence and simultaneously illustrate graphical representations of the resulting sequences.

This paper is organized as follows. Section II introduces the proposed visualization tool, called CADViST. The software architecture of CADViST is described in Section III. In Section IV, the simulation results of the proposed software are shown. Finally, Section V concludes the paper.

II. CADVIST: THE PROPOSED VISUALIZATION TOOL

Classification and Analysis of Dengue Virus Serotype by Visualization Tool, or CADViST, is a visualization tool proposed especially for analyzing Dengue virus sequences. The components of CADViST are described in detail as follows.

A. Basic Local Alignment Search Tool (BLAST)

The BLAST program was developed by Stephen F. Altschul and his coworkers at the National Center for Biotechnology Information (NCBI). It is widely used for calculating sequence similarity. BLAST uses a heuristic algorithm to find the best possible results: it finds homologous sequences by locating short matches between two sequences, which makes the search fast. The similarity measurement of BLAST uses statistical theory to assign a scoring matrix for all possible pairs of residues and produces an Expect value (E-value) for each alignment pair.

The standalone BLAST programs are provided as a compressed package. The package, distributed as archives whose names begin with "blast" for a variety of computer platforms, is available on the BLAST FTP site (ftp://ftp.ncbi.nih.gov/blast/executables/release/). In this paper, we employed standalone BLAST version 2.2.22 to generate the BLAST output that serves as the input of the proposed software, CADViST.

B. Unit-X Graphical Representation

The Unit-X graphical representation can efficiently reveal the distribution of amino/keto bases along the sequence on two quadrants of the Cartesian coordinate system. The first quadrant represents the amount of amino bases (C and A), while the fourth quadrant represents the amount of keto bases (T and G). The unit vectors representing the four nucleotides, i.e. adenine (A), guanine (G), thymine (T), and cytosine (C), are shown in Fig. 1.

Figure 1. The Unit-X vectors representing the four nucleotides A, G, C and T.

By counting the occurrences of bases A, C, G, and T in the sequence, the coordinates (x, y) of the projection onto the X and Y axes with the Unit-X representation can be expressed as follows:

[The coordinate equations for x and y are not recoverable from the source text.]

C. Idea of CADViST

In this paper, we employ BLAST in standalone mode to find the similarity scores between the query sequence and the Dengue virus nucleotide database. The search results obtained from BLAST are graphically displayed via the Unit-X representation.

D. Creating a nucleotide BLAST database

The main advantage of the standalone BLAST program is the ability to create your own database. To create a nucleotide BLAST database, we need a source file of sequences in FASTA format. This file is processed by the formatdb program contained in the standalone BLAST package to build the index files of the database. After executing the formatdb command, three files are produced from the source FASTA file. For nucleotide databases, the extensions are nhr, nin, and nsq [15]. The formatdb command is as follows:

formatdb -p F -i DatabaseName.fasta
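
The database-building step above can also be scripted. The following is a minimal sketch in Python, not part of CADViST itself, assuming the legacy formatdb executable from the standalone BLAST package is on the PATH. The helper names write_fasta, formatdb_command and build_database are hypothetical and introduced only for illustration. The sketch writes records in FASTA format and assembles the same command line quoted above, running it only if formatdb is actually installed.

```python
import shutil
import subprocess
from pathlib import Path


def write_fasta(records, path):
    """Write (description, sequence) pairs to a FASTA file.

    Each record becomes a '>' description line followed by its sequence.
    """
    lines = []
    for description, sequence in records:
        lines.append(f">{description}")
        lines.append(sequence.upper())
    Path(path).write_text("\n".join(lines) + "\n")


def formatdb_command(fasta_path):
    """Return the argument list for the formatdb call quoted in the text.

    -p F marks the input as nucleotide (not protein); -i names the input file.
    """
    return ["formatdb", "-p", "F", "-i", str(fasta_path)]


def build_database(fasta_path):
    """Run formatdb if it is installed; return True only when it was run."""
    if shutil.which("formatdb") is None:
        return False  # legacy BLAST is not available on this machine
    subprocess.run(formatdb_command(fasta_path), check=True)
    return True
```

Keeping the command as an argument list rather than a shell string avoids quoting problems when the database file name contains spaces.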
The source FASTA file has the form

>First sequence description
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>Second sequence description
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>Last sequence description
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

where the Xs are the nucleotide codes A, T, G or C. In this paper, the database of the proposed software was obtained from NCBI with the keyword "Dengue virus complete genome". The 2,184 nucleotide sequences cover the four serotypes of Dengue virus; the serotypes contain 952, 737, 405 and 90 nucleotide sequences, respectively.

E. Standalone executable BLAST

The standalone executable BLAST and the NCBI web-based BLAST program let users perform a BLAST search via the command line or a website, respectively. Running the BLAST search program on your own machine has several advantages, e.g. the database can be easily edited. In this paper, we employ the standalone BLAST program to generate the BLAST output. A BLAST search can be executed via the blastall command as follows:

blastall -p blastn -d Databasename.fasta -i QuerySequence.fasta -m 9 -F F > Result.txt

F. Graphical Representation via Unit-X

Instead of displaying the search results as letters (Figs. 4b and 4c) as BLAST does, CADViST extracts the information from the BLAST output and represents the results graphically via the Unit-X representation described in Section II-B. Furthermore, users who only need to explore the nature of Dengue virus sequences can use the Unit-X graphical feature of CADViST on its own.

III. SOFTWARE ARCHITECTURE OF CADVIST

To provide a user-friendly GUI, the proposed CADViST software is written in C#. The GUI of CADViST is shown in Fig. 3. The query sequence can be supplied either as (1) a text file in FASTA format or (2) text copied directly into the blank space in Fig. 3. Once the input is inserted, the process inside CADViST can be summarized as follows (Fig. 2):

Step 1: Call the standalone BLAST program to generate the BLAST output.
Step 2: Extract the sequence accession number and the coordinates of each matched sequence from the BLAST output.
Step 3: Provide the matching regions between the query and the matched sequences identified by the BLAST program and send the results to the display unit, i.e. the Unit-X representation.

The results are shown in Figs. 4d-e. In addition, other options of CADViST are copy, save, print, and show point values in the graph
of the Unit-X vector. An option can be selected by right-clicking on the graph.

IV. SIMULATION RESULTS

As an example, we verify the merit of CADViST by finding the similarities between FN429899 (Dengue virus serotype 3, base positions 2040–7143) and our Dengue virus sequence database. Traditionally, the results obtained from the standalone BLAST program consist of two major parts: (1) the one-line descriptions of each database sequence found to match the query sequence (Fig. 4a), and (2) the alignments between the query sequence and the matched database sequences (Figs. 4b-c) [16]. Figs. 4b and 4c illustrate the first and second highest-scoring matched sequences, respectively.

Using the information obtained from BLAST, Figs. 4d-e show the proposed graphical representation via CADViST. The result of the proposed software consists of two main parts, each displayed graphically via Unit-X. The first part shows the whole genome of the query sequence (Fig. 4d). The second part displays the matched regions between the query and the matched sequences identified by BLAST (Fig. 4e). In Fig. 4e, for convenience, only the first (FN429899) and second (AY858038) highest-scoring matched sequences are shown. Both sequences are from the same serotype as our input sequence. As expected, the highest-scoring matched sequence is the sequence from which we copied a portion as our query sequence.

Furthermore, Figs. 4a–c show that the output of BLAST still lacks a user-friendly graphical representation. CADViST can therefore serve as an alternative way to visualize the sequences returned by BLAST, as shown in Figs. 4d–e. In Fig. 4e, the regions of mismatched base pairs are clearly visible. The result of the proposed software can be displayed in a graph-overlay format together with the Unit-X representation of the sequences (Fig. 4d).

Figure 2. Flowchart of the proposed software.

Figure 3. Screenshot of the proposed software.

V. CONCLUSIONS

In this paper, we have developed software called CADViST. The proposed software can be used to visually analyze the matched regions identified by BLAST between the query sequences and the Dengue virus database. The graphical representation is implemented via Unit-X, which is especially suitable for analyzing different serotypes of Dengue virus nucleotide sequences. Many options in CADViST can also benefit bioinformatics experts, e.g. save, print, and show the raw numeric values on the graph. Because it is built on the .NET Framework with C#, CADViST can be easily modified to include more open-source or in-house mathematical modeling while maintaining the user-friendly GUI.

REFERENCES

[1] E. Hamori and J. Ruskin, "H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences," The Journal of Biological Chemistry, vol. 258(2), 1983, pp. 1318-1327.
[2] M. A. Gates, "Simple DNA sequence representations," Nature, vol. 316, 1985.
[3] A. Nandy, "Two-dimensional graphical representation of DNA sequences and intron-exon discrimination in intron-rich sequences," Bioinformatics, vol. 12(1), 1996, pp. 55-62.
[4] S. T. Yau, J. Wang, A. Niknejad, C. Lu, N. Jin and Y.-K. Ho, "DNA sequence representation without degeneracy," Nucleic Acids Research, vol. 31(12), 2003, pp. 3078-3080.
[5] B. Viriyasaksathian, Y. Wongsawat and P. Suriyaphol, "Unit-X: Dengue virus sequence graphical representation for serotypes classification," ISBME 2009, Bangkok, Thailand.
[6] S. F. Altschul, W. Miller, E. Myers and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215(3), 1990, pp. 403-410.
[7] J. Ye, S. McGinnis and T. L. Madden, "BLAST: improvements for better sequence analysis," Nucleic Acids Research, vol. 34, 2006, pp. W6-W9.
[8] D. Anastassiou, "Genomic signal processing," Signal Processing Magazine, IEEE, vol. 18(4), 2001, pp. 8-20.
[9] N.-F. Law, K.-O. Cheng and W.-C. Siu, "On relationship of Z-curve and Fourier approaches for DNA coding sequence classification," Bioinformation, vol. 1(7), 2006, pp. 242-246.
[10] R. Zhang and C.-T. Zhang, "Z curves, an intuitive tool for visualizing and analyzing the DNA sequences," Journal of Biomolecular Structure and Dynamics, vol. 11, 1994, pp. 767-782.
[11] C.-T. Zhang, J. Wang and R. Zhang, "A novel method to calculate the G+C content of genomic DNA sequences," Journal of Biomolecular Structure and Dynamics, vol. 19(2), 2001, pp. 333-341.
[12] C.-T. Zhang, R. Zhang and H.-Y. Ou, "The Z curve database: a graphic representation of genome sequences," Bioinformatics, vol. 19(5), 2003, pp. 593-599.
[13] J. J. Cai, D.-K. Smith, X. Xia and K.-Y. Yuen, "MBEToolbox: a Matlab toolbox for sequence data analysis in molecular biology and evolution," BMC Bioinformatics, vol. 6(44), 2005.
[14] Y. Zhang and M. Tan, "Visualization of DNA sequences based on 3DD-Curves," Journal of Mathematical Chemistry, vol. 44(1), 2008, pp. 206-216.
[15] D. Wheeler, "NCBI News: How to Search Custom Subsets of GenBank Using Standalone BLAST," National Center for Biotechnology Information, pp. 6, Winter 1999.
[16] T. Madden, "The BLAST sequence analysis tool," The NCBI Handbook, available from http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=handbook&part=ch16.

Figure 4. (a) One-line descriptions in the BLAST report; (b-c) pairwise sequence alignments obtained from the BLAST report; (d-e) graphical display of CADViST.
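
Step 2 of the process in Section III (extracting the accession number and the coordinates of each matched sequence) can be sketched as follows. This is a minimal Python sketch, not the authors' C# implementation. It assumes the tabular output produced by the -m 9 option of blastall, in which comment lines start with "#" and each hit line holds twelve tab-separated fields (query id, subject id, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score). The Hit type, the parse_tabular function, and the sample line with its identifiers and numbers are hypothetical and made up for illustration.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One matched region reported by BLAST (subset of the tabular fields)."""
    subject_id: str   # identifier (accession) of the matched database sequence
    identity: float   # percent identity of the alignment
    q_start: int      # alignment start on the query
    q_end: int        # alignment end on the query
    s_start: int      # alignment start on the matched sequence
    s_end: int        # alignment end on the matched sequence
    evalue: float     # Expect value of the alignment


def parse_tabular(text: str) -> List[Hit]:
    """Parse BLAST tabular output, skipping blank lines and '#' comments."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete hit line
        hits.append(Hit(
            subject_id=fields[1],
            identity=float(fields[2]),
            q_start=int(fields[6]),
            q_end=int(fields[7]),
            s_start=int(fields[8]),
            s_end=int(fields[9]),
            evalue=float(fields[10]),
        ))
    return hits


# Made-up example line; identifiers and numbers are illustrative only.
SAMPLE = (
    "# Fields: Query id, Subject id, % identity, alignment length, ...\n"
    "query\tsubjectA\t99.50\t200\t1\t0\t1\t200\t2040\t2239\t1e-100\t380\n"
)

if __name__ == "__main__":
    for hit in parse_tabular(SAMPLE):
        print(hit.subject_id, hit.s_start, hit.s_end)
```

The subject start and end coordinates returned here are what a display unit would need in order to place each matched region on the plotted curve.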