An online system for functional relationship analysis of genome-wide gene products

Qiang Hu, Zheng-Guo Zhang*
Department of Biomedical Engineering
Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences
School of Basic Medicine, Peking Union Medical College
Beijing, China
*Email: zhangzg126126.com

978-1-4244-4713-8/10/$25.00 ©2010 IEEE

Abstract

Although functional relationship analysis of gene products is useful, a convenient and user-friendly tool for measuring the functional similarity of genome-wide gene products in multiple species is still not available. We computed the genome-wide functional similarity of gene products in human, mouse and rat based on our algorithm. Databases and web services were built on the precomputed similarity scores. Our system provides a group of tools to retrieve functional similarity scores and analyze the functional relationships of gene products. The web service is freely available at http://bme.pumc.edu.cn/fsim/index.html.

I. INTRODUCTION

Measuring the functional similarity of gene products is a useful way to investigate their relationships. One important application of functional similarity analysis is to predict and assess protein-protein interactions [1], [2], [3]. Another application is to discover positional candidate genes of diseases [4]. Functional similarity can also be used to cluster gene expression data, because functionally related genes have similar expression profiles [5].

Most methods for measuring functional similarity are based on the annotation information of gene products. The Gene Ontology (GO) database [6] provides a controlled vocabulary of terms to annotate the functions of gene products. It is adopted by most algorithms and tools that measure functional similarity.

Although many tools have been developed to measure functional similarity, a convenient and user-friendly tool for analyzing the relationships of genome-wide gene products is still not available. The GO tools web page lists many software packages based on the database. For example, AmiGO [7] and QuickGO [8] provide an interface to search and browse the ontology and annotation data. Users can compare the relationships of gene products with them, but not automatically. GOTax [9], which integrated the annotation data of proteins and protein families, provided a functional similarity search tool (FSST) based on the Information Content (IC) of GO terms. The tool can be used to measure the functional similarity of proteins and protein families. G-SESAME [10] developed a new algorithm to measure functional similarity, but the web tool it offered can only measure the functional similarity of two gene products. FunSimMat [11] calculated the similarity of proteins in UniProtKB [12], and a web search engine was developed to retrieve the functional similarity of proteins.

It would be helpful if a tool could assist biologists in comparing the functional relationships of genes of interest with whole-genome gene products. However, genome-wide relationship analysis cannot be carried out on ordinary computing servers; it would take dozens of hours even on a high-performance cluster.

We developed an online system for functional relationship analysis of genome-wide gene products. An all-against-all functional similarity comparison of genome-wide gene products in human, mouse and rat was computed in advance based on our algorithm. Three databases were built to hold the similarity scores of the three species respectively. Based on the precomputed similarity scores, a web search engine was developed to retrieve the scores directly. Some other related tools were developed to extend the online web services. Biologists can use the system easily to analyze the functional relationships of genome-wide gene products.

II. CONSTRUCTION AND CONTENT

A. Data Sets

The raw data used to calculate the similarity came directly from the annotation packages of the R/Bioconductor project [13], [14]. For example, the packages org.Hs.eg.db, org.Mm.eg.db and org.Rn.eg.db contain the GO annotation data of gene products in human, mouse and rat respectively. The packages are described in Table I. All of these GO-related packages were built by the Bioconductor project according to the latest version of the GO database as of March 2009. The annotation data for probe IDs of different microarray platforms also came from the annotation packages in Bioconductor.

TABLE I
DATA SETS ADOPTED IN THE DATABASES

Annotation package   Species   Raw data
org.Hs.eg.db         Human     GO annotation; mapping information between distinct identifiers
org.Mm.eg.db         Mouse     ditto
org.Rn.eg.db         Rat       ditto
org.Hs.sp.db         Human     Protein identifiers to Entrez IDs
org.Mm.sp.db         Mouse     ditto
org.Rn.sp.db         Rat       ditto
GO.db                -         GO term relationships and annotation
KEGG.db              -         Annotation maps for the KEGG database

B. Implementation

1) Algorithm: Three databases integrated all similarity scores of genome-wide gene products in human, mouse and rat respectively. We proposed a novel algorithm to measure the relationship. A statistical model was built according to the common information of the annotation terms between two gene products.

The GO provides three structured vocabularies (ontologies) to describe gene products in terms of their associated biological processes (BP), cellular components (CC) and molecular functions (MF). GO terms are connected to each other by child-parent relationships, and each of the three ontologies is structured as a Directed Acyclic Graph (DAG). GO terms sit at different levels of the DAG. Terms located close to the leaves of the DAG have more specific meanings and therefore carry more information than terms located close to the root. We defined a parameter, the Level Coefficient (LC), to denote the weight of the information of a GO term. The LC values of leaves were defined as 1. From children to parents, the LC values decreased gradually according to the ratio of their levels in the DAG.

A gene is usually annotated by more than one term in the three ontologies. The information of a term should also include the information of its ancestor terms. Thus, the common terms between two gene products could be summarized in a contingency table, and the LC values, as information weights of the terms, could be counted into the contingency table. The relationship of two gene products could therefore be measured by statistically testing the agreement of the contingency table. We adopted the Kappa value to test the agreement, and the Z test was used to test the significance of the Kappa value. When two gene products were functionally related, the Kappa value would be close to 1.

2) Similarity Scores Computation: There are more than ten thousand gene products in each species. An all-against-all comparison of all gene products required so much computing power that ordinary computers could not finish the calculation. The computational task was separated into small tasks by dividing the input data. If the number of genome-wide gene products is n, the ith calculation task was to calculate the similarity scores between the ith gene product and the gene products from the first to the ith. Different calculation tasks were assigned to different CPUs in a high-performance cluster. The computational results were then assembled into a matrix of similarity scores. Parallel programs based on the R language were developed to carry out the computation. The R packages Rmpi [15] and snow [16] provided parallel interfaces to the MPI library of the cluster environment.

C. Databases

Three databases were created to hold the precomputed similarity score matrices of all gene products in human, mouse and rat. The scores included Kappa values and Z scores between every two gene products. For example, there were 17482 human gene products, so a score matrix with dimensions 17482×17482 had to be stored in the database. The R language [13] was used to develop the programs that performed the computation. The result matrices were so large that they were difficult to store in a regular relational database. We split the large score matrices into hundreds of matrices with smaller dimensions, and our system stored the matrix data directly in R binary files (Rdata). The database files were approximately 4 gigabytes in size. The file database could be imported by R scripts.

D. Web system

The system can be visited through the Internet to retrieve and analyze the functional relationships of gene products. The Apache HTTP server was used to serve the HTML web pages. Through the web server, users could submit their data to the system and the results would be returned on the web pages. The R environment was the base of the system and was in charge of data analysis and interaction with the databases. Rapache [17], a functional module of Apache, connected the web server and the R environment. The data and variables submitted by users could be transferred to the R environment via Apache, and the results from the R programs could be returned to users through the web server.

III. UTILITY AND DISCUSSION

A. Web Interfaces

Web interfaces to the database and analysis tools were developed. As shown in Figure 1, our web tools were designed to be concise and user-friendly. The system provided tools for functional similarity search and classification of gene products. Some other tools, such as gene enrichment analysis, identifier conversion and GO annotation, were added to the system to assist data analysis. Documentation describing the tools, with examples, was also written on the FAQ page.

Fig. 1. Functional similarity search for gene products.

B. Functional similarity search for a single gene product

The gFSim tool searches for the gene products in the genome that are most related to a single gene product (Figure 1A). Several identifiers of gene products, including Entrez ID, Symbol, Unigene and SwissProt ID, were supported. Gene products in three species (human, mouse and rat) could be searched with the tool. The number of gene products in the results could be specified; by default, the top 100 functionally similar gene products were returned. Entrez IDs, annotated GO terms and Z scores were shown in the search results (Figure 1B). Gene products annotated with the same GO terms were put in the same row. The search results could also be downloaded as a CSV (comma-separated values) file.

C. Functional similarity analysis for a group of gene products

The gsFSim tool could be used to retrieve and analyze the functional relationships of a group of gene products (Figure 1C). The tool supported the same identifiers and species as gFSim. A group of gene products could be submitted with separators such as commas, semicolons, spaces and line breaks. A similarity score matrix of the input gene products, with Kappa values, was shown in the results. The similarity score matrix was also visualized graphically. A heatmap (Figure 1D) showed the annotated GO terms of the gene products: blue denoted that a GO term annotated the corresponding gene product, and black meant that the term did not annotate the gene product. A dendrogram (Figure 1E) showed the hierarchical clustering results according to the similarity score matrix. Gene products were classified into different groups based on their functional relationships.

D. Enrichment Analysis

Gene enrichment analysis [18] is a useful method for discovering the functional annotations that are specific to a set of selected genes drawn from the total (universe) genes. As shown in Figure 2A, the annotation database should be selected first. The BP, MF and CC ontologies of the GO database and the KEGG pathway database [19] were supported in the tool. The p-value cutoff of the significance test in the enrichment analysis algorithm could then be specified; it was 0.05 by default. The more specific and important an annotation term was in the selected gene products, the smaller the p-value the term would get. The cutoff could be used to restrict the number of results. If the enrichment analysis returned no result, a larger cutoff could be assigned. A group of gene products of interest could be submitted as the selected genes, and the overall gene products should be submitted as the universe genes. The analysis results included the significantly enriched functions, p-values, odds ratios and annotated counts (Figure 2B). The results could also be downloaded as a CSV file. The enrichment analysis tool could be used to analyze the results of a functional search for a group of gene products (gsFSim).

Fig. 2. Online tools for functional relationship analysis.

E. Microarray Probe ID Conversion

The microarray probe ID conversion tool could convert probe IDs from different microarray platforms to Entrez IDs (Figure 2C). Most commercial gene chips, such as Affymetrix, Agilent, GE (General Electric) and Illumina, were supported. Microarray probe IDs could be converted to Entrez IDs, and the IDs could then be submitted to the other tools to analyze the functional relationship. The tool therefore extends the identifier types of gene products supported by the system.

F. GO Annotation

A set of GO terms could be submitted to the annotation tool to look up their detailed information in batch. After a group of GO terms was submitted, the results were returned with the term names, definitions, synonyms and LC values, in descending order of LC value. LC denotes the weighted information of a GO term, so the terms with more specific biological meanings were shown at the front of the results.

IV. CONCLUSION

To develop a powerful and user-friendly tool for analyzing the functional relationships of genome-wide gene products, we computed in advance the functional similarity scores of all gene products in human, mouse and rat based on our algorithm. An online system was developed on the basis of the precomputed similarity scores. The system provides a group of tools to retrieve the functional similarity and analyze the relationships of genome-wide gene products. Our web services are freely available at http://bme.pumc.edu.cn/fsim/index.html.

ACKNOWLEDGMENT

This work was partially supported by China Medical Board of New York, Inc. #03-787. The computing tasks for the similarity score matrices were performed in the High Performance Computing Center, Peking Union Medical College.

REFERENCES

[1] L. J. Lu, Y. Xia, A. Paccanaro, H. Yu, and M. Gerstein, “Assessing the limits of genomic data integration for predicting protein networks.” Genome Res, vol. 15, no. 7, pp. 945-953, Jul 2005.
[2] A. Schlicker, C. Huthmacher, F. Ramírez, T. Lengauer, and M. Albrecht, “Functional evaluation of domain-domain interactions and human protein interaction networks.” Bioinformatics, vol. 23, no. 7, pp. 859-865, Apr 2007.
[3] M. E. Futschik, G. Chaurasia, and H. Herzel, “Comparison of human protein-protein interaction maps.” Bioinformatics, vol. 23, no. 5, pp. 605-611, Mar 2007.
[4] E. A. Adie, R. R. Adams, K. L. Evans, D. J. Porteous, and B. S. Pickard, “Suspects: enabling fast and effective prioritization of positional candidates.” Bioinformatics, vol. 22, no. 6, pp. 773-774, Mar 2006.
[5] Y. Qu and S. Xu, “Supervised cluster analysis for microarray data based on multivariate gaussian mixture.” Bioinformatics, vol. 20, no. 12, pp. 1905-1913, Aug 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock, “Gene ontology: tool for the unification of biology. the gene ontology consortium.” Nat Genet, vol. 25, no. 1, pp. 25-29, May 2000.
[7] S. Carbon, A. Ireland, C. J. Mungall, S. Shu, B. Marshall, S. Lewis, A. O. Hub, and W. P. W. Group, “Amigo: online access to ontology and annotation data.” Bioinformatics, vol. 25, no. 2, pp. 288-289, Jan 2009.
[8] D. Binns, E. Dimmer, R. Huntley, D. Barrell, C. O'Donovan, and R. Apweiler, “Quickgo: a web-based tool for gene ontology searching.” Bioinformatics, vol. 25, no. 22, pp. 3045-3046, Nov 2009.
[9] A. Schlicker, J. Rahnenführer, M. Albrecht, T. Lengauer, and F. S. Domingues, “Gotax: investigating biological processes and biochemical activities along the taxonomic tree.” Genome Biol, vol. 8, no. 3, p. R33, 2007.
[10] Z. Du, L. Li, C.-F. Chen, P. S. Yu, and J. Z. Wang, “G-sesame: web tools for go-term-based gene similarity analysis and knowledge discovery.” Nucleic Acids Res, vol. 37, no. Web Server issue, pp. W345-W349, Jul 2009.
[11] A. Schlicker and M. Albrecht, “Funsimmat: a comprehensive functional similarity database.” Nucleic Acids Res, vol. 36, no. Database issue, pp. D434-D439, Jan 2008.
[12] U. Consortium, “The universal protein resource (uniprot) 2009.” Nucleic Acids Res, vol. 37, no. Database issue, pp. D169-D174, Jan 2009.
[13] R Development Core Team, “R: A language and environment for statistical computing,” 2009, ISBN 3-900051-07-0. [Online]. Available: http://www.R-project.org
[14] R. C. Gentleman, V. J. Carey, D. M. Bates, B. Bolstad, M. Dettling, S. Dudoit, B. Ellis, L. Gautier, Y. Ge, J. Gentry, K. Hornik, T. Hothorn, W. Huber, S. Iacus, R. Irizarry, F. Leisch, C. Li, M. Maechler, A. J. Rossini, G. Sawitzki, C. Smith, G. Smyth, L. Tierney, J. Y. H. Yang, and J. Zhang, “Bioconductor: Open software development for computational biology and bioinformatics,” Genome Biology, vol. 5, p. R80, 2004.
[15] H. Yu, Rmpi: Interface (Wrapper) to MPI (Message-Passing Interface), 2007, R package version 0.5-5. [Online]. Available: http://www.stats.uwo.ca/faculty/yu/Rmpi
[16] L. Tierney, A. J. Rossini, N. Li, and H. Sevcikova, snow: Simple Network of Workstations, 2004, R package version 0.3-0. [Online]. Available: http://www.sfu.ca/sblay/R/snow.html
[17] J. Horner, rapache: Web application development with R and Apache., 2009. [Online]. Available: http://biostat.mc.vanderbilt.edu/rapache/
[18] A. Alexa, J. Rahnenführer, and T. Lengauer, “Improved scoring of functional groups from gene expression data by decorrelating go graph structure.” Bioinformatics, vol. 22, no. 13, pp. 1600-1607, Jul 2006.
[19] M. Kanehisa, “The kegg database.” Novartis Found Symp, vol. 247, pp. 91-101; discussion 101-3, 119-28, 244-52, 2002.
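
Illustrative note on Section II.B. The paper describes two steps without giving code: scoring a pair of gene products by the Kappa agreement of a weighted contingency table, and splitting the all-against-all comparison so that task i covers gene product i against gene products 1 to i. The Python sketch below is only one possible reading of that description, not the authors' R implementation. The function names, the toy annotation vectors and the toy weights are all made up for illustration; the weights merely stand in for LC values, whose exact formula the paper does not give, and the Z test is omitted.

```python
def weighted_kappa(a, b, weights):
    """Cohen's kappa for two 0/1 annotation vectors.

    Each GO term adds its weight (a stand-in for the LC value) to one cell
    of the 2x2 contingency table instead of adding a plain count of 1.
    """
    n11 = n10 = n01 = n00 = 0.0
    for x, y, w in zip(a, b, weights):
        if x and y:
            n11 += w      # term annotates both gene products
        elif x:
            n10 += w      # term annotates only the first
        elif y:
            n01 += w      # term annotates only the second
        else:
            n00 += w      # term annotates neither
    total = n11 + n10 + n01 + n00
    if total == 0.0:
        return 0.0
    observed = (n11 + n00) / total
    expected = ((n11 + n10) * (n11 + n01)
                + (n01 + n00) * (n10 + n00)) / (total * total)
    if 1.0 - expected < 1e-12:
        return 0.0        # kappa is undefined here; 0.0 is an arbitrary choice
    return (observed - expected) / (1.0 - expected)


def task(i, vectors, weights):
    """Task i: scores between gene product i and gene products 0..i."""
    return [(i, j, weighted_kappa(vectors[i], vectors[j], weights))
            for j in range(i + 1)]


def all_against_all(vectors, weights):
    """Run every task and assemble the symmetric score matrix."""
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):    # tasks are independent, so each could go to its own CPU
        for row, col, score in task(i, vectors, weights):
            matrix[row][col] = score
            matrix[col][row] = score
    return matrix


# Toy data (hypothetical): three gene products annotated over four GO terms.
weights = [1.0, 1.0, 0.5, 0.5]
vectors = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
scores = all_against_all(vectors, weights)
```

In this toy input the first two gene products share every annotation and so score 1, while the first and third share none and score below 0. Only the lower triangle is computed, n(n+1)/2 pairs instead of n squared, which is the saving the task split in the paper relies on.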
