
Pathway Bioinformatics: Database, Software, and Discovery
Y. Tom Tang, Ph.D.
Bioinformatics R&D
Hyseq Pharmaceuticals, Inc.
Sunnyvale, CA, USA

Outline of the Talk
- Introduction to Pathway Bioinformatics
- Overview of Pathmetrics Technology and Products
- Data Representation and SLIPR Format
- Pathway Comparison and Pathway Database Searches
- Pathway Predictions and Beyond

A Broad Definition of Bioinformatics
- Informatics: its carrier is a set of digital codes and a language. As manifested in the space-time continuum, information has utility (e.g., decreasing the entropy of an open system).
- Bioinformatics: the essence of life is information (i.e., from digital code to the emergent properties of biosystems). Bioinformatics is the study of the information content of life.

Pathways
A pathway can be defined as a modular unit of interacting molecules that fulfills a cellular function. It is usually represented by a 2-D diagram with characteristic symbols linking the protein and non-protein entities: a circle indicates a protein or a non-protein biomolecule, and a symbol between two circles indicates the nature of the molecule-molecule interaction.

An Example of a Pathway: the EPO (Erythropoietin) Pathway

Pathway Database: Increasing Level of Complexity
- The genome: 4 bases; 3 billion bp in total; 3 billion bp per cell, identical in every cell
- The proteome: 20 amino acids; 60K genes and 200K proteins; 10K proteins per cell, with expression differing across cells and conditions
- The pathome: 200K reactions; 20K pathways; 1K pathways per cell, with expression differing across cells and conditions

Evolutionary Theory of Pathways: A New Field of Theoretical Studies
- The most important assumption of sequence informatics is evolution.
- The evolutionary principle also applies to pathway informatics:
  - From simple to complex
  - Duplication, diversification, and modular reuse
- It will provide a new view of fundamental questions, toward a unified informatics theory of life:
  - What is life?
  - How does new function arise?
  - How does evolution work? (The pathway is the bridge between the digital signal and emergent properties.)
  - When does life begin (what is the initial set of pathways)?

Data Representation in KEGG
- Entity: a molecule or a gene
- Binary relation: a relation between two entities
- Network: a graph formed from a set of related entities
- Pathway: a metabolic pathway or a regulatory pathway

Drosophila melanogaster Genes According to the KEGG Metabolic and Regulatory Pathways
Pathway Search by EC | Cpd | Gene | Seq    1st Level | 2nd Level | 3rd Level | Text Search
- Carbohydrate Metabolism
- Energy Metabolism
  2.1 Oxidative phosphorylation  PATH:dme00190
  2.2 ATP synthesis  PATH:dme00193
  2.4 Carbon fixation  PATH:dme00710
  2.5 Reductive carboxylate cycle (CO2 fixation)  PATH:dme00720
  2.6 Methane metabolism  PATH:dme00680
  2.7 Nitrogen metabolism  PATH:dme00910
  2.8 Sulfur metabolism  PATH:dme00920
- Lipid Metabolism
- Nucleotide Metabolism
- Amino Acid Metabolism
- Metabolism of Other Amino Acids
- Metabolism of Complex Carbohydrates
- Metabolism of Complex Lipids
- Metabolism of Cofactors and Vitamins

Introduction to GenMAPP
GenMAPP (Gene MicroArray Pathway Profiler) was developed by Bruce Conklin at the Gladstone Institute, UCSF. GenMAPP is a free computer application designed to visualize gene expression data on maps representing biological pathways and groupings of genes.
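The core idea behind GenMAPP's display, coloring each gene on a pathway map according to imported expression data, can be sketched in a few lines. This is a minimal illustration, not GenMAPP's code or its color-set logic: it assumes expression arrives as one treatment/control ratio per gene, and the gene list, ratios, color names, and two-fold cutoff are all hypothetical choices.

```python
import math
from typing import Dict, List, Optional

# Hypothetical cutoff: |log2(ratio)| >= 1 means at least a two-fold change.
LOG2_CUTOFF = 1.0


def color_for(ratio: Optional[float]) -> str:
    """Map one gene's expression ratio (treatment / control) to a display color."""
    if ratio is None:
        return "gray"  # gene is drawn on the map but has no imported data
    if ratio <= 0:
        raise ValueError("expression ratio must be positive")
    log2_ratio = math.log2(ratio)
    if log2_ratio >= LOG2_CUTOFF:
        return "red"  # up-regulated
    if log2_ratio <= -LOG2_CUTOFF:
        return "green"  # down-regulated
    return "white"  # change below the cutoff


def color_pathway(genes: List[str], expression: Dict[str, float]) -> Dict[str, str]:
    """Assign a color to every gene on a pathway map from user-imported data."""
    return {gene: color_for(expression.get(gene)) for gene in genes}


if __name__ == "__main__":
    # Hypothetical EPO-pathway map and made-up ratios, for illustration only.
    epo_map = ["EPO", "EPOR", "JAK2", "STAT5A"]
    ratios = {"EPO": 4.0, "EPOR": 0.8, "JAK2": 0.25}
    print(color_pathway(epo_map, ratios))
    # prints {'EPO': 'red', 'EPOR': 'white', 'JAK2': 'green', 'STAT5A': 'gray'}
```

Working on log ratios keeps up- and down-regulation symmetric: a two-fold increase and a two-fold decrease land at +1 and -1.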
The main features of GenMAPP version 1.0 are:
- Drawing pathways with easy-to-use graphics tools
- Gene databases for multiple species
- Coloring genes on MAPP files based on user-imported gene expression data

Two Main Challenges in the Post-genomic Age
- Data integration: integrating diverse biological information
  - Scientific literature and the existing body of knowledge about cellular systems
  - Genomic sequences
  - Protein sequences, motifs, and structures
  - Expression data from microarrays, dbEST, and RT-PCR
  - Protein-protein interaction data from large-scale screening
- Functional discovery: assigning functions to the 60K+ human genes
  - Only 5% of known genes have an assigned function.
  - The functions of the majority of discovered genes are unknown.
  - Without understanding function, no drug discovery can be done, whether for small molecules or biopharmaceuticals.
  - Functional discovery will be the focus of the next 20 years of life-science research.

Pathmetrics Provides Solutions in
- Data integration
  - Establish a standard for pathway curation and pathway database design
  - Develop pathway databases using existing knowledge in the scientific literature
  - Utilize dbEST, microarray, and other types of expression data
  - Utilize genomic data such as promoter-region similarities
- Functional studies
  - Assign proteins of unknown function to functional pathways
  - Determine in which cells those pathways operate, and at what level
  - Be much more efficient than large-scale random screening
  - Discover the majority of pathways and protein functions
  - Deliver many tissue-specific pathways to the pharmaceutical industry

Basic Concepts
- Node: a protein, peptide, or non-protein biomolecule.
- Mode: the nature of the interaction between two nodes (qualitative data).
- Pathway: a linked list of interconnected nodes and modes, represented in either 2-D or 1-D format.
- Pathway network: a network of cellular function and regulation involving interconnected pathways.

Curating Pathway Databases
- SLIPR standard for pathway curation
- Relational database design including diverse information about genes, proteins, expression, and tissues
- Graphical input and graphical output display

SLIPR Standard for Pathway Curation
SLIPR stands for Semi-LInear Pathway Representation. Like FASTA, it is pronounced "SlipR" or "Slipir". For linear comparison (homology) and display of alignments, 2-D pathway diagrams are converted into a 1-D format. We call the 2-D diagrams graph pathways and the corresponding 1-D representations semi-linear pathways. One graph pathway may be transformed into multiple semi-linear pathways, but we prefer a one-to-one mapping between the 2-D graph and the SLIPR form. Generating 2-D graph pathways and the corresponding 1-D SLIPR forms from the scientific literature is called pathway curation. Pathways are curated by trained scientists with expertise in the relevant pathways. In addition to generating the 2-D and 1-D formats, curators must also generate a pathway description file for each pathway they curate (pathway annotation) and a protein file containing all the proteins in the pathway.

Mode Symbol Specifications
A mode is usually specified by two non-character ASCII symbols.
- "-": direct interaction with direction. Used when there is a known direct interaction between two nodes (a reverse-orientation form also exists).
- Clear interaction, but no direction of information flow (note: no spaces and no letters within the symbol). This can happen when more than two proteins are involved in forming a large complex.
- "*": bifurcating member. It usually appears only at the beginning or end of a pathway; it can occur in the middle of a pathway only when the pathway bifurcates and immediately folds back, e.g. A-B-*C-*E-F.
  If a pathway starts to bifurcate in the middle or at the end, a *path_name can be used to record this event.
  For example: A-B-(xx)-C-D-*New_path_1-E-*New_path_2.
- "( )": symbol for non-protein nodes. If the small molecule is uncertain, its name can be omitted; if it is known, its name goes between the parentheses, e.g. -(Ca) or (cAMP). All small molecules should be enclosed in parentheses, e.g. A1-(Ca)-A2-(Cytidine_Diphosphate_Choline).
- Brackets: symbol for another pathway. The path_id should be placed within the brackets; when a node is linked to other pathways, the path_ids are put inside brackets, e.g. A1-Ca_triggered_path1, A1-Gs_pathway.
- An ID given without parentheses or brackets denotes a protein node.

SLIPR Format for Pathway Entries
The format is based on a common sequence format, FASTA. Nodes are linked by modes with no space between them. Bifurcating branches are specified later within the same entry, as a PATHsub_ID followed by its content. For example:

PW_ID PW_name PW_annotation Source Curator Date Species
Pr1-Pr2-(Ca)-Pr3=Pr4-*Pr5-*PATHsub_XX-Pr5-(Mg)ZZpr
PATHsub_XX
AA1-AA2(SM1)-AA3AA4

Example of ortholog pathway search results:

Hit 1: Ortholog pathway for Homo sapiens, with score 100.00
Query:  hsa:51144   hsa:2052    hsa:2053    hsa:51004   hsa:9420
%_id:   1.00        1.00        1.00        1.00        1.00
Sbjct:  gi15082281  gi13097729  gi181395    gi4680659   gi13094303

Hit 2: Ortholog pathway for Mus musculus, with score 65.20
Query:  hsa:51144   hsa:2052    hsa:2053    hsa:51004   hsa:9420
%_id:   0.85        0.88        0.81        0           0.72
Sbjct:  gi3142702   gi12857870  gi12832382  -           gi12850151

Hit 3: Ortholog pathway for Rattus norvegicus, with score 65.20
Query:  hsa:51144   hsa:2052    hsa:2053    hsa:51004   hsa:9420
%_id:   0.81        0.88        0.84        0           0.73
Sbjct:  gi4098957   gi207689    gi55930     -           gi1226240

Hit 4: Ortholog pathway for Caenorhabditis elegans, with score 44.20
Query:  hsa:51144   hsa:2052    hsa:2053    hsa:51004   hsa:9420
%_id:   0.48        0.56        0.42        0.44        0.31
Sbjct:  gi726418    gi1465805   gi3876864   gi2088820   gi13775482

Homolog Pathway Prediction Engines
- The crown jewels of the Pathmetrics software tools
- Can predict many novel interactions
- Use diverse input data, including sequence data, expression data, and known interaction data
- Employ complex numerical algorithms such as dynamic programming and clustering

Example of Novel Pathway Prediction: Predicting Novel Pathways Homologous to the Query Pathway

Pathway Searches and Pathway Predictions
* SCIM: similarity coefficient of interacting modes

Gene Discovery vs. Pathway Discovery

Novel Pathways

Confirming Predicted Pathways
- We can confirm predicted pathways at the expression level using RT-PCR.
- This will extend the content of, and add tremendous value to, our pathway databases.
- It will strengthen our IP position on many novel predicted pathways.
- We can provide this service to customers for specific tissue types.
- Protein-level confirmation of important pathways can also be carried out.
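The node and mode rules in the SLIPR mode symbol specifications above lend themselves to a small tokenizer. This is a hypothetical sketch, not Pathmetrics software, and it rests on several assumptions: "-" is a mode (as stated); "=" is also accepted as a mode because it appears between Pr3 and Pr4 in the example entry, although its meaning is not given; a leading "*" is only flagged, since a bifurcating member and a branch path_name look the same syntactically; and pathway references are assumed to be written in angle brackets such as <Gs_pathway>, a guess, because the text does not show which bracket characters are meant.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    kind: str  # "protein", "small_molecule", or "pathway"
    name: str  # empty for an unspecified small molecule, written "()"
    starred: bool = False  # leading '*': bifurcating member or branch path_name


# A node: optional '*', then (small molecule), <pathway reference>, or a protein ID.
# The <...> form for pathway references is an ASSUMPTION of this sketch.
NODE_RE = re.compile(r"(\*?)(?:\(([^()]*)\)|<([^<>]*)>|([A-Za-z0-9_:.]+))")
# Mode symbols accepted here: '-' (directed interaction) and '=' (seen in the example).
MODE_RE = re.compile(r"[-=]")


def parse_slipr(line: str) -> Tuple[List[Node], List[str]]:
    """Split one semi-linear pathway line into alternating nodes and modes."""
    nodes: List[Node] = []
    modes: List[str] = []
    pos = 0
    while True:
        node = NODE_RE.match(line, pos)
        if node is None:
            raise ValueError(f"expected a node at position {pos} in {line!r}")
        star, small, pathway, protein = node.groups()
        if small is not None:
            nodes.append(Node("small_molecule", small, bool(star)))
        elif pathway is not None:
            nodes.append(Node("pathway", pathway, bool(star)))
        else:
            nodes.append(Node("protein", protein, bool(star)))
        pos = node.end()
        if pos == len(line):
            return nodes, modes
        mode = MODE_RE.match(line, pos)
        if mode is None:
            raise ValueError(f"expected a mode at position {pos} in {line!r}")
        modes.append(mode.group())
        pos = mode.end()


if __name__ == "__main__":
    nodes, modes = parse_slipr("A1-(Ca)-A2-(Cytidine_Diphosphate_Choline)")
    print([(n.kind, n.name) for n in nodes])
    # [('protein', 'A1'), ('small_molecule', 'Ca'), ('protein', 'A2'),
    #  ('small_molecule', 'Cytidine_Diphosphate_Choline')]
    print(modes)  # ['-', '-', '-']
```

Because nodes and modes strictly alternate, a line with n nodes always yields n - 1 modes, which is what makes the 1-D form convenient for alignment.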
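The scores in the ortholog pathway search hits listed above are consistent with a simple rule: 100 times the mean per-node identity over the five query nodes, with a missing ortholog ("-" in the Sbjct row, 0 in the %_id row) counting as 0. For mouse, (0.85 + 0.88 + 0.81 + 0 + 0.72) / 5 = 0.652, i.e. 65.20. The slides do not say that Pathmetrics scores hits this way; the sketch below only shows that this rule reproduces the listed numbers.

```python
from typing import Dict, List, Optional


def pathway_score(identities: List[Optional[float]]) -> float:
    """100 x mean per-node identity; None marks a query node with no ortholog."""
    if not identities:
        raise ValueError("a pathway needs at least one node")
    total = sum(0.0 if x is None else x for x in identities)
    return round(100.0 * total / len(identities), 2)


# %_id rows copied from the search hits; None stands for the '-' gaps.
HITS: Dict[str, List[Optional[float]]] = {
    "Homo sapiens": [1.00, 1.00, 1.00, 1.00, 1.00],
    "Mus musculus": [0.85, 0.88, 0.81, None, 0.72],
    "Rattus norvegicus": [0.81, 0.88, 0.84, None, 0.73],
    "Caenorhabditis elegans": [0.48, 0.56, 0.42, 0.44, 0.31],
}

if __name__ == "__main__":
    for species, identities in HITS.items():
        print(f"{species}: {pathway_score(identities):.2f}")
    # Homo sapiens: 100.00
    # Mus musculus: 65.20
    # Rattus norvegicus: 65.20
    # Caenorhabditis elegans: 44.20
```

Counting gaps as zero, rather than skipping them, penalizes a hit that lacks an ortholog: mouse and rat average about 0.82 over their four matched nodes but score only 65.20.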
