Computing with Whole Genomes
Stuart M. Brown
Research Computing, NYU School of Medicine

The Human Genome Project

Genome Sequencing
- The ability to sequence entire genomes has created a huge demand for bioinformatics
  - Simple data management for the sequencing projects
  - Genome assembly
  - Annotation
  - Public access to the data
  - New types of whole genome analyses
- Genome sequencing factories churn out raw sequence data at an ever-increasing rate
- Fewer scientists are involved in generating data and more are involved in data analysis

Sequence Pipeline
- Laboratory Information Management: track samples, store raw data
- Assemble fragments
  - Track orientation and distance for paired reads from libraries of known-size clones
- Find genes
  - Gene prediction algorithms
  - Map known genes and cDNAs
- Annotation and public access to data

Raw Genome Data

Finding genes in genome sequence is not easy
- About 1% of human DNA encodes functional genes.
- Genes are interspersed among long stretches of non-coding DNA.
- Repeats, pseudo-genes, and introns confound matters

The next step is obviously to locate all of the genes and describe their functions. This will probably take another 15-20 years!

UCSC

Gene Prediction Works Poorly
- Algorithms are not accurate
  - non-consensus splice sites
  - where is the true first 5' exon?
- cDNA data is incomplete and confusing
  - truncated cDNA sequences
  - real alternative splicing
- Pseudo-genes and true gene duplication vs. mistakes in the genome assembly

Ensembl at EBI/EMBL

Integrate with Other Genetic Datasets
- Cytogenetic and molecular markers (STS, microsatellites, radiation hybrids)
- Known mutations
  - OMIM for humans
  - Huge collection of mouse genetic data
  - Nearly complete collection of yeast mutants
- SNPs
- Gene Expression

II. Genomics

What is Genomics?
- An operational definition: the application of high-throughput automated technologies to biology.
- A philosophical definition: a holistic or systems approach to the study of information flow within a cell.
- A technology created by the availability of the genome sequence

Genomics Technologies
- DNA microarrays
  - gene expression (measure RNA levels)
- SNP Genotyping
- Pharmacogenomics
- Proteomics

Genome Diversity
- Alleles are created by mutations in the DNA sequence of one person, which are passed on to their descendants

Human Genetic Variation
- Every human has essentially the same set of genes
- But there are different forms of each gene, known as alleles
  - blue vs. brown eyes
  - genetic diseases such as cystic fibrosis or Huntington's disease are caused by dysfunctional alleles

Some Diseases Involve Many Genes
- There are a number of classic "genetic diseases" caused by mutations of a single gene
  - Huntington's, Cystic Fibrosis, Tay-Sachs, PKU, etc.
- There are also many diseases that are the result of the interactions of many genes:
  - asthma, heart disease, cancer
- Each of these genes may be considered to be a risk factor for the disease.
- Groups of SNP markers may be associated with a disease without determining mechanism

Multiple Causes
- Some complex (i.e. multi-gene) diseases may actually be caused by any of a group of different genes (multiple causes), but all show the same symptoms.
  - Different diseases with similar symptoms?
- SNP linkage analysis can identify these sub-populations more efficiently than classical molecular genetic approaches.

Clinical Manifestations of Genetic Variation
(All disease has a genetic component)
- Susceptibility vs. resistance
- Variations in disease severity or symptoms
- Reaction to drugs (pharmacogenetics)
- All of these traits can be traced back to particular genes (or sets of genes)

So What's a SNP?
- A mutation that causes a single base change is known as a Single Nucleotide Polymorphism (SNP)
- SNPs are very common in the human population.
  - there are SNPs located near all genes
  - they can be used as markers
- Most of these have no visible effect
  - in regions between genes

SNP Genotyping
- It is possible to measure many thousands of SNPs simultaneously in a small blood sample from a patient
- Can compare "genotypes" for SNP markers linked to virtually any trait
- A human genome can be characterized with a few thousand common SNP markers on a single chip
  - a personal genetic profile

SNPs are Very Common
- SNPs are very common in the human population.
- Between any two people, there is an average of one SNP every 1250 bases.
- Most of these have no phenotypic effect
- Venter et al. estimate that only ...

    gnl|dbSNP|rs1042574_allelePos=51 total len = 101 |taxid = 9606|snpClass = 1
    Length = 101
    Score = 149 bits (75), Expect = 3e-33
    Identities = 79/81 (97%)
    Strand = Plus / Plus

    Query: 1489 ccctcttccctgacctcccaactctaaagccaagcactttatatttttctcttagatatt 1548
                ||||||||||||||||||||||||||||||||||||||||||||||| || |||||||||
    Sbjct: 1    ccctcttccctgacctcccaactctaaagccaagcactttatattttcctyttagatatt 60

    Query: 1549 cactaaggacttaaaataaaa 1569
                |||||||||||||||||||||
    Sbjct: 61   cactaaggacttaaaataaaa 81

If a matching SNP is found, then it can be directly located on the Genome map

SNP Markers
- SNPs can be found that are linked to any disease alleles.
- These mutations are likely to be neutral: they have no direct effect on phenotype
- Linked SNPs can be used as markers for the disease in diagnostic tests.
- Closely linked markers rarely separate; a pair of flanking markers almost never do.

DNA Diagnostic Testing
- hereditary diseases: potential parents, pre-natal, late-onset diseases
- genes that predispose to disease (risk factors)
- genotyping of infectious agents (bacterial & viral)
- measure the type and stage of cancer tumors
- forensics: using DNA testing to establish identity

Direct Medical Applications
- Diagnosis
  - Type of cancer
  - Aggressive or benign?
- Monitor treatment outcome
  - Is a treatment having the desired effect on the target tissue?

Pharmacogenomics
- The use of DNA sequence information to measure and predict the reaction of individuals to drugs.
- Personalized drugs
- Faster clinical trials
- Fewer drug side effects

People React Differently to Drugs
- Side effects
- Effectiveness
- There are genes that control these reactions
- SNP markers can be used to identify these genes

Some Gene Products Interact with Drugs
- There are proteins that chemically activate or inactivate drugs.
- Other proteins can directly enhance or block a drug's activity.
- There are also genes that control side effects

Some Examples
- 10% of African Americans have polymorphic alleles of glucose-6-phosphate dehydrogenase that lead to haemolytic anemia when they are given the anti-malarial drug primaquine.

Succinylcholine Toxicity
- 0.04% of individuals are homozygous for alleles of pseudocholinesterase that are unable to inactivate the muscle relaxant drug succinylcholine, leading to respiratory paralysis.

Isoniazid Metabolism
- There are many polymorphic alleles of the N-acetyltransferase (NAT2) gene with reduced (or accelerated) ability to inactivate the drug isoniazid. Some individuals developed peripheral neuropathy in reaction to this drug
- Some alleles of the NAT2 gene are also associated with susceptibility to various forms of cancer

Cytochrome P450
- 10% of the Caucasian population is homozygous for alleles of the Cytochrome P450 gene CYP2D6 that do not metabolize the hypertension drug debrisoquine, which can lead to dangerous vascular hypotension.

ACE
- Patients homozygous for an allele with a deletion in intron 16 of the gene for angiotensin-converting enzyme (ACE) showed no benefit from the hypertension drug enalapril, while other patients did benefit.

Collect Drug Response Data
- These drug response phenotypes are associated with a set of specific gene alleles.
- Identify populations of people who show specific responses to a drug.
- In early clinical trials, it is possible to identify people who react well and react poorly.

Make Genetic Profiles
- Scan these populations with a large number of SNP markers.
- Find markers linked to drug response phenotypes.
- It is interesting, but not necessary, to identify the exact genes involved.

Use the Profiles
- Genetic profiles of new patients can then be used to prescribe drugs more effectively & avoid adverse reactions.
- Can also speed clinical trials by testing on those who are likely to respond well.

Real World Applications
- Most of the major pharmaceutical companies are currently collecting pharmacogenomic data in their clinical trials.
- Data is yet to be published.
- Genetic indications for drug use are still a few years away.

Gene Expression Profiling
- Sequence bulk cDNAs from different tissues
  - NCBI CGAP website allows digital differential display
- SAGE (sequence short tags from cDNAs)
- Microarrays

Digital Differential Display

cDNA spotted microarrays

Link Gene Expression to Genome Sequence
- Identify promoter and 5' sequence for a group of co-expressed genes.
- Scan for known transcription factor binding sites.
- Predict new regulatory sites based on common sequence elements.

Whole Genome Comparisons

Comparative Genomics
- Use mouse homologs to find human genes
- cDNAs
- Chromosome scanning for conserved regions
- Synteny
- Use knockouts to define function
- Deep homology
- Metabolic reconstruction

Conserved Regulatory Regions
- In a syntenic stretch of DNA, the protein coding regions will be conserved, but not the intergenic regions
  - EXCEPT for important regulatory motifs
- VISTA website: ://vista/

Metabolic Reconstruction
- If we know the genome sequence, and we know the metabolic pathways
- Then we should be able to map genes to the pathways in every organism
- WIT2 (What is There) is an attempt to do this ://WIT2/
- How can organisms lack genes that are essential in related groups?

EMP Database
- Enzymes and Metabolic Pathways database (EMP) :/emp.m/2-Oxobutanoate-Isoleucine, 2-Oxoglutarate_Anabolism (NADPH,_NADH)

Clusters of Orthologous Groups (COGs)
- COGs were delineated by comparing protein sequences encoded in 43 complete genomes, representing 30 major phylogenetic lineages.
- Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.

A simple COG with two yeast paralogs. YPL040c is the yeast mitochondrial isoleucyl-tRNA synthetase; the bacterial orthologs and that from M. jannaschii are the BeTs for this yeast protein, but the reverse is true only of the bacterial proteins. For YBL076c (yeast cytoplasmic isoleucyl-tRNA synthetase), the M. jannaschii ortholog is a symmetrical BeT, whereas the bacterial genes are asymmetrical.

Proteomics
- Identify all of the proteins in an organism
  - Potentially many more than genes due to alternative splicing and post-translational modifications
- Quantitate in different cell types and in response to metabolic/environmental factors
- Protein-protein interactions

Protein-Protein Interactions
- Metabolic and regulatory pathways
- Transcription factors
- Co-expression
- Biochemical data
  - crosslinking
  - yeast 2-hybrid
  - affinity tagging
- Useful feedback to genome annotation/protein function and gene expression

BIND - The Biomolecular Interaction Network Database

Impact on Bioinformatics
- Genomics produces high-throughput, high-quality data, and bioinformatics ...
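
The dbSNP BLAST hit for rs1042574 quoted in the SNP slides can be checked by hand. The following is a minimal sketch, not part of the original talk: it compares the two aligned segments base by base, recomputes the reported identity count, and lists where the query differs from the dbSNP entry. The function name `compare_ungapped` is hypothetical, and the comparison is a plain character match that assumes an ungapped alignment (as in this hit); it is not a reimplementation of BLAST.

```python
# Aligned segments copied from the BLAST hit against dbSNP entry rs1042574.
query = (
    "ccctcttccctgacctcccaactctaaagccaagcactttatatttttctcttagatatt"
    "cactaaggacttaaaataaaa"
)
sbjct = (
    "ccctcttccctgacctcccaactctaaagccaagcactttatattttcctyttagatatt"
    "cactaaggacttaaaataaaa"
)
QUERY_START = 1489  # query coordinate of the first aligned base
SBJCT_START = 1     # subject coordinate of the first aligned base


def compare_ungapped(q, s, q_start=1, s_start=1):
    """Return (identities, mismatches) for two equal-length, ungapped strings.

    Each mismatch is (query_pos, subject_pos, query_base, subject_base).
    The comparison is literal, so an ambiguity code such as 'y' counts as a
    mismatch against 'c' or 't'.
    """
    if len(q) != len(s):
        raise ValueError("ungapped comparison needs equal-length sequences")
    mismatches = [
        (q_start + i, s_start + i, a, b)
        for i, (a, b) in enumerate(zip(q, s))
        if a != b
    ]
    return len(q) - len(mismatches), mismatches


identities, mismatches = compare_ungapped(query, sbjct, QUERY_START, SBJCT_START)
percent = 100 * identities // len(query)  # integer floor of the percentage
print(f"Identities = {identities}/{len(query)} ({percent}%)")
for q_pos, s_pos, q_base, s_base in mismatches:
    print(f"query {q_pos} ({q_base}) vs subject {s_pos} ({s_base})")
```

The second mismatch falls at subject position 51, which agrees with `allelePos=51` in the hit's header: the `y` there is the IUPAC code for C or T, that is, the polymorphic base itself.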
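
The "Make Genetic Profiles" step (scan responders and non-responders with many SNP markers, then find markers linked to the drug response) can be illustrated with a simple per-marker association test. The slides do not say which statistic is used, so the Pearson chi-square test on a 2x2 table below is an assumption chosen for illustration, and every marker name and count is HYPOTHETICAL toy data, not a result from any study.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# HYPOTHETICAL allele counts per marker, in the order:
# (allele 1 in responders, allele 2 in responders,
#  allele 1 in non-responders, allele 2 in non-responders)
marker_counts = {
    "snp_A": (30, 10, 10, 30),  # allele frequencies differ between groups
    "snp_B": (21, 19, 20, 20),  # allele frequencies are nearly identical
}

CRITICAL_P05 = 3.841  # chi-square critical value for p = 0.05 at 1 d.f.

scores = {name: chi_square_2x2(*counts) for name, counts in marker_counts.items()}
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    verdict = "candidate marker" if score > CRITICAL_P05 else "no evidence"
    print(f"{name}: chi-square = {score:.2f} ({verdict})")
```

With thousands of markers scanned at once, a fixed p = 0.05 cutoff would flag many markers by chance alone, so a real scan would need a multiple-testing correction; the single threshold here only keeps the sketch short.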