




6 Multiple Sequence Alignment (3)
Multiple Sequence Alignment (MSA)
Copyright 2010 College of Life Science, Capital Normal University

Outline
MSA programs
Web logo
MEME

1 Web logo

Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: A sequence logo generator. Genome Research 14: 1188-1190, 2004.
Schneider TD, Stephens RM. 1990. Sequence Logos: A New Way to Display Consensus Sequences. Nucleic Acids Res 18: 6097-6100.

Entropy

From information theory: a measure of the unpredictable nature of a set of possible elements. The higher the level of variation within the set, the higher the entropy.

$$H = -K \sum_{i=1}^{n} p_i \log p_i$$

where K is a positive constant.
Shannon CE. 1948. Bell System Tech J 27: 379-423, 623-656.

Creation of Logos (1)

TACGAT TATAAT TATAAT GATACT TATGAT TATGTT

$$
\begin{aligned}
H(1) &= -\left[ f(T,1)\log_2 f(T,1) + f(G,1)\log_2 f(G,1) \right] \\
     &= -\left[ \tfrac{5}{6}\log_2 \tfrac{5}{6} + \tfrac{1}{6}\log_2 \tfrac{1}{6} \right] \\
     &= -\left[ 0.8333 \times (-0.2630) + 0.1666 \times (-2.585) \right] \\
     &= 0.2192 + 0.4307 = 0.6499
\end{aligned}
$$

Creation of Logos (2), (3)

[Figures: logo construction for the same six sequences]

2 MEME

Multiple Em for Motif Elicitation
http://meme.sdsc.edu/meme4_3_0/intro.html

References
Timothy L. Bailey, Nadya Williams, Chris Misleh, and Wilfred W. Li. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Research, Vol. 34, pp. W369-W373, 2006.
Timothy L. Bailey and Charles Elkan. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.

Expectation Maximization Algorithm

The EM algorithm consists of two steps, which are repeated consecutively:
Step 1: the expectation step
Step 2: the maximization step

An example

Suppose that there are 10 DNA sequences having very little similarity with each other, each about 20 nucleotides long and thought, on the basis of biochemical and genetic evidence, to contain a binding site near the middle 4 residues. The goal is to find the most probable location of the binding site in each of the 10 sequences.

The Initial Setup of EM Algorithm

Columns not in the motif provide the background frequencies.
Columns defined by a preliminary alignment of the sequences provide initial estimates of the frequencies of the bases in each motif column (the profile).

Step 1: the expectation step

Sequence 1: use the previous estimate of the profile to calculate the probability of the motif in this position, and multiply by the background probabilities in the remaining positions. This gives P(site 1, sequence 1). Repeating the calculation for every possible start position gives P(site 1, sequence 1), P(site 2, sequence 1), ..., P(site 17, sequence 1).

The probability of the best location in sequence 1, say at site k, is the ratio of the site probability at k to the sum of all site probabilities:

$$P(\text{site } k \text{ in sequence } 1) = \frac{P_{\text{site } k,\ \text{sequence } 1}}{P_{\text{site } 1,\ \text{sequence } 1} + P_{\text{site } 2,\ \text{sequence } 1} + \cdots + P_{\text{site } 17,\ \text{sequence } 1}}$$

The probability of the site location in each sequence is then calculated in this manner.

Step 2: the maximization step

In the maximization step, the base frequencies found in the expectation step are used as an updated estimate of the profile. In this case the PSSM is more complete than the initial estimate, because all possible sites in each of the sequences have been evaluated. The expectation and maximization steps are repeated until the estimate of the PSSM no longer changes.

The MEME Suite
http://meme.sdsc.edu/meme4_3_0/intro.html

View BLOCK
View FASTA
View RAW
View PSPM

MAST program

Programs for MSA
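The entropy calculation on the "Creation of Logos" slide can be repeated for every column of the six example sequences. The following is a minimal Python sketch, assuming K = 1 and log base 2 so that entropy is measured in bits. The function names are illustrative, and the information content is taken as 2 - H for DNA (the Schneider and Stephens measure without its small-sample correction), which is an assumption not spelled out in the slide text.

```python
import math
from collections import Counter

# The six example sequences from the "Creation of Logos" slides.
SEQS = ["TACGAT", "TATAAT", "TATAAT", "GATACT", "TATGAT", "TATGTT"]


def column_entropy(column):
    """Shannon entropy in bits of one alignment column (K = 1, log base 2)."""
    n = len(column)
    # p * log2(1/p) equals -p * log2(p) and avoids printing a negative zero.
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())


def logo_heights(seqs):
    """Return (entropy, information content) for each column of equal-length DNA."""
    heights = []
    for column in zip(*seqs):
        h = column_entropy(column)
        # Assumption: maximum entropy for DNA is log2(4) = 2 bits,
        # and no small-sample correction is applied.
        heights.append((h, 2.0 - h))
    return heights


for pos, (h, info) in enumerate(logo_heights(SEQS), start=1):
    print(f"position {pos}: H = {h:.4f} bits, information = {info:.4f} bits")
```

For position 1 (five T and one G) the entropy comes out at about 0.65 bits, which agrees with the hand calculation of 0.6499 up to rounding of the intermediate values.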
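The expectation and maximization steps described in the slides can be sketched in a few lines of Python. This is a simplified illustration that assumes exactly one motif site per sequence; it is not MEME's actual implementation. The toy sequences, the preliminary start positions, the pseudocount of 1 and all function names are made up for the example.

```python
BASES = "ACGT"


def normalise(counts):
    """Turn a dict of base counts into frequencies that sum to 1."""
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}


def initial_model(seqs, starts, width, pseudo=1.0):
    """Profile from a preliminary alignment; background from all other columns."""
    profile = [{b: pseudo for b in BASES} for _ in range(width)]
    background = {b: pseudo for b in BASES}
    for seq, start in zip(seqs, starts):
        for i, base in enumerate(seq):
            if start <= i < start + width:
                profile[i - start][base] += 1
            else:
                background[base] += 1
    return [normalise(col) for col in profile], normalise(background)


def site_probabilities(seq, profile, background):
    """Expectation step for one sequence: normalised probability of each start."""
    width = len(profile)
    scores = []
    for k in range(len(seq) - width + 1):
        p = 1.0
        for i, base in enumerate(seq):
            # Profile inside the candidate site, background everywhere else.
            p *= profile[i - k][base] if k <= i < k + width else background[base]
        scores.append(p)
    total = sum(scores)
    return [s / total for s in scores]


def em_step(seqs, profile, background, pseudo=1.0):
    """One expectation step followed by one maximization step."""
    width = len(profile)
    new_profile = [{b: pseudo for b in BASES} for _ in range(width)]
    new_background = {b: pseudo for b in BASES}
    for seq in seqs:
        for k, w in enumerate(site_probabilities(seq, profile, background)):
            # Every possible site contributes counts weighted by its probability.
            for i, base in enumerate(seq):
                if k <= i < k + width:
                    new_profile[i - k][base] += w
                else:
                    new_background[base] += w
    return [normalise(col) for col in new_profile], normalise(new_background)


def run_em(seqs, starts, width, max_iter=100, tol=1e-6):
    """Repeat both steps until the profile stops changing (or max_iter is hit)."""
    profile, background = initial_model(seqs, starts, width)
    for _ in range(max_iter):
        new_profile, new_background = em_step(seqs, profile, background)
        change = max(abs(new_profile[j][b] - profile[j][b])
                     for j in range(width) for b in BASES)
        profile, background = new_profile, new_background
        if change < tol:
            break
    return profile, background


# Hypothetical toy data: three 20-nt sequences and a rough initial guess.
TOY_SEQS = ["GCGCGCGGTATACGCGCGCC", "CCGGCGCTATAGCGGCCGCG", "GGCCGCGCGTATAGCCGGCG"]
TOY_STARTS = [8, 8, 8]

profile, background = run_em(TOY_SEQS, TOY_STARTS, width=4)
for seq in TOY_SEQS:
    probs = site_probabilities(seq, profile, background)
    best = max(range(len(probs)), key=probs.__getitem__)
    print(seq, "most probable site starts at index", best)
```

With 20-nucleotide sequences and a motif width of 4, each sequence has 17 candidate sites, matching the P(site 1) to P(site 17) terms in the expectation step. Working with raw products is adequate for sequences this short; for longer sequences a log-space version would be a safer choice to avoid floating-point underflow.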