Medical Information Analysis, Chapter 2: Genome alignment and assembly

Chapter 2: Genome alignment and assembly

Q1: If you want to compare sequenced DNA reads with a reference genome, list as many of the difficulties as you can.

SOAP rules

  • SOAP allows either a certain number of mismatches or one continuous gap when aligning a read onto the reference sequence.
  • The best hit of each read, meaning the one with the fewest mismatches or the smallest gap, is reported.
  • For multiple equal best hits, the user can instruct the program to report all of them, report one at random, or disregard all of them.
  • By default, the program allows at most two mismatches.
  • Single nucleotide polymorphisms occur much more often than small insertions or deletions, so ungapped hits take precedence over gapped hits.
  • For gapped alignment, only one continuous gap of 1 to 3 bp is accepted, and no mismatches are permitted in the flanking regions, to avoid ambiguous gaps.
  • SOAP can iteratively trim several base pairs at the 3'-end and redo the alignment until hits are detected or the remaining sequence is too short for specific alignment.

SOAP (paired-end sequencing)

  • Paired-end sequencing means sequencing both ends of a DNA fragment, so the two reads of a pair always have a fixed relative orientation and an approximately known distance from each other on the genome.
  • A pair is aligned when both reads map with the correct relative orientation and a proper distance.
  • A certain number of mismatches is allowed in one or both reads of the pair. For gapped alignment, a gap is permitted on only one read, and the other read must match exactly.

SOAP options

Input:

  • query a file, *.fq or *.fa format
  • reference sequences file, *.fa format
  • output alignment file
  • seed size, default=10 (read 18, s=8; read 22, s=10; read 26, s=12)
  • maximum number of mismatches allowed on a read, <=5, default=2
  • maximum gap size allowed on a read, default=0 bp
  • maximum number of equal best hits to count; smaller is faster; n Ns, default=5
  • how to report repeat hits: 0=none; 1=random one; 2=all; default=1
  • which reference strand to align against: 0=both; 1=forward only; 2=reverse only; default=0
  • number of processors to use, default=1

SOAP (options for paired-end alignment)

Input:

  • query b file
  • minimal insert size allowed, default=400
  • maximal insert size allowed, default=600
  • output file of unpaired alignment hits

SOAP (options for miRNA alignment)

Input:

  • 3'-end adapter sequence, default=not miRNA
  • number of mismatches allowed in the adapter, default=0
  • minimum length of a miRNA, default=17
  • maximum length of a miRNA, default=26

SOAP (format of output)

Output:

  • id of read
  • full sequence of read; the read is converted to its complementary sequence if it maps to the reverse strand of the reference
  • number of equal best hits
  • length of the read; if the read was aligned after trimming, the information of the trimmed read is reported
  • alignment on the direct (+) or reverse (-) strand of the reference
  • id of reference sequence
  • location of the first bp on the reference, counted from 1
  • type of hits:
      0: exact match.
      1~100 RefAllele->OffsetQueryAlleleQual: number of mismatches, followed by the detailed mutation sites and the switch of allele types. Offset is relative to the initial location on the reference. OffsetAlleleQual: offset, allele, and quality.

Features

  • The alignment approach has limitations, such as placing reads within repetitive regions of the reference genome, or placing reads whose corresponding regions may not exist in the reference genome; the latter situation may result from gaps in the reference genome or from structural variants (SVs) in the genome being analysed.
  • Mate-pair reads can resolve the correct genome assignment for some repetitive regions, as long as one read in the pair is unique to the genome.

Index

What is indexed:

  • the reads
  • the reference
  • both

How the index is built:

  • based on hash tables
  • based on suffix trees
  • based on merge sorting

Algorithms based on hash tables

[Figure: the example sequence ATCCGAATCCCATGGGAAAGCTATACGTTAGGCCATGCCATGAC and a hash table with columns "K-mer (11)" and "Position"; the first k-mer shown is ATCCGAATCCC.]

De novo assemblies

substantial challenges exist for their application to
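The hash-table index from the "Algorithms based on hash tables" slide can be sketched roughly as follows. This is a minimal illustration and not SOAP's actual implementation: the function names `build_kmer_index` and `seed_hits` are hypothetical, positions are 0-based for simplicity, and only the forward strand is indexed. The reference string and k=11 are taken from the slide.

```python
from collections import defaultdict


def build_kmer_index(reference, k=11):
    """Map every k-mer in the reference to the 0-based positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_hits(read, index, k=11):
    """Look up the read's first k bases (the seed) and return candidate
    0-based start positions on the reference. No verification is done here."""
    return index.get(read[:k], [])


# Example sequence from the slide (44 bp).
reference = "ATCCGAATCCCATGGGAAAGCTATACGTTAGGCCATGCCATGAC"
index = build_kmer_index(reference, k=11)

print(index["ATCCGAATCCC"])                    # [0]
print(seed_hits("CATGGGAAAGCTATACG", index))   # [10]
```

A seed lookup only proposes candidate positions; a real aligner would still compare the rest of the read against the reference at each candidate and count mismatches.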

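The SOAP reporting rules described earlier (at most two mismatches by default, report the best hit, and a user-selected policy for equal best hits) can be illustrated with a small sketch. It assumes a brute-force scan of the forward strand instead of SOAP's seed-and-hash lookup, handles ungapped hits only, and uses hypothetical function names (`ungapped_hits`, `report_best`). The `repeat_mode` values mirror the "how to report repeat hits" option: 0=none, 1=random one, 2=all.

```python
import random


def ungapped_hits(read, reference, max_mismatches=2):
    """Return (position, mismatches) for every ungapped placement of the read
    with at most max_mismatches mismatches. Positions are counted from 1."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start + 1, mismatches))
    return hits


def report_best(hits, repeat_mode=1, rng=random):
    """Keep only the hits with the fewest mismatches, then apply the repeat
    policy when several are tied: 0 = none, 1 = one at random, 2 = all."""
    if not hits:
        return []
    fewest = min(m for _, m in hits)
    best = [h for h in hits if h[1] == fewest]
    if len(best) == 1 or repeat_mode == 2:
        return best
    if repeat_mode == 1:
        return [rng.choice(best)]
    return []


# Made-up toy data: the read matches exactly at positions 1 and 11,
# and with one mismatch at position 6.
toy_reference = "ACGTTACGTAACGTT"
toy_hits = ungapped_hits("ACGTT", toy_reference)
print(toy_hits)                   # [(1, 0), (6, 1), (11, 0)]
print(report_best(toy_hits, 2))   # [(1, 0), (11, 0)]
```

With `repeat_mode=1` the call returns one of the two tied exact matches at random, and with `repeat_mode=0` it returns an empty list, which corresponds to disregarding a read that maps equally well to several places.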