(Full version) hg19 (GRCh37) vs. hg38 (GRCh38) data comparison

hg19 (GRCh37) vs. hg38 (GRCh38)
Human genome reference comparison

Zuotian Tatum
Department of Human Genetics
Leiden University Medical Center

Timeline
GRCh37:
- First release: Feb 27, 2009
- Latest patch: Jun 28, 2013 (p13)
GRCh38:
- First release: Dec 24, 2013
- Latest patch: Oct 14, 2014 (p1)
http:/www.ncbi.nlm./projects/genome/assembly/grc/human/data/

Content
GRCh37.p13:
- Total bases: 3.23 billion (2.99 billion without N)
- N50: 46 million
- Number of alternative loci: 9
- Non-nuclear genome: no
GRCh38.p2:
- Total bases: 3.21 billion (3.05 billion without N)
- N50: 67 million
- Number of alternative loci: 261
- Non-nuclear genome: yes
/projects/genome/assembly/grc/human/data/

UCSC tracks for GRCh38
- UCSC RefSeq available since April 2014.
- Ensembl regulatory build available since September 2014.
- dbSNP 141 available since October 2014.
- ENCODE and FANTOM5 track hubs are still not available (Nov 2014).

New in GRCh38 release
Three new sequence files, in addition to the standard assembly files:
- GCA_000001405.15_GRCh38_top-level.fna.gz
- GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
- GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
The analysis set files are created to avoid false mapping in NGS alignment pipelines.

GCA_000001405.15_GRCh38_top-level.fna.gz
All the top-level objects in the full assembly:
- chromosomes
- unlocalized scaffolds
- unplaced scaffolds
- alternate locus scaffolds
- mitochondrial genome
The sequence identifiers are International Nucleotide Sequence Database Collaboration (INSDC) accession.versions and the definition lines are GenBank style. No sequences have been hard-masked.

GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
- Chromosomes from the GRCh38 primary assembly unit.
  Note: the two PAR regions on chrY have been hard-masked with Ns. The chromosome Y sequence provided therefore has the same coordinates as the GenBank sequence but is not identical to it. Similarly, duplicate copies of centromeric arrays and WGS on chromosomes 5, 14, 19, 21 & 22 have been hard-masked with Ns.
- Mitochondrial genome from the GRCh38 non-nuclear assembly unit.
- Unlocalized scaffolds from the GRCh38 primary assembly unit.
- Unplaced scaffolds from the GRCh38 primary assembly unit.
- Epstein-Barr virus (EBV) sequence.
  Note: the EBV sequence is not part of the genome assembly but is included in the analysis set as a sink for alignment of reads that are often present in sequencing samples.

GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
= GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz + alt-scaffolds from the GRCh38 ALT_REF_LOCI_* assembly units

Alt-loci add complexity to RNAseq quantification
[Figure: ideogram of GRCh38.p2]

RNAseq quantification
- Fragments (reads) per million per kilobase (FPKM/RPKM) values are used to quantify gene expression.
- Unique mapping only: analysis tools do not distinguish allelic duplication from paralogous duplication.
- Non-overlapping gene regions.

To understand the effect of alt-loci on RNAseq quantification, compare alignment of the chromosome 6 MHC region between:
- hg19 full set with 7 alt-loci
- hg38 analysis set without alt-loci
Sequence content is largely unchanged between hg19 and hg38.

Mapping/alignment for RNAseq
(hg19: with alt loci; hg38: without alt loci)

                      hg19          hg38
mapped                14,655,299    14,704,427
mappedDiffChr         4,959         4,017
mappedPairProper      14,639,261    14,690,090
mappedPairProperPct   92.62         92.94
total                 15,805,561    15,805,561
totalSplice           5,060,829     5,078,133
unmapped              1,150,262     1,101,134

Effect of alt loci in RNAseq alignments
[Figure: gene RPKM (hg19) plotted against gene RPKM (hg38), log-scaled axes from 0.01 to 10000; series label: chr_masked]
[Figure: distribution of RPKM difference]

Major histocompatibility complex region on chromosome 6
[Figure: HLA-A, sample D1, hg19 full set on chr6, chr6_mann_hap4, chr6_qbl_hap6 and chr6_dbb_hap3]
[Figure: HLA-A, samples D1 to D3, hg19 full set chr6 vs. hg38 analysis set]
[Figure: HLA-C, samples D1 to D3, hg19 full set vs. hg38 analysis set]
[Figure: HLA-DRA, samples D1 to D3, hg19 full set vs. hg38 analysis set]

Major histocompatibility complex region on chromosome 6: class III
MHC class III:
- 700 kb stretch, 60 genes.
- The most gene-dense region of the human genome.
- 14% coding, 72% transcribed.
- Highly conserved.
- Only a few genes have a clearly defined and proven function.
[Figure: TNF, samples D1.control and D1.treated, hg19 full set chr6 vs. hg38 analysis set chr6]

Highly variant immune regions retiled
LILRA3 moved to alt-loci in hg38.
[Figure: hg19 shows LILRB2, LILRA3, LILRA5; hg38 shows LILRB2, LILRA5. Labels: "phantom LILRA3", "LILRA3 in hg19", "intergenic"; neighboring genes LILRB3, LILRA4, LILRB5]

Gene length calculation
We need gene length for calculating RPKM.
- If alignment uses alt loci, RPKM would be artificially lowered for alt-loci genes.
- If alignment does not use alt loci, remove alt-loci annotations from the official set.

Need a more comprehensive approach to genome variation
- The assembly model is neither haploid nor diploid.
- Analysis tools penalize reads mapping to more than 1 location and do not distinguish allelic duplication from paralogous duplication.
- A graph structure is a natural way to represent a population-based genome assembly.

Conclusions
RPKM values are highly correlated between hg19 and hg38. Analysis set is preferre
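The gene length point can be made concrete with a small sketch of the usual RPKM formula (reads per kilobase of gene per million mapped reads). The function name `rpkm` and all read counts below are hypothetical and chosen only for illustration; they are not taken from the slides. The sketch assumes that a gene duplicated on an alt locus loses part of its reads to the alt copy (or to multi-mapping filters), which is one way the lowering described above could arise.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 2,000 bp long, 1,000 reads in a library of 10 million mapped reads.
without_alt = rpkm(1_000, 2_000, 10_000_000)

# Assumption for illustration: with alt loci in the reference, half of the reads
# are assigned to the alt copy or discarded as non-unique, so only 500 remain.
with_alt = rpkm(500, 2_000, 10_000_000)

print(f"RPKM without alt loci: {without_alt:.1f}")
print(f"RPKM with alt loci:    {with_alt:.1f}")
```

Because the read count enters the formula linearly, any fraction of reads diverted away from the primary copy lowers the reported RPKM by the same fraction, while the gene length in the denominator stays unchanged.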

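The N50 values quoted in the content comparison (46 million for GRCh37.p13, 67 million for GRCh38.p2) follow the standard definition: the length L such that sequences of length L or longer contain at least half of all bases. A minimal sketch of that calculation is shown here; the function name `n50` and the toy lengths are hypothetical, and the slides do not say whether the quoted figures are contig or scaffold N50.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up lengths (not real assembly data).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

A larger N50 means that fewer, longer sequences make up half of the assembly, which is why it is commonly used as a rough continuity measure.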