hg19 (GRCh37) vs. hg38 (GRCh38)
Human genome reference comparison

Zuotian Tatum
Department of Human Genetics
Leiden University Medical Center

Timeline

GRCh37:
- First release: Feb 27, 2009
- Latest patch: Jun 28, 2013 (p13)

GRCh38:
- First release: Dec 24, 2013
- Latest patch: Oct 14, 2014 (p1)

http:/www.ncbi.nlm./projects/genome/assembly/grc/human/data/

Content

GRCh37.p13:
- Total bases: 3.23 billion; 2.99 billion (without N)
- N50: 46 million
- Number of alternative loci: 9
- Non-nuclear genome: no

GRCh38.p2:
- Total bases: 3.21 billion; 3.05 billion (without N)
- N50: 67 million
- Number of alternative loci: 261
- Non-nuclear genome: yes

/projects/genome/assembly/grc/human/data/

UCSC tracks for GRCh38

- UCSC RefSeq available since April 2014.
- Ensembl regulatory build available since September 2014.
- dbSNP 141 available since October 2014.
- ENCODE and FANTOM5 track hubs are still not available (Nov 2014).

New in GRCh38 release

Three new sequence files, in addition to the standard assembly files:
- GCA_000001405.15_GRCh38_top-level.fna.gz
- GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
- GCA_000001405.15_GRCh38_full_analysis_set.fna.gz

The analysis set files are created to avoid false mapping in NGS alignment pipelines.

GCA_000001405.15_GRCh38_top-level.fna.gz

All the top-level objects in the full assembly:
- Chromosomes
- Unlocalized scaffolds
- Unplaced scaffolds
- Alternate locus scaffolds
- Mitochondrial genome

The sequence identifiers are International Sequence Database Collaboration (INSDC) accession.versions and the definition lines are GenBank style. No sequences have been hard-masked.

GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz

- Chromosomes from the GRCh38 primary assembly unit.
  Note: the two PAR regions on chrY have been hard-masked with Ns. The chromosome Y sequence provided therefore has the same coordinates as the GenBank sequence, but it is not identical to the GenBank sequence. Similarly, duplicate copies of centromeric arrays and WGS on chromosomes 5, 14, 19, 21 & 22 have been hard-masked with Ns.
- Mitochondrial genome from the GRCh38 non-nuclear assembly unit.
- Unlocalized scaffolds from the GRCh38 primary assembly unit.
- Unplaced scaffolds from the GRCh38 primary assembly unit.
- Epstein-Barr virus (EBV) sequence.
  Note: the EBV sequence is not part of the genome assembly but is included in the analysis set as a sink for alignment of reads that are often present in sequencing samples.

GCA_000001405.15_GRCh38_full_analysis_set.fna.gz

= GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz + alt-scaffolds from the GRCh38 ALT_REF_LOCI_* assembly units

Alt-loci add complexity to RNAseq quantification

[Figure: ideogram of GRCh38.p2]

RNAseq quantification

- Fragments (reads) per million per kilobase (FPKM/RPKM) values to quantify gene expression
- Unique mapping only: analysis tools do not distinguish allelic duplication from paralogous duplication
- Non-overlapping gene regions

To understand the effect of alt-loci on RNAseq quantification

Compare alignment of the chromosome 6 MHC region between:
- hg19 full set with 7 alt-loci
- hg38 analysis set without alt-loci

Sequence content is largely unchanged between hg19 and hg38.

Mapping/alignment for RNAseq

                       hg19          hg38
mapped                 14,655,299    14,704,427
mappeddiffchr               4,959         4,017
mappedpairproper       14,639,261    14,690,090
mappedpairproperpct         92.62         92.94
total                  15,805,561    15,805,561
totalsplice             5,060,829     5,078,133
unmapped                1,150,262     1,101,134

hg19: with alt loci
hg38: without alt loci

Effect of alt loci in RNAseq alignments

[Figure: gene RPKM (hg19) plotted against gene RPKM (hg38), both axes logarithmic from 0.01 to 10000, points marked by chr_masked; and the distribution of RPKM difference]

Major histocompatibility complex region on chromosome 6

[Figure: HLA-A, sample d1, hg19 full set: chr6, chr6_mann_hap4, chr6_qbl_hap6, chr6_dbb_hap3]
[Figure: HLA-A, hg19 full set chr6 vs. hg38 analysis set, samples d1, d2, d3]
[Figure: HLA-C, hg19 full set vs. hg38 analysis set, samples d1, d2, d3]
[Figure: HLA-DRA, hg19 full set vs. hg38 analysis set, samples d1, d2, d3]

Major histocompatibility complex region on chromosome 6: class III

MHC class III:
- 700 kb stretch, 60 genes
- The most gene-dense region of the human genome
- 14% coding
- 72% transcribed
- Highly conserved
- Only a few have clearly defined and proven function

[Figure: TNF, hg19 full set chr6 vs. hg38 analysis set chr6, samples d1.control and d1.treated]

Highly variant immune regions retiled

LILRA3 moved to alt-loci in hg38.

[Figure: gene order around LILRA3. hg19: LILRB2, LILRA3, LILRA5. hg38: LILRB2, LILRA5, with a "phantom LILRA3" where LILRA3 sits in hg19, now intergenic. Other labels: LILRB3, LILRA4, LILRB5]

Gene length calculation

We need gene length for calculating RPKM.
- If the alignment uses alt loci: RPKM would be artificially lowered for alt loci genes.
- If the alignment does not use alt loci: remove alt loci annotations from the official set.

Need more comprehensive approach to genome variation

- The assembly model is neither haploid nor diploid.
- Analysis tools penalize reads mapping to more than one location and do not distinguish allelic duplication from paralogous duplication.
- A graph structure is a natural way to represent a population-based genome assembly.

Conclusions

- RPKM values are highly correlated between hg19 and hg38.
- The analysis set is preferred
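
The gene length slide says RPKM depends on gene length, and that aligning against alt loci can artificially lower RPKM for alt loci genes. The sketch below uses the common definition of RPKM (reads per kilobase of gene per million mapped reads). The slides do not give a formula, so this definition, the function name `rpkm`, and all numbers are illustrative assumptions. The halving scenario is only one possible reading of the slide: with unique mapping only, reads that fit both the primary copy and an alt copy of a gene may be lost or split between the two.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads.

    Common textbook definition, assumed here; the slides give no formula.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 2,000 bp long, 1,000 uniquely mapped reads,
# in a library with 10,000,000 mapped reads.
without_alt = rpkm(1_000, 2_000, 10_000_000)

# Same gene when an alt locus copy is present in the reference and,
# hypothetically, half of its reads go to the alt copy or are dropped.
with_alt = rpkm(500, 2_000, 10_000_000)

print(f"RPKM without alt loci: {without_alt:.1f}")
print(f"RPKM with alt loci:    {with_alt:.1f}")
```

Under these made-up numbers the RPKM of the primary copy halves although the underlying expression is unchanged, which is the kind of artifact the no-alt analysis set is meant to avoid.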