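Genetic Polymorphism: Background Notes

I. SNP, LD, Haplotype and Tagger SNP

1. Genetic/gene polymorphism

In a randomly mating population, a genetic polymorphism exists when two or more genotypes occur at the same chromosomal locus and every allele has a population frequency above 1%. Polymorphism is an important determinant of susceptibility to disease, of the diversity of clinical presentation, and of differences in response to drug treatment. Base variants whose population frequency is 1% or lower are called mutations.

Each base type found at the same DNA position on a chromosome is called an allele. For example, some people carry an A at a given chromosomal position while others carry a G at the same position. Apart from the sex chromosomes, every person has two copies of each chromosome, so the pair of alleles a person carries is called the genotype, e.g. GA, GG or AA; determining a person's genotype is called genotyping. The observable physical or physiological traits of an organism (a human) produced by the joint action of genotype and environment are called the phenotype.

Restriction fragment length polymorphism (RFLP) is the first generation of genetic markers. Variable number of tandem repeats (VNTR) is the second generation; repeats with a unit of 2-6 nucleotides are called microsatellites or short tandem repeats, and those with a unit of 6-12 nucleotides are called minisatellites.

Polymorphisms are defined as frequent (occurring in greater than 1% of the population) variations in the human DNA sequence. Most involve a single base pair substitution, known as single nucleotide polymorphisms (SNPs) (1), although more complex variations are also recognised. SNPs are single base pair positions in genomic DNA at which different sequence alternatives (alleles) exist in normal individuals in some population(s), wherein the least frequent allele has an abundance of 1% or greater. In principle, SNPs could be bi-, tri-, or tetra-allelic polymorphisms. However, in humans, tri-allelic and tetra-allelic SNPs are rare almost to the point of non-existence, and so SNPs are sometimes simply referred to as bi-allelic markers.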
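As a rough illustration of the definitions above, the following sketch counts alleles in a set of diploid genotypes and applies the 1% frequency threshold. The function names and the sample genotype counts are invented for this example, and the strict "above 1%" comparison follows the first definition in the text (the English definition uses "1% or greater").

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return allele frequencies from diploid genotypes written as two letters, e.g. "GA"."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(freqs, threshold=0.01):
    """True when at least two alleles exist and the rarest one exceeds the threshold."""
    return len(freqs) >= 2 and min(freqs.values()) > threshold


# Hypothetical sample of 100 people genotyped at one A/G site.
genotypes = ["GG"] * 70 + ["GA"] * 25 + ["AA"] * 5
freqs = allele_frequencies(genotypes)
print(freqs, is_polymorphic(freqs))
```

Each person contributes two alleles, so 100 genotypes give 200 allele observations; a variant seen once in 400 chromosomes would fall below the threshold and count as a mutation under the definition above.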
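Single nucleotide polymorphism (SNP): the term was first proposed in 1996 by Lander of the human genome research center at the Massachusetts Institute of Technology. A SNP is a difference in a single base at a specific nucleotide position in the genomic DNA sequence of different individuals, and SNPs are the third generation of genetic markers. Any SNP should occur in the population at a frequency of no less than 1%. In principle a SNP can be a bi-, tri- or tetra-allelic polymorphism, but in humans tri- and tetra-allelic SNPs are rare or almost non-existent, so "SNP" usually just means a bi-allelic marker. Bi-allelic SNP substitutions comprise one transition, C↔T (G↔A), and three transversions, C↔A (G↔T), C↔G (G↔C) and T↔A (A↔T). Because deamination of 5-methylcytosine is relatively frequent, the four kinds of SNP occur at different frequencies in the genome: about two thirds are C/T (G/A) transitions, and most of them lie in non-transcribed sequences. The human genome of 3×10^9 bases is estimated to contain at least 10 million SNP sites, often quoted as roughly one SNP per 1000 bp (10 million sites would correspond to about one per 300 bp). The main difference from other genetic markers (such as restriction fragment length polymorphisms and short tandem repeats) is that detection no longer relies on differences in "length"; the sequence variation itself is the marker. SNPs therefore offer high abundance, high stability and easy automation of analysis.

In English: SNP markers are preferred over microsatellite markers for association studies, because of their high abundance along the human genome (SNPs with minor allele frequency >0.1 occur once every 600 bp) (Wang et al. 1998), their low mutation rate, and the accessibility of high-throughput genotyping. The power of association studies based on SNPs depends not only on the sample size and density of the marker map but also on many other factors, such as the age and frequency of the disease mutations and SNPs and the extent of linkage disequilibrium (LD) in the region. (2)

SNP sites fall into several broad classes according to their position in the gene sequence. Most SNPs have no effect on gene function and are called anonymous SNPs. SNPs located within genes are called gene-based SNPs and include those in introns, exons and promoters. SNPs in protein-coding sequence are called cSNPs or coding SNPs. A cSNP that does not change the encoded amino acid sequence is a synonymous SNP; one that changes the amino acid sequence is a non-synonymous SNP. A SNP in the protein-coding region may cause an amino acid substitution and alter protein function. Most SNPs occur in non-coding regions: a SNP in the promoter region may affect transcription factor binding and change the rate or level of transcription; SNPs in the 5′ upstream or 3′ downstream region may alter the stability of the transcribed mRNA or enhancer activity; the functional effects of intronic SNPs remain to be studied (3).

Many methods are available for detecting SNPs: direct sequencing, PCR-RFLP, single strand conformation polymorphism analysis (SSCP), heteroduplex analysis (HA), denaturing gradient gel electrophoresis (DGGE), solid phase chemical cleavage method (spCCM), allele-specific PCR, DNA chip assays, real-time fluorescent quantitative PCR, and others. All have fairly high specificity and sensitivity, and each laboratory can choose a suitable method according to its research aims and budget.

2. Haplotype

A combination of SNPs that lie in a specific chromosomal region, are associated with one another and tend to be inherited by offspring as a unit is called a haplotype; haplotypes have been likened to "molecular fossils" of human evolutionary history. If a stretch of DNA contains n SNP sites, up to 2^n haplotypes are theoretically possible in the population, but each individual carries only 2 haplotypes.

Methods for constructing haplotypes: current experimental methods include single-molecule dilution, allele-specific PCR, long-insert cloning and diploid-to-haploid conversion; statistical algorithms include Clark's algorithm, maximum likelihood algorithms and Bayesian algorithms.

3. Haplotype block

Based on the linkage disequilibrium between SNPs over large genomic ranges, the haplotype structure of the human genome can be described with a relatively simple model: continuous, stable haplotype regions on the chromosome that have hardly been broken up by recombination, called haplotype blocks (or haploblocks). Several neighboring, tightly linked SNPs are inherited together and form a haplotype block. A haplotype block may be the smallest unit of inheritance; in the extreme it can be a single SNP or a whole chromosome. Regions where recombination events are frequent separate adjacent haplotype blocks.

3.1 Definitions of a haplotype block

A haplotype block is a contiguous set of markers in which the average D′ (the standardized coefficient of LD (4)) is greater than some predetermined threshold.

Gabriel et al. (5) described how the human genome can be parsed objectively into haplotype blocks: sizable regions over which there is little evidence for historical recombination and within which only a few common haplotypes are observed. This definition is based on linkage disequilibrium (LD), that is, on large pairwise |D′| values between the SNP pairs within one haploblock.

Patil et al. (6) defined a haplotype block as a region with a large proportion (at least 80%) of inferred common haplotypes. This definition is based on the concept of "chromosome coverage", with a haplotype block containing a minimum number of SNPs that account for a majority of common haplotypes or a reduced level of haplotype diversity.

Wang et al. (7) further proposed explicit "no historical recombination" as a definition for haplotype blocks, which can be tested using a four-gamete test.

Ding K et al. (8) chose to define haplotype blocks based on LD when haplotype-block-based tSNP selection methods were employed.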
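The LD-based haplotype-block definition requires that SNP pairs with strong D′ (|D′| ≥ 0.70) account for at least 95% of all pairs of SNPs.

3.2 Algorithms and criteria for partitioning haplotype blocks

3.2.1 Based on linkage disequilibrium

Gabriel criteria (5) for haplotype block partitioning:
- SNPs with MAF below 0.05 are excluded.
- A pair is in "strong LD" if the one-sided upper 95% confidence bound on D′ is above 0.98 (that is, consistent with no historical recombination) and the lower bound is above 0.7.
- A pair shows "strong evidence for historical recombination" if the upper confidence bound on D′ is less than 0.9.

We defined a haplotype block as a region over which a very small proportion [...]

[...] (80%) of inferred common haplotypes. A greedy algorithm was proposed to obtain an approximate partition into haplotype blocks. It first considers all possible blocks formed by consecutive SNPs, then selects the block that maximizes the ratio of the number of SNPs in the block to the minimum number of tag SNPs required (those needed to distinguish the haplotypes that occur more than once); in other words, the fewest tag SNPs should distinguish the most SNPs. Every SNP is assigned to one block. The sizes of the blocks are unrelated to their order along the chromosome, and the blocks have no absolute boundaries. Two criteria: (1) in each block, at least 80% of the observed haplotypes are represented more than once; and (2) the total number of tag SNPs for distinguishing at least 80% of haplotypes is as small as possible.

Zhang et al. (10-11) proposed a dynamic programming algorithm for block partitioning. Its principle is to minimize the number of tag SNPs that can represent most of the properties of each block. The algorithm has been implemented in the program HAPBLOCK (http:/ /msms/HapBlock/).

Although each of the methods above has its merits, Wall et al. (12) stated a preference for the first class of methods, for three reasons. First, using D′ to detect historical recombination directly seems closer to the definition of a haplotype block. Second, pairwise methods are easier to apply to diploid genetic data. Finally, pairwise LD coefficients are easier to visualize.

3.2.3 Other partitioning criteria

- Haplotype block boundaries were inferred from the phased genotype data (probability threshold for correct phase call at each site: 0.95) by D′ confidence limits (upper confidence limit 0.97, lower confidence limit 0.70, fraction of informative pairs in strong LD: 0.95) using Haploview (/personal/jcbarret/haploview/).
- The minimum D′ over all SNP pairs is 0.9 (13-14).
- r² and D′ both equal 1 for all SNP pairs (15).
- The minimum r² over all SNP pairs is 0.8 (16).
- D′ is at least 0.7 for 95% of SNP pairs (8).

Several neighboring, tightly linked SNPs are inherited together and form a haplotype block, which as a haploblock has a higher discrimination power than the individual SNPs within the block.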
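Two of the pairwise criteria listed above (a minimum D′ of 0.9 over all SNP pairs, and D′ of at least 0.7 for 95% of pairs) can be checked mechanically once a matrix of pairwise |D′| values is available. The sketch below is illustrative only: the function names and the example matrix are made up, the input is assumed to be a symmetric matrix for at least two SNPs, and it does not reproduce the confidence-interval procedure of Gabriel et al. or Haploview.

```python
from itertools import combinations


def pairwise_values(matrix):
    """Collect the values above the diagonal of a symmetric pairwise matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def block_by_minimum(dprime, minimum=0.9):
    """Criterion: every SNP pair has |D'| of at least `minimum`."""
    return min(pairwise_values(dprime)) >= minimum


def block_by_fraction(dprime, strong=0.7, fraction=0.95):
    """Criterion: at least `fraction` of SNP pairs have |D'| of at least `strong`."""
    values = pairwise_values(dprime)
    strong_pairs = sum(1 for v in values if v >= strong)
    return strong_pairs / len(values) >= fraction


# Hypothetical |D'| values for three SNPs.
dprime = [
    [1.00, 0.95, 0.92],
    [0.95, 1.00, 0.75],
    [0.92, 0.75, 1.00],
]
print(block_by_minimum(dprime), block_by_fraction(dprime))
```

In this made-up example the same three SNPs form a block under the 95%-of-pairs rule but not under the stricter minimum-D′ rule, which shows how the choice of criterion changes block boundaries.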
Candidate haplotype blocks were selected from three major populations (Caucasian, East Asian, and African) using the following parameters: maximum match probability reduction = 0.85, linkage disequilibrium (LD) r² threshold of 0.7, maximum Fst = 0.06 (17), minimum number of SNPs = 3, minimum heterozygosity = 0.2, and minimum number of haplotypes = 3. (18)

4. Tag SNP (tagger SNP)

A linkage group may contain many SNP sites, but a few SNPs are enough to identify its haplotype patterns specifically. Such SNPs are called tag single nucleotide polymorphisms (tSNPs). They are representative and characteristic SNPs of the genome and form the set of genetic markers required for constructing haplotypes or performing association analysis. When a small number of SNPs or other genetic markers suffice to identify most of the haplotypes in a haplotype block, these markers are called haplotype tag SNPs (htSNPs) (19).

4.1 Difference between tSNPs and htSNPs

The two terms, htSNPs and tSNPs, refer to two different strategies (8) for choosing the optimal minimum subset of SNPs from the entire set of SNPs. htSNPs are selected based on the haplotype-block model of the LD pattern in a region of interest and represent the common haplotypes inferred from the original set of SNPs. On the other hand, tSNPs are selected based on measures of association, such that a tSNP predicts partially or completely the state of other SNPs.

4.2 Classification of methods for selecting tSNPs or htSNPs

Eight methods can be classified as haplotype-block-based methods: All common haplotypes, Haplotype diversity, R2h (coefficient of determination), and Entropy; and haplotype-block-free methods: TagIT (haplotype r²), LD r² (based on pairwise LD), PCA (principal component analysis), and BEST (based on set theory). LD level is based on the following criteria (8): LD level varied from strong LD (D′ threshold 0.8) to moderate LD (0.4 [...]

[...] 0.05, (4) functional relevance and importance, (5) reported to dbSNP by various sources.
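To make the block-free, pairwise-LD style of tag selection more concrete, here is a minimal greedy sketch: repeatedly pick the SNP that is in high r² with the largest number of not-yet-covered SNPs. This is one common heuristic, not necessarily the procedure used by any of the programs named above; the function name, the example matrix and the default cut-off of 0.8 are assumptions for illustration.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick tag SNPs so that every SNP is a tag or has r2 >= threshold with some tag.

    `r2` is a symmetric matrix (list of lists) of pairwise r-squared values.
    Returns the indices of the chosen tag SNPs in the order they were picked.
    """
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        def covered_by(i):
            # SNPs still uncovered that SNP i would capture (including itself).
            return {j for j in untagged if j == i or r2[i][j] >= threshold}

        # Ties are broken in favour of the lowest index for reproducibility.
        best = max(sorted(untagged), key=lambda i: len(covered_by(i)))
        tags.append(best)
        untagged -= covered_by(best)
    return tags


# Hypothetical r2 values: SNPs 0-2 form one correlated group, SNPs 3-4 another.
r2 = [
    [1.00, 0.90, 0.85, 0.10, 0.20],
    [0.90, 1.00, 0.50, 0.10, 0.10],
    [0.85, 0.50, 1.00, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.95],
    [0.20, 0.10, 0.10, 0.95, 1.00],
]
tags = greedy_tag_snps(r2)
print(tags)
```

Raising the threshold generally yields more tags and less information loss, while lowering it reduces genotyping cost at the price of weaker prediction of the untyped SNPs.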
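In the dynamic programming approach that Zhang et al. (10) built for haplotype blocks, htSNPs are chosen by enumeration. Methods of this kind first partition the chromosome into consecutive haplotype blocks and then, by visual inspection or by computation, select from each block the tag SNPs that can represent the diversity within it, requiring the number of tag SNPs to be minimal.

The algorithm proposed by Johnson et al. (21) is based on linkage disequilibrium. It first computes the degree of LD between each pair of SNPs; if two SNPs are in high LD, one can be used to predict the other, so only one of them is chosen as an htSNP. Using the LD parameters, the algorithm removes redundant SNPs and lists all possible htSNP subsets, then chooses the best set according to how much of the haplotype variation in the sample each set explains (proportion of diversity explained, PDE).

Clayton's algorithm (22) aims to have the genotyped tag SNPs predict the remaining untyped SNPs well. Possible htSNP subsets are chosen on this principle, and the final htSNPs are again selected by PDE. Unlike Johnson's algorithm, this method lets the user discard, before the PDE analysis, htSNP subsets whose coverage is insufficient or that have missing data.

The tag SNP algorithm of Stram et al. (23) aims to have the tag SNPs predict the distribution of all SNPs well. It considers the correlation between each individual's true number of copies of a haplotype and the number predicted from the tag SNPs, and uses the squared correlation coefficient R² (threshold 0.7) as the parameter for selecting htSNPs.

In any case, a considerable loss of information about potentially causative variants was associated with all SNP tagging methods. Simply put, more tag SNPs provide more information, and all genotyped SNPs provide the maximum information. While r² tagging with a cut-off of 0.8 is often considered sufficient to capture most information, there is still a loss of 15% compared to the use of the complete marker set. The portion of captured variants obtained using all markers is comparable to earlier findings for the ENCODE regions with both lower and higher SNP marker densities. (24)

Uses of haplotype blocks, tag SNPs and haplotypes, and study steps (2):

Haplotype blocks, together with the corresponding tag SNPs and common haplotypes determined by haplotype block-partitioning algorithms, can be used in genomewide association studies, as well as in the fine-scale mapping of complex disease genes. First, a small number of samples (e.g., 10 or 20 individuals) are chosen to be genotyped at a very dense SNP map in a region, and the haplotypes of these individuals are identified simultaneously. Second, an algorithm for haplotype block partitioning is employed, to identify haplotype block structure and a set of well-spaced tag SNPs. Third, a larger number of samples are genotyped only at these tag SNP marker loci.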
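Fourth, association studies are conducted using all the genotyped samples, with knowledge of the haplotype block structure.

5. Linkage disequilibrium (LD)

The concept was introduced by Jennings in the early 20th century. It means that SNPs on the same chromosome are not independent: alleles at different loci tend to occur together, with a probability higher than expected if the two loci were distributed randomly in the population. It is also called allelic association (25). LD is the "nonrandom association of alleles at different loci".

LD is widely used in estimating population genetic parameters, fine mapping of genes and association analysis; in essence, association analysis detects LD between genetic markers and traits. In general, LD can arise from mutation, random drift, bottlenecks and population admixture, and it weakens as time passes and as the genetic distance between loci increases.

LD differs from linkage (26): linkage refers to the correlated inheritance of loci through the physical connection on a chromosome, whereas LD refers to the correlation between alleles in a population.

5.1 Measures of linkage disequilibrium

There are many measures of LD, such as the correlation coefficient r², Lewontin's D′, the population attributable risk δ, Yule's Q, and Kaplan and Weir's proportional difference d (27). Most are applied to pairwise tests of bi-allelic loci, and the most widely used are D′ and r².

In a simplified model, assume two loci with alleles A, a and B, b, with allele frequencies $\pi_A$, $\pi_a$, $\pi_B$, $\pi_b$. Four haplotypes can form, with frequencies $\pi_{AB}$, $\pi_{aB}$, $\pi_{ab}$, $\pi_{Ab}$. The basic component of all LD statistics is the difference between the observed and expected haplotype frequencies:

$D_{ab} = \pi_{AB} - \pi_A \pi_B$

Because the range of D is inconvenient, this formula is usually not used directly; D is normalized first. The two most common normalized LD measures are r² and Lewontin's coefficient of linkage disequilibrium D′. The first,

$r^2 = D_{ab}^2 / (\pi_A \pi_a \pi_B \pi_b)$

(also written $\Delta^2$), is regarded as the squared correlation coefficient between the two alleles. For the second, if $D_{ab} < 0$ then $D' = D_{ab} / \min(\pi_A \pi_B, \pi_a \pi_b)$, and if $D_{ab} > 0$ then $D' = D_{ab} / \min(\pi_A \pi_b, \pi_a \pi_B)$ (26); the absolute value |D′| is what is usually reported. After normalization, both measures range between 0 (recombination, no association) and 1 (complete linkage disequilibrium).

The measurement of LD is a large and complex topic and will not be reviewed in detail here; but see the work of Devlin and Risch (1995), Jorde (2000) and Hudson (2001). Most of the measures of LD that are in wide use quantify the degree of association between pairs of markers. In part, they differ according to the way in which they depend on the marginal allele frequencies. In the present article, we use one popular measure of LD between pairs of biallelic markers, commonly denoted by r² (elsewhere, r² is also denoted by Δ²) (28).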
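The two-locus measures described above can be computed directly from the four haplotype frequencies. The sketch below uses standard definitions (D as observed minus expected frequency of the AB haplotype, |D′| as |D| divided by its maximum possible value given the allele frequencies, and r² as D² over the product of the four allele frequencies); the function name and the example frequencies are invented for illustration.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, |D'|, r2) for two bi-allelic loci from haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    # Marginal allele frequencies.
    p_A = p_AB + p_Ab
    p_a = 1.0 - p_A
    p_B = p_AB + p_aB
    p_b = 1.0 - p_B

    # Raw disequilibrium: observed minus expected frequency of the AB haplotype.
    d = p_AB - p_A * p_B

    # Largest |D| attainable for these allele frequencies, depending on the sign of D.
    if d > 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p_A * p_a * p_B * p_b
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical case with only three of the four haplotypes present.
d, d_prime, r2 = ld_stats(p_AB=0.6, p_Ab=0.0, p_aB=0.1, p_ab=0.3)
print(d, d_prime, r2)
```

With one haplotype absent, |D′| reaches 1 while r² stays below 1, because r² also depends on how similar the allele frequencies at the two loci are.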
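D′ can be regarded as a measure that is independent of allele frequency. It takes its maximum value of 1, complete LD, when no recombination event is observed between the loci tested; in that case at most 3 of the 4 possible haplotypes formed by the two loci appear in the sample. If D′ < 1, recombination has occurred between the two loci (a new mutation can also cause D′ < 1, but for SNPs the probability of mutation is much lower than that of recombination). In that case all 4 haplotypes can appear, but the relative magnitude of D′ becomes hard to interpret (for example, the difference between D′ = 0.3 and D′ = 0.7 is unclear). So a D′ close to 1 indicates that historical recombination between the two loci is unlikely, whereas intermediate values of D′ should not be used to compare the degree of LD between loci. Moreover, D′ is markedly inflated in small samples, which for SNPs with a low allele frequency (the frequency of the rarer of the two alleles [...]

[...] 98%.
Disadvantages:
1. Only one site can be genotyped at a time; with many sites the experiment takes longer and consumes more sample.
2. Ordering reagents takes 2-3 months.
3. For about 1 in 10 sites, probe synthesis fails or genotypes cannot be read when typing large batches of samples.

SNaPshot technology: a genotyping technique based on fluorescently labelled single-base extension. The reaction contains a sequencing enzyme, four fluorescently labelled ddNTPs, extension primers of different lengths that end immediately 5′ of the polymorphic sites, and the PCR product as template. Each primer is extended by one base and then terminates. After detection on an ABI sequencer, the position of a peak identifies the SNP site of the extension product, and the colour of the peak identifies the incorporated base, which gives the genotype of the sample. It is typically used for analysing 10-30 SNP sites.

Advantages:
1. Accurate genotyping: accuracy is second only to direct sequencing, with a 98% success rate and accuracy.
2. Simultaneous detection of multiple sites: 12-20 sites can be tested at once.
3. Not restricted by the type of polymorphism: whether the site is G/C, A/T, G/A or C/T, or even some insertion/deletion polymorphisms, and whichever chromosome it lies on, the sites can be tested in one reaction.
4. Not restricted by sample size: anything from 100 to 5000 samples can be handled; cost is determined mainly by the number of sites, and the number of samples can be large or small.
5. Contaminated samples can be detected: if the genotyping peak profile of a sample deviates from the normal distribution, the sample may be contaminated or too dilute, which other genotyping methods cannot reveal.
6. High international recognition and an extensive literature.

Disadvantages:
1. Genotyping is not cheap: depending on the number of samples and sites, the service price is 6-15 yuan.
2. The cost does not fall quickly as sample size increases, so the method is advantageous for projects with medium sample sizes but not for large-sample projects.

MassARRAY (mass spectrometry): matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). The target sequence is amplified by PCR, a SNP-specific extension primer is added, and the primer is extended by 1 base at the SNP site. The prepared analyte is co-crystallized with the chip matrix, the crystal is placed in the vacuum tube of the mass spectrometer and excited with a strong laser, so that the matrix crystal sublimes and forms metastable ions; the ions produced are mostly singly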
