




Bioinformatics Assignment: Properties and Evolutionary Analysis of Cytochrome C

Major: Biochemistry and Molecular Biology
Student ID: 0313302148
Name: 谢家赟

I. Obtaining the sequences

Starting from the NCBI home page, the Entrez protein database was searched for cytochrome C, which returned a list of related records. Human cytochrome C (Accession No. 1J3S_A; Organism: Homo sapiens (human)) was selected and its sequence was used as the query for a protein similarity search with Blastp. From the hits, the corresponding protein sequences of representative species were chosen, 13 in total:

Accession No.   Organism
    Protein sequence (Cytochrome C)

1J3S_A          Homo sapiens (human)
    gdvekgkkif imkcsqchtv ekggkhktgp nlhglfgrkt gqapgysyta anknkgiiwg edtlmeylen pkkyipgtkm ifvgikkkee radliaylkk atne
Q7YR71          Trachypithecus cristatus (silvered leaf monkey)
    mgdvekgkki limkcsqcht vekggkhktg pnhhglfgrk tgqapgysyt aanknkgitw gedtlmeyle npkkyipgtk mifvgikkke eradliaylk katne
P00003          Ateles sp. (spider monkey)
    gdvfkgkrif imkcsqchtv ekggkhktgp nlhglfgrkt gqasgftyte anknkgiiwg edtlmeylen pkkyipgtkm ifvgikkkee radliaylkk atne
P00008          Oryctolagus cuniculus (rabbit)
    gdvekgkkif vqkcaqchtv ekggkhktgp nlhglfgrkt gqavgfsytd anknkgitwg edtlmeylen pkkyipgtkm ifagikkkde radliaylkk atne
NP_031834       Mus musculus (house mouse)
    mgdvekgkki fvqkcaqcht vekggkhktg pnlhglfgrk tgqaagfsyt danknkgitw gedtlmeyle npkkyipgtk mifagikkkg eradliaylk katne
C04604          Cavia porcellus (domestic guinea pig)
    gdvekgkkif vqkcaqchtv ekggkhktgp nlhglfgrkt gqaagfsytd anknkgitwg edtlmeylen pkkyipgtkm ifagikkkge radliaylkk atne
P00007          Hippopotamus amphibius (hippopotamus)
    gdvekgkkif vqkcaqchtv ekggkhktgp nlhglfgrkt gqspgfsytd anknkgitwg eetlmeylen pkkyipgtkm ifagikkkge radliaylkq atne
XP_212981       Rattus norvegicus (Norway rat)
    mgdvekgkki fvqkcaqcht vekggkhktg pnlhglfgrk tgqaagfsyt danknkgitw gedtlmeyle npkkyipgtk mifagikkkg eradliaylk kttne
P00006          Bos taurus (cow)
    gdvekgkkif vqkcaqchtv ekggkhktgp nlhglfgrkt gqapgfsytd anknkgitwg eetlmeylen pkkyipgtkm ifagikkkge redliaylkk atne
721949A         Chiroptera (bats)
    gdvekgkkif vqkcaqchtv ekggkhktgp nlhglfgrkt gqapgfsytd anknkgitwg eatlmeylen pkkyipgtkm ifagikksae radliaylkk atke
P00010          Eschrichtius robustus (grey whale)
    gdvekgkkif vqkcaqchtv ekggkhktgp nlhglfgrkt gqavgfsytd anknkgitwg eetlmeylen pkkyipgtkm ifagikkkge radliaylkk atne
P00016          Gallus gallus (chicken)
    mgdiekgkki fvqkcsqcht vekggkhktg pnlhglfgrk tgqaegfsyt danknkgitw gedtlmeyle npkkyipgtk mifagikkks ervdliaylk datsk
P00021          Columba livia (domestic pigeon)
    gdiekgkkif vqkcsqchtv ekggkhktgp nlhglfgrkt gqaegfsytd anknkgitwg edtlmeylen pkkyipgtkm ifagikkkae radliaylkq atak

II. Protein structure analysis and prediction

This report uses Acc No. 1J3S_A (Organism: Homo sapiens (human)) as the example for protein structure analysis and prediction.

1. Primary structure analysis

The online analysis was run from /tools/. The tools chosen for primary structure analysis are ProtParam, REP, SAPS, Coils and ProtScale. The results are as follows.

(1) ProtParam analysis of the sequence, to compute its physicochemical parameters.
Procedure: open the ProtParam page, enter the target sequence, and run the computation.

User-provided sequence:

        1          11         21         31         41         51
        |          |          |          |          |          |
      1 GDVEKGKKIF IMKCSQCHTV EKGGKHKTGP NLHGLFGRKT GQAPGYSYTA ANKNKGIIWG
     61 EDTLMEYLEN PKKYIPGTKM IFVGIKKKEE RADLIAYLKK ATNE

References and documentation are available.

Number of amino acids: 104
Molecular weight: 11617.5
Theoretical pI: 9.59

Amino acid composition:
Ala (A)   6    5.8%
Arg (R)   2    1.9%
Asn (N)   5    4.8%
Asp (D)   3    2.9%
Cys (C)   2    1.9%
Gln (Q)   2    1.9%
Glu (E)   8    7.7%
Gly (G)  13   12.5%
His (H)   3    2.9%
Ile (I)   8    7.7%
Leu (L)   6    5.8%
Lys (K)  18   17.3%
Met (M)   3    2.9%
Phe (F)   3    2.9%
Pro (P)   4    3.8%
Ser (S)   2    1.9%
Thr (T)   7    6.7%
Trp (W)   1    1.0%
Tyr (Y)   5    4.8%
Val (V)   3    2.9%
Asx (B)   0    0.0%
Glx (Z)   0    0.0%
Xaa (X)   0    0.0%

Total number of negatively charged residues (Asp + Glu): 11
Total number of positively charged residues (Arg + Lys): 20

Atomic composition:
Carbon    C  521
Hydrogen  H  836
Nitrogen  N  142
Oxygen    O  148
Sulfur    S    5

Formula: C521H836N142O148S5
Total number of atoms: 1652

Extinction coefficients:
Conditions: 6.0 M guanidium hydrochloride, 0.02 M phosphate buffer, pH 6.5
Extinction coefficients are in units of M-1 cm-1.
The first table lists values computed assuming ALL Cys residues appear as half cystines, whereas the second table assumes that NONE do.

                      276     278     279     280     282
                       nm      nm      nm      nm      nm
Ext. coefficient    12795   12727   12505   12210   11720
Abs 0.1% (=1 g/l)   1.101   1.096   1.076   1.051   1.009

                      276     278     279     280     282
                       nm      nm      nm      nm      nm
Ext. coefficient    12650   12600   12385   12090   11600
Abs 0.1% (=1 g/l)   1.089   1.085   1.066   1.041   0.998

Estimated half-life:
The N-terminal of the sequence considered is G (Gly).
The estimated half-life is:
30 hours (mammalian reticulocytes, in vitro).
20 hours (yeast, in vivo).
10 hours (Escherichia coli, in vivo).

Instability index:
The instability index (II) is computed to be 11.39
This classifies the protein as stable.

Aliphatic index: 66.63
Grand average of hydropathicity (GRAVY): -0.731

Interpretation: the protein consists of 104 amino acids, with a molecular weight of 11617.5 Da and a theoretical isoelectric point of 9.59. Its estimated half-life is 30 hours in mammalian reticulocytes (in vitro), and the instability index of 11.39 classifies it as a stable protein.

(2) REP analysis of the sequence, to detect repeats.
Procedure: open the REP page, enter the target sequence, and run the search.

No repeat detected

(3) SAPS analysis of the sequence, for statistical analysis.
Procedure: open the SAPS page, enter the target sequence, and run the search.

Protein 1 (File: wwwtmp/.SAPS.14378.1117.seq)
SWISS-PROT ANNOTATION:
ID unknown
DE unknown, 104 bases, C485894F checksum.
number of residues: 104; molecular weight: 11.6 kdal

      1 GDVEKGKKIF IMKCSQCHTV EKGGKHKTGP NLHGLFGRKT GQAPGYSYTA ANKNKGIIWG
     61 EDTLMEYLEN PKKYIPGTKM IFVGIKKKEE RADLIAYLKK ATNE

-COMPOSITIONAL ANALYSIS (extremes relative to: swp23s.q)

The composition of the input sequence is evaluated relative to the residue usage quantile table specified with the -s species flag. Low usage in the 1% quantile is indicated by the label -- (e.g., Y-- means that the input sequence uses tyrosine as little as the 1% least tyrosine containing proteins in the reference set); low usage in the 5% quantile is indicated by the label - (e.g., L-); high usage above the 95% quantile point is indicated by the label + (e.g., A+); and high usage above the 99% quantile point is indicated by the label ++ (e.g., LIVFM++).
The usage is evaluated for all 20 amino acids, positive (KR) and negative (ED) charge, total charge (KRED), net charge (KR-ED), major hydrophobics (LVIFM), and the groupings ST, AGP (encoded by CCN, GCN, and GGN codons), and FIKMNY (encoded by AAN, AUN, UAN, and UUN codons).

A :  6( 5.8%); C :  2( 1.9%); D :  3( 2.9%); E :  8( 7.7%); F :  3( 2.9%)
G : 13(12.5%); H :  3( 2.9%); I :  8( 7.7%); K : 18(17.3%); L :  6( 5.8%)
M :  3( 2.9%); N :  5( 4.8%); P :  4( 3.8%); Q :  2( 1.9%); R :  2( 1.9%)
S :  2( 1.9%); T :  7( 6.7%); V :  3( 2.9%); W :  1( 1.0%); Y :  5( 4.8%)

KR    : 20 ( 19.2%); ED    : 11 ( 10.6%); AGP    : 23 ( 22.1%);
KRED  : 31 ( 29.8%); KR-ED :  9 (  8.7%); FIKMNY : 42 ( 40.4%);
LVIFM : 23 ( 22.1%); ST    :  9 (  8.7%).

-CHARGE DISTRIBUTIONAL ANALYSIS

The distribution of charges in the protein sequence is evaluated in terms of clusters, high scoring segments, and runs and periodic patterns. Clusters indicate regions of typically 30 to 60 residues exhibiting a relatively high charge concentration. For high scoring charge segments, positive scores are assigned to charge residues of the appropriate type and negative scores to all other residues. A significant cumulative positive score again indicates a region of high charge concentration. The cluster method and the scoring method will generally pick out the same segments (with the scoring method often delimiting the segment to a narrower range), conferring robustness to the results. Short segments of high charge concentration are displayed as runs (with errors). Periodic patterns focus on those with charges every second or third position, with possible relevance to amphipathic secondary structures; other periodic patterns are displayed in the general periodicity analysis section of the output.

      1 0-0-+0++00 00+0000000 -+00+0+000 0000000++0 0000000000 00+0+00000
     61 --000-00-0 0++00000+0 00000+++-- +0-00000++ 000-

A. CHARGE CLUSTERS. Positive, negative, and mixed charge clusters are distinguished.
In each case, cmin indicates the minimum number of charges required for a significant charge cluster corresponding to the given window size; e.g., cmin = 9/30 or 12/45 or 15/60 means that significance requires at least 9 charges in a segment of 30 (or fewer) residues, or 12 charges in a segment of length 45, or 15 charges in a segment of length 60. In the case of positive and negative charge clusters, these counts refer to net charge, i.e., charges of the opposite sign within the window are counted as -1. The sizes of the clusters are optimized for display to indicate the segment of highest charge concentration, but a minimum size of 20 residues is required. A mixed charge cluster that begins and ends within 15 residues of the endpoints of a pure charge cluster is not displayed (since its significance rests mostly on the charged residues comprising the displayed pure charge cluster), unless the -v (verbose output) flag is set, in which case both the pure and the mixed charge cluster are displayed. On the other hand, pure charge clusters that are embedded in mixed charge clusters are displayed separately (indicated by a * preceding the specification of location). For each cluster are given its location in the sequence (from, to), the quartile of the location (1st, 2nd, 3rd, or 4th quarter of the sequence), length, count, and t-value (standard deviations above the mean; to accommodate the multiple tests performed, the t-value significance threshold is set to 4.0 for sequences up to 750 residues, to 4.5 for sequences of length 750-1500 residues, and to 5.0 for longer sequences); also indicated are residues comprising at least 10% of the cluster.

Positive charge clusters (cmin = 14/30 or 19/45 or 23/60): none
Negative charge clusters (cmin = 9/30 or 13/45 or 15/60): none
Mixed charge clusters (cmin = 18/30 or 25/45 or 32/60): none

B. HIGH SCORING (UN)CHARGED SEGMENTS.
For each scoring scheme (scores assigned to residues as displayed), SAPS displays segments of the sequence with aggregate score exceeding the particular threshold values M_0.01 (1% significance level, segments labeled with **), M_0.05 (5% significance level, segments labeled *), or otherwise as indicated. A minimal segment length is set as shown. The expected score/letter should be sufficiently large negative, and the average information per letter should be sufficiently large positive in order for the scoring statistics to apply properly (the program prints out when the conditions are not met and skips evaluations).

_High scoring positive charge segments:
score=  2.00 frequency= 0.192 ( KR )
score=  0.00 frequency= 0.000 ( BZX )
score= -1.00 frequency= 0.702 ( LAGSVTIPNFQYHMCW )
score= -2.00 frequency= 0.106 ( ED )
Expected score/letter: -0.529; Average information/letter: 0.453
Minimal length of displayed segments set to: 20
M_0.01= 14.34 (cv= 8.90, lambda= 0.52205, k= 0.17204, x= 5.44; 90% confidence interval for segment length: 24 +- 25)
M_0.05= 11.21 (x= 2.32)
# of segments (>=20 residues) exceeding M_0.05: none

_High scoring negative charge segments:
score=  2.00 frequency= 0.106 ( ED )
score=  0.00 frequency= 0.000 ( BZX )
score= -1.00 frequency= 0.702 ( LAGSVTIPNFQYHMCW )
score= -2.00 frequency= 0.192 ( KR )
Expected score/letter: -0.875; Average information/letter: 1.447
Minimal length of displayed segments set to: 20
M_0.01= 8.69 (cv= 4.92, lambda= 0.94312, k= 0.35110, x= 3.77; 90% confidence interval for segment length: 8 +- 8)
M_0.05= 6.96 (x= 2.04)
# of segments (>=20 residues) exceeding M_0.05: none

_High scoring mixed charge segments:
score=  1.00 frequency= 0.298 ( KEDR )
score=  0.00 frequency= 0.000 ( BZX )
score= -1.00 frequency= 0.702 ( LAGSVTIPNFQYHMCW )
Expected score/letter: -0.404; Average information/letter: 0.499
Minimal length of displayed segments set to: 20
M_0.01= 9.09 (cv= 5.42, lambda= 0.85647, k= 0.23235, x= 3.67; 90% confidence interval for segment length: 23 +- 21)
M_0.05= 7.19 (x= 1.76)
# of segments (>=20 residues) exceeding M_0.05: none

_High scoring uncharged segments:
score=  1.00 frequency= 0.702 ( LAGSVTIPNFQYHMCW )
score=  0.00 frequency= 0.000 ( BZX )
score= -8.00 frequency= 0.298 ( KEDR )
Expected score/letter: -1.683; Average information/letter: 0.391
Minimal length of displayed segments set to: 20
M_0.01= 23.36 (cv= 13.95, lambda= 0.33293, k= 0.23023, x= 9.41; 90% confidence interval for segment length: 29 +- 17)
M_0.05= 18.46 (x= 4.51)
# of segments (>=20 residues) exceeding M_0.05: none

C. CHARGE RUNS AND PATTERNS.

The table below shows the charge runs and patterns searched for (* stands for + or -) and the required minimum number of matches to the pattern allowing for at most 0 (lmin0), 1 (lmin1), or 2 (lmin2) mismatches or insertions/deletions (1% significance level). Occurrences are arranged in the order in which they appear in the sequence. For each run or pattern are displayed its length (number of matches) and a triplet giving the number of mismatches, insertions and deletions. 0-runs are further characterized by their composition (residues comprising more than 10% of the run). Run count statistics are compiled for runs of lengths at least 2/3 of the minimal significant length (lmin0); given are the number and locations of such runs.

pattern  (+)|  (-)|  (*)|  (0)| (+0)| (-0)| (*0)|(+00)|(-00)|(*00)| (H.)|(H..)|
lmin0     5 |   4 |   7 |  23 |  10 |   8 |  12 |  11 |   9 |  13 |   6 |   8 |
lmin1     7 |   5 |   9 |  28 |  12 |  10 |  15 |  13 |  11 |  16 |   8 |   9 |
lmin2     8 |   6 |  10 |  30 |  13 |  11 |  16 |  15 |  12 |  18 |   9 |  11 |
(Significance level: 0.010000; Minimal displayed length: 6)

There are no charge runs or patterns exceeding the given minimal lengths.

Run count statistics:
+ runs >=  4: 0
- runs >=  3: 0
* runs >=  5: 1, at 86;
0 runs >= 15: 0

-DISTRIBUTION OF OTHER AMINO ACID TYPES

Routinely, SAPS indicates high scoring hydrophobic and transmembrane segments. The display is as described above for high scoring charge segments.
The scores for the hydrophobic segments correspond to a digitized hydropathy scale. The transmembrane scores were derived from target frequencies in putative transmembrane proteins (see the paper referred to above; note, however, that the scores used in the program have been rederived and differ from the ones given in the paper). With the -a command line flag, the user can invoke a similar analysis for other residue types. In view of the special role of cysteines for protein structure, the spacings of the cysteine residues in the sequence are displayed separately, with particular emphasis on close pairs of cysteines and distances between such pairs.

1. HIGH SCORING SEGMENTS.

_High scoring hydrophobic segments:
2.00 (LVIFM) 1.00 (AGYCW) 0.00 (BZX) -2.00 (PH) -4.00 (STNQ) -8.00 (KEDR)
Expected score/letter: -2.433; Average information/letter: 0.787
Minimal length of displayed segments set to: 15
M_0.01= 18.51 (cv= 10.55, lambda= 0.44041, k= 0.33582, x= 7.97; 90% confidence interval for segment length: 15 +- 9)
M_0.05= 14.81 (x= 4.27)
# of segments (>=15 residues) exceeding M_0.05: none

_High scoring transmembrane segments:
5.00 (LVIF) 2.00 (AGM) 0.00 (BZX) -1.00 (YCW) -2.00 (ST) -6.00 (P) -8.00 (H) -10.00 (NQ) -16.00 (KR) -17.00 (ED)
Expected score/letter: -4.875; Average information/letter: 0.781
Minimal length of displayed segments set to: 15
M_0.01= 39.51 (cv= 23.10, lambda= 0.20104, k= 0.27237, x= 16.41; 90% confidence interval for segment length: 15 +- 10)
M_0.05= 31.41 (x= 8.30); M_0.30= 21.76 (x= -1.34)
# of segments (>=15 residues) exceeding M_0.30: none

2. SPACINGS OF C.

H2N-13CSQC at 14 -87-COOH

-REPETITIVE STRUCTURES.

Repeats are indicated for two alphabets: the 20-letter amino acid alphabet, and a reduced 11-letter alphabet in which the major hydrophobics LVIF, the charged residues KR and ED, the small residues AG, the hydroxyl group residues ST, the amide group residues NQ, and the aromatics YW are treated as combined letters.
For each alphabet, three classes of repeats are distinguished: separated repeats, simple tandem repeats, and periodic repeats. The separated repeats are largely non-overlapping. They are displayed in groups of matching blocks (exceeding a given core block length of contiguous exact matches) and intervening spacer distances (which may be negative, signifying a partial overlap). The core block length in case of the amino acid alphabet is set to 4 for sequences up to 500 residues, to 5 for sequences between 500 and 2000 residues, and to 6 f
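
Several of the figures reported by ProtParam and SAPS above follow directly from the sequence, so they can be cross-checked offline. The sketch below is only an illustration, not the code behind either tool: it assumes GRAVY is the mean Kyte-Doolittle hydropathy value, that the aliphatic index uses the usual 100 * (Ala + 2.9 Val + 3.9 (Ile + Leu)) / length form, and that the SAPS charge display marks K/R as "+", D/E as "-" and everything else as "0". All function names are hypothetical.

```python
from collections import Counter

# Human cytochrome c (1J3S_A), copied from the ProtParam input in the report.
SEQ = ("GDVEKGKKIF IMKCSQCHTV EKGGKHKTGP NLHGLFGRKT GQAPGYSYTA ANKNKGIIWG "
       "EDTLMEYLEN PKKYIPGTKM IFVGIKKKEE RADLIAYLKK ATNE").replace(" ", "")

# Kyte-Doolittle hydropathy scale (assumed to be the scale behind GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return a Counter of residue counts."""
    return Counter(seq)


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the Ala, Val, Ile and Leu counts."""
    c = Counter(seq)
    return 100.0 * (c["A"] + 2.9 * c["V"] + 3.9 * (c["I"] + c["L"])) / len(seq)


def charge_string(seq, block=10):
    """Map each residue to +, - or 0 and group the symbols in blocks."""
    symbols = "".join(
        "+" if aa in "KR" else "-" if aa in "DE" else "0" for aa in seq
    )
    return " ".join(symbols[i:i + block] for i in range(0, len(symbols), block))


if __name__ == "__main__":
    counts = composition(SEQ)
    print("length:", len(SEQ))
    print("K+R:", counts["K"] + counts["R"], "D+E:", counts["D"] + counts["E"])
    print("GRAVY: %.3f" % gravy(SEQ))
    print("Aliphatic index: %.2f" % aliphatic_index(SEQ))
    print(charge_string(SEQ))
```

Under these assumptions the script gives 104 residues, 20 positive and 11 negative charges, a GRAVY of -0.731 and an aliphatic index of 66.63, in agreement with the values listed above. The molecular weight, pI and instability index need additional parameter tables and are left out of this sketch.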