
Using Bioinformatics Software (with the MC4R Gene as an Example)

Chapter 1  Retrieving DNA, mRNA and Protein Sequences from NCBI

This chapter uses the pig melanocortin-4 receptor (MC4R) gene to show how to retrieve DNA, mRNA and amino acid sequences from NCBI.

1. First, retrieve the DNA sequence of MC4R.
Search for "NCBI" in a web search engine (the original walkthrough uses Baidu) and open the NCBI home page. Type "MC4R pig" into the Search box, select Gene in the drop-down menu, and click Search. Click the first result, Gene ID 397359. The gene page shows that MC4R is located on pig chromosome 1. At the lower right is a "Go to nucleotide" link to the nucleotide sequence, which can be displayed in three formats (circled in red in the original screenshot). The two used most often are FASTA and GenBank. FASTA is compact: apart from a single header line it contains no numbers, only the bases, and it is the format that sequence alignment and analysis tools take as input. GenBank is much more detailed and shows information such as the sequence length, the mRNA and CDS annotations (including their exon structure) and the amino acid translation. Clicking GenBank gives the following record:

Sus scrofa breed mixed chromosome 1, Sscrofa10.2 DNA

LOCUS       NC_010443               2265 bp    DNA     linear   CON 29-SEP-2013
DEFINITION  Sus scrofa breed mixed chromosome 1, Sscrofa10.2.
ACCESSION   NC_010443 REGION: complement(178553488..178555752) GPC_000000583
VERSION     NC_010443.4  GI:347618793
DBLINK      BioProject: PRJNA28993
            Assembly: GCF_000003025.5
KEYWORDS    RefSeq.
SOURCE      Sus scrofa (pig)
  ORGANISM  Sus scrofa
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Suina;
            Suidae; Sus.
COMMENT     REFSEQ INFORMATION: The reference sequence is identical to
            CM000812.4.
            On Oct 11, 2011 this sequence version replaced gi:333795951.
            Assembly Name: Sscrofa10.2
            The genomic sequence for this RefSeq record is from the genome
            assembly released by the Swine Genome Sequencing Consortium as
            Sscrofa10.2 in August 2011 (see
            http://www.sanger.ac.uk/Projects/S_scrofa). Sscrofa10.2 is a mixed
            assembly of clones and contigs from the whole-genome shotgun
            project AEMK00000000.1.
            #Genome-Annotation-Data-START#
            Annotation Provider         : NCBI
            Annotation Status           : Full annotation
            Annotation Version          : Sus scrofa Annotation Release 104
            Annotation Pipeline         : NCBI eukaryotic genome annotation pipeline
            Annotation Software Version : 5.1
            Annotation Method           : Best-placed RefSeq; Gnomon
            Features Annotated          : Gene; mRNA; CDS; ncRNA
            #Genome-Annotation-Data-END#
FEATURES             Location/Qualifiers
     source          1..2265
                     /organism="Sus scrofa"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9823"
                     /chromosome="1"
                     /breed="mixed"
     gene            1..2265
                     /gene="MC4R"
                     /note="melanocortin 4 receptor; Derived by automated
                     computational analysis using gene prediction method:
                     BestRefSeq."
                     /db_xref="GeneID:397359"
     mRNA            join(1..681,834..2265)
                     /gene="MC4R"
                     /product="melanocortin 4 receptor"
                     /inference="similar to RNA sequence, mRNA (same
                     species):RefSeq:NM_214173.1"
                     /exception="annotated by transcript or proteomic data"
                     /note="The RefSeq transcript has 2 indels compared to this
                     genomic sequence; Derived by automated computational
                     analysis using gene prediction method: BestRefSeq."
                     /transcript_id="NM_214173.1"
                     /db_xref="GI:55741558"
                     /db_xref="GeneID:397359"
     CDS             join(534..681,834..1685)
                     /gene="MC4R"
                     /inference="similar to AA sequence (same
                     species):RefSeq:NP_999338.1"
                     /exception="annotated by transcript or proteomic data"
                     /note="The RefSeq protein has 1 indel compared to
                     this genomic sequence; Derived by automated computational
                     analysis using gene prediction method: BestRefSeq."
                     /codon_start=1
                     /product="melanocortin receptor 4"
                     /protein_id="NP_999338.1"
                     /db_xref="GI:55741559"
                     /db_xref="GeneID:397359"
                     /translation="MNSTHHHGMHTSLHFWNRSTYGLHSNASEPLGKGYSEGGCYEQL
                     FVSPEVFVTLGVISLLENILVIVAIAKNKNLHSPMYFFICSLAVADMLVSVSNGSETI
                     VITLLNSTDTDAQSFTVNIDNVIDSVICSSLLASICSLLSIAVDRYFTIFYALQYHNI
                     MTVKRVGIIISCIWAVCTVSGVLFIIYSDSSAVIICLITVFFTMLALMASLYVHMFLM
                     ARLHIKRIAVLPGTGTIRQGANMKGAITLTILIGVFVVCWAPFFLHLIFYISCPQNPY
                     CVCFMSHFNLYLILIMCNSIIDPLIYALRSQELRKTFKEIICCYPLGGLCDLSSRY"
ORIGIN
        1 tcacagactc cccaggactt ggattggtca gaaagaagca gaggaggagc cactgtgcac
       61 attttttttt ccccttcaca caccataaaa atcacagagg caactaacac tcacagcaaa
      121 gcttcaggtt gggaactgat tctctctgcg aggcagctga tctgagcatg cgcacacaga
      181 ttcattcttc tcccaatagc acagcagccg ctaggaaaat tattttgaaa agacctgaat
      241 gcattaagac taaagttaaa gtggaagtga gaacaaaata tcaaacagca gactcgacag
      301 agaatgagcg tcttgaagcc taagatttca aagtgatgct aatcagagcc ctacctgaaa
      361 gagactaaaa actccatttc aagcttcgga gcatgtgata tttattcaca acaggcattc
      421 caatttcagc ctcataactt tcagacagat aaagacttgg agaaaatcgc tgaggctacc
      481 tgacccagga gcttaaatca ggtcagaggg gatctcaacc cacctggcgc aggatgaact
      541 caacccatca ccatggaatg catacttctc tccacttctg gaaccgcagc acctacggac
      601 tgcacagcaa tgccagtgag ccccttggaa aagagctact ctgaaggagg atgctacgag
      661 caactttttg tctctcctga ggtgtttgtg actctgggtg tcataagcct gt
          gap 100 bp
      813 aaacgacg gcgtctctct gaggtgtttg
      841 tgactctggg tgtcataagc ctgttggaga acattctggt gattgtggcc atagccaaga
      901 acaagaatct gcattcaccc atgtactttt tcatctgtag cctggctgtg gctgatatgc
      961 tggtgagcgt ttccaatggg tcagaaacca ttgtcatcac cctattaaac agcacggaca
     1021 cggacgcaca gagtttcaca gtgaatattg ataatgtcat tgactcagtg atctgtagct
     1081 ccttactcgc ctcaatttgc agcctgcttt cgattgcagt ggacaggtat tttactatct
     1141 tttatgctct ccagtaccat aacattatga cagttaagcg ggttggaatc atcatcagtt
     1201 gtatctgggc agtctgcacg gtgtcgggtg ttttgttcat catttactca gatagcagtg
     1261 ctgttattat ctgcctcata accgtgttct tcaccatgct ggctctcatg gcttctctct
     1321 atgtccacat gttcctcatg gccagactcc acattaagag gatcgccgtc ctcccaggca
     1381 ctggcaccat ccgccaaggt gccaacatga agggggcaat taccctgacc atcttgattg
     1441 gggtctttgt ggtctgctgg gcccccttct tcctccactt aatattctat atctcctgcc
     1501 cccagaatcc atactgtgtg tgcttcatgt ctcactttaa tttgtatctc atcctgatca
     1561 tgtgtaattc catcatcgat cccctgattt atgcactccg gagccaagaa ctgaggaaaa
     1621 ccttcaaaga gatcatctgt tgctatcccc tgggtggcct ctgtgatttg tctagcagat
     1681 attaaatggg gacagaggag acttataaat gcaagcataa gagactttct ccttacacag
     1741 tctggacaat atgcttcaac aacagcattt tcttgtaagg catcagttga gacattctat
     1801 tgtataaatt taagttcgtg attctgctca gtctctgtgt atttttaagg tcttgctacc
     1861 ttttggctgt aaaatgttta tctatactac aggttatagg cacaatggat ttataaaaaa
     1921 gaaaaaagtc cttatgaaaa gttaattaat gtatcttgtc attcgaaagg atttgacaca
     1981 ttgcttgttt tagtaaaatg gaaatcacag tttcattaaa tatatcctaa taaatggttg
     2041 ctaatattac actatacaac gctgaagtgt agaggtttga ttctagcatt gaggggagaa
     2101 atactgaaac aagtgtttaa tcattaaaaa ataagctgaa atttcaacta atttaataaa
     2161 acatgctcat tctccctgtg cagaaggaga aatgaagctt ctactgggag aaaaacagtt
     2221 actaaaaaaa agtgggggga tattttgagt ttgaaaacta tgttt
//

2. Retrieve the mRNA and amino acid sequences.
The first step is the same as for the DNA sequence: open the NCBI home page.
2.1 Click "Nucleotide" on the home page.
2.2 Type "MC4R pig" into the "Search" box and click "Search". Many results are returned; click whichever link fits your purpose. For example, to obtain the complete coding sequence of the mRNA, click the first result, "Sus scrofa MC4R mRNA, complete cds", which gives:

Sus scrofa MC4R mRNA, complete cds
GenBank: DQ388767.1

LOCUS       DQ388767                 999 bp    mRNA    linear   MAM 15-FEB-2006
DEFINITION  Sus scrofa MC4R mRNA, complete cds.
ACCESSION   DQ388767
VERSION     DQ388767.1  GI:87137928
KEYWORDS    .
SOURCE      Sus scrofa (pig)
  ORGANISM  Sus scrofa
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Suina;
            Suidae; Sus.
REFERENCE   1  (bases 1 to 999)
  AUTHORS   Yang,X.Q., Yu,H. and Liu,D.
  TITLE     The comparative analysis on MC4R gene of wild boar, domestic pig
            and their crossbred
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 999)
  AUTHORS   Yang,X.Q., Yu,H. and Liu,D.
  TITLE     Direct Submission
  JOURNAL   Submitted (05-FEB-2006) Northease Agricultural University, Animal
            Science & Technology, Xiangfang Block Gongbin Road Mucai Street,
            Harbin, Heilongjiang 150030, China
FEATURES             Location/Qualifiers
     source          1..999
                     /organism="Sus scrofa"
                     /mol_type="mRNA"
                     /db_xref="taxon:9823"
     CDS             1..999
                     /codon_start=1
                     /product="MC4R"
                     /protein_id="ABD28176.1"
                     /db_xref="GI:87137929"
                     /translation="MNSTHHHGMHTSLHFWNRSTYGLHSNASEPLGKGYSEGGCYEQL
                     FVSPEVFVTLGVISLLENILVIVAIAKNKNLHSPMYFFICSLAVADMLVSVSNGSETI
                     VITLLNSTDTDAQSFTVNIDNVIDSVICSSLLASICSLLSIAVDRYFTIFYALQYHNI
                     MTVKRVGIIISCIWAVCTVSGVLFIIYSDSSAVIICLITVFFTMLALMASLYVHMFLM
                     ARLHIKRIAVLPGTGTIRQGANMKGAITLTILIGVFVVCWAPFFLHLIFYISCPQNPY
                     CVCFMSHFNLYLILIMCNSIIDPLIYALRSQELRKTFKEIICCYPLGGLCDLSSRY"
     variation       171
                     /replace="g"
     variation       175
                     /replace="c"
     variation       551
                     /replace="c"
     variation       758
                     /replace="c"
     variation       892
                     /replace="a"
ORIGIN
        1 atgaactcaa cccatcacca tggaatgcat acttctctcc acttctggaa ccgcagcacc
       61 tacggactgc acagcaatgc cagtgagccc cttggaaaag gctactctga aggaggatgc
      121 tacgagcaac tttttgtctc tcctgaggtg tttgtgactc tgggtgtcat aagcttgttg
      181 gagaacattc tggtgattgt ggccatagcc aagaacaaga atctgcattc acccatgtac
      241 tttttcatct gtagcctggc tgtggctgat atgctggtga gcgtttccaa tgggtcagaa
      301 accattgtca tcaccctatt aaacagcacg gacacggacg cacagagttt cacagtgaat
      361 attgataatg tcattgactc agtgatctgt agctccttac tcgcctcaat ttgcagcctg
      421 ctttcgattg cagtggacag gtattttact atcttttatg ctctccagta ccataacatt
      481 atgacagtta agcgggttgg aatcatcatc agttgtatct gggcagtctg cacggtgtcg
      541 ggtgttttgt tcatcattta ctcagatagc agtgctgtta ttatctgcct cataaccgtg
      601 ttcttcacca tgctggctct catggcttct ctctatgtcc acatgttcct catggccaga
      661 ctccacatta agaggatcgc cgtcctccca ggcactggca ccatccgcca aggtgccaac
      721 atgaaggggg caattaccct gaccatcttg attggggtct ttgtggtctg ctgggccccc
      781 ttcttcctcc acttaatatt ctatatctcc tgcccccaga atccatactg tgtgtgcttc
      841 atgtctcact ttaatttgta tctcatcctg atcatgtgta attccatcat cgatcccctg
      901 atttatgcac tccggagcca agaactgagg aaaaccttca aagagatcat ctgttgctat
      961 cccctgggtg gcctctgtga tttgtctagc agatattaa
//

This yields the mRNA and amino acid sequences we wanted.

Chapter 2  PCR Primer Design
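
Primer-BLAST and the other web tools used in the following chapters take a bare sequence or a FASTA record, whereas the ORIGIN block of a GenBank record interleaves position numbers, spaces and a closing `//`. The sketch below (standard-library Python; the function name `origin_to_fasta` and the header text are hypothetical, chosen for illustration) keeps only the bases and rewraps them as FASTA with 60 bases per line. It assumes a continuous ORIGIN block: a display line such as the `gap 100 bp` marker in the genomic record would have to be deleted or replaced by Ns first.

```python
import re


def origin_to_fasta(origin_text: str, header: str, width: int = 60) -> str:
    """Turn a GenBank ORIGIN block into a FASTA record (hypothetical helper).

    Position numbers, blanks, the ORIGIN keyword line and the '//' terminator
    are dropped; the remaining bases are upper-cased and wrapped at `width`.
    """
    bases = []
    for line in origin_text.splitlines():
        stripped = line.strip()
        # Skip the keyword line and the end-of-record marker.
        if stripped.startswith("ORIGIN") or stripped.startswith("//"):
            continue
        # Remove the leading position number and the spaces between 10-base blocks.
        bases.append(re.sub(r"[\d\s]", "", stripped))
    seq = "".join(bases).upper()
    wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + header] + wrapped)


if __name__ == "__main__":
    # The first two ORIGIN lines of DQ388767.1, as printed in the record above.
    origin = """ORIGIN
        1 atgaactcaa cccatcacca tggaatgcat acttctctcc acttctggaa ccgcagcacc
       61 tacggactgc acagcaatgc cagtgagccc cttggaaaag gctactctga aggaggatgc
//"""
    print(origin_to_fasta(origin, "DQ388767.1 Sus scrofa MC4R mRNA (first 120 nt)"))
```

The printed FASTA text can be pasted into the Primer-BLAST template box or the BLAST query box in place of the bare sequence.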

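Primer-BLAST, used in step 1 below, reports the melting temperature (Tm) and GC content of every candidate primer. For a quick sanity check of a primer you already have, the sketch below computes GC content, the reverse complement and a rough Tm with the Wallace rule, Tm = 2(A+T) + 4(G+C) °C. This is a deliberately simple stand-in: Primer-BLAST takes its Tm from Primer3's thermodynamic model, so its values will differ. The 20-nt oligo is a hypothetical example (the first 20 bases of the DQ388767.1 CDS), not a primer reported by Primer-BLAST.

```python
# Complement table for upper- and lower-case DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Percentage of G and C bases."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> float:
    """Rough Tm in deg C by the Wallace rule, 2(A+T) + 4(G+C).

    A rule of thumb for short oligonucleotides only: it ignores salt and
    primer concentration, unlike nearest-neighbor models such as Primer3's.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


if __name__ == "__main__":
    oligo = "ATGAACTCAACCCATCACCA"  # hypothetical example oligo, not a Primer-BLAST result
    print(f"{oligo}: GC = {gc_content(oligo):.1f} %, Tm (Wallace) = {wallace_tm(oligo):.0f} C")
    print("reverse complement:", reverse_complement(oligo))
```

For this oligo (8 A, 3 T, 1 G, 8 C) the sketch gives 45.0 % GC and a Wallace Tm of 58 °C.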
1. Designing PCR primers with NCBI
Open the NCBI home page and click BLAST. On the BLAST page, click "Primer-BLAST" under "Specialized BLAST". To design primers that amplify the mRNA, paste the mRNA sequence into the input box; the other parameters can be adjusted as needed, but the defaults are usually adequate. Then click "Get Primers" to view the candidate primer pairs.

Chapter 3  Predicting RNA Secondary Structure with Online Software

1. The secondary structure of the mRNA can be predicted with the online tool at http://www.genebee.msu.su/services/rna2_reduced.html. Open the page, paste in the MC4R mRNA sequence, select "Sequence" under "Input Type", and click "Submit" to obtain the predicted structure.

Chapter 4  BLAST Alignment of DNA and Protein Sequences

1. Taking the MC4R protein sequence as an example, open the NCBI home page and click "BLAST" on the right. The two programs used most often are "nucleotide blast" and "protein blast": the former aligns nucleotide sequences, the latter protein sequences. Since this is a protein comparison, click "protein blast", paste in the protein sequence encoded by the pig MC4R gene, and click BLAST to see the alignment results.

Chapter 5  Predicting Protein Primary, Secondary and Tertiary Structure

I. Using the amino acid sequence of the pig melanocortin-4 receptor (MC4R) as an example, the primary structure of the protein encoded by MC4R is analyzed with several online tools.

1. Use ProtParam on ExPASy (/tools/protparam.html) to predict the molecular weight and isoelectric point of the protein. Open the page, paste in the amino acid sequence, and click "Compute parameters" to obtain:

ProtParam
User-provided sequence:

        10         20         30         40         50         60
MNSTHHHGMH TSLHFWNRST YGLHSNASEP LGKGYSEGGC YEQLFVSPEV FVTLGVISLL
        70         80         90        100        110        120
ENILVIVAIA KNKNLHSPMY FFICSLAVAD MLVSVSNGSE TIVITLLNST DTDAQSFTVN
       130        140        150        160        170        180
IDNVIDSVIC SSLLASICSL LSIAVDRYFT IFYALQYHNI MTVKRVGIII SCIWAVCTVS
       190        200        210        220        230        240
GVLFIIYSDS SAVIICLITV FFTMLALMAS LYVHMFLMAR LHIKRIAVLP GTGTIRQGAN
       250        260        270        280        290        300
MKGAITLTIL IGVFVVCWAP FFLHLIFYIS CPQNPYCVCF MSHFNLYLIL IMCNSIIDPL
       310        320        330
IYALRSQELR KTFKEIICCY PLGGLCDLSS RY

Number of amino acids: 332
Molecular weight: 36946.7
Theoretical pI: 7.13

Amino acid composition:
Ala (A)  19    5.7%
Arg (R)   9    2.7%
Asn (N)  15    4.5%
Asp (D)   9    2.7%
Cys (C)  15    4.5%
Gln (Q)   6    1.8%
Glu (E)   8    2.4%
Gly (G)  17    5.1%
His (H)  12    3.6%
Ile (I)  38   11.4%
Leu (L)  39   11.7%
Lys (K)   8    2.4%
Met (M)  12    3.6%
Phe (F)  19    5.7%
Pro (P)   9    2.7%
Ser (S)  33    9.9%
Thr (T)  19    5.7%
Trp (W)   3    0.9%
Tyr (Y)  15    4.5%
Val (V)  27    8.1%
Pyl (O)   0    0.0%
Sec (U)   0    0.0%

    (B)   0    0.0%
    (Z)   0    0.0%
    (X)   0    0.0%

Total number of negatively charged residues (Asp + Glu): 17
Total number of positively charged residues (Arg + Lys): 17

Atomic composition:

Carbon      C     1692
Hydrogen    H     2645
Nitrogen    N      415
Oxygen      O      455
Sulfur      S       27

Formula: C1692H2645N415O455S27
Total number of atoms: 5234

Extinction coefficients:

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient    39725
Abs 0.1% (=1 g/l)   1.075, assuming all pairs of Cys residues form cystines

Ext. coefficient    38850
Abs 0.1% (=1 g/l)   1.052, assuming all Cys residues are reduced

Estimated half-life:

The N-terminal of the sequence considered is M (Met).

The estimated half-life is: 30 hours (mammalian reticulocytes, in vitro).
                            >20 hours (yeast, in vivo).
                            >10 hours (Escherichia coli, in vivo).

Instability index:

The instability index (II) is computed to be 46.15
This classifies the protein as unstable.

Aliphatic index: 119.76

Grand average of hydropathicity (GRAVY): 0.765

These results show that the protein encoded by the pig MC4R gene has a molecular weight of 36946.7 Da, a theoretical pI of 7.13, an instability index of 46.15 (classified as unstable), an aliphatic index of 119.76 and a GRAVY of 0.765; a positive GRAVY means the protein is hydrophobic overall.

2. Use ProtScale (/cgi-bin/protscale.pl) to analyze the hydrophobicity of the amino acid sequence. Open the page, paste in the MC4R amino acid sequence, and click "Submit" to obtain:

ProtScale
User-provided sequence:

        10         20         30         40         50         60
MNSTHHHGMH TSLHFWNRST YGLHSNASEP LGKGYSEGGC YEQLFVSPEV FVTLGVISLL
        70         80         90        100        110        120
ENILVIVAIA KNKNLHSPMY FFICSLAVAD MLVSVSNGSE TIVITLLNST DTDAQSFTVN
       130        140        150        160        170        180
IDNVIDSVIC SSLLASICSL LSIAVDRYFT IFYALQYHNI MTVKRVGIII SCIWAVCTVS
       190        200        210        220        230        240
GVLFIIYSDS SAVIICLITV FFTMLALMAS LYVHMFLMAR LHIKRIAVLP GTGTIRQGAN
       250        260        270        280        290        300
MKGAITLTIL IGVFVVCWAP FFLHLIFYIS CPQNPYCV
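
ProtScale plots a per-residue score averaged over a sliding window. The sketch below reproduces the idea for the Kyte-Doolittle hydropathy scale, assuming a window of 9 residues and equal weights across the window (what ProtScale computes when the window-edge weight is left at 100%); the scale and settings actually chosen in the ProtScale form are not recorded above, so treat both as assumptions. The helper `hydrophobic_segments` is a hypothetical addition: it merges consecutive window centres above a threshold, and with Kyte and Doolittle's suggested 19-residue window and 1.6 cutoff it gives a rough list of candidate transmembrane stretches.

```python
# Kyte-Doolittle hydropathy values, one of the scales offered by ProtScale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Pig MC4R protein sequence from the GenBank /translation qualifier above.
MC4R = (
    "MNSTHHHGMHTSLHFWNRSTYGLHSNASEPLGKGYSEGGCYEQL"
    "FVSPEVFVTLGVISLLENILVIVAIAKNKNLHSPMYFFICSLAVADMLVSVSNGSETI"
    "VITLLNSTDTDAQSFTVNIDNVIDSVICSSLLASICSLLSIAVDRYFTIFYALQYHNI"
    "MTVKRVGIIISCIWAVCTVSGVLFIIYSDSSAVIICLITVFFTMLALMASLYVHMFLM"
    "ARLHIKRIAVLPGTGTIRQGANMKGAITLTILIGVFVVCWAPFFLHLIFYISCPQNPY"
    "CVCFMSHFNLYLILIMCNSIIDPLIYALRSQELRKTFKEIICCYPLGGLCDLSSRY"
)


def hydropathy_profile(seq, window=9):
    """Unweighted sliding-window mean of Kyte-Doolittle values.

    Returns (position, score) pairs, where position is the 1-based index of
    the residue at the centre of the window. The window must be odd.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    values = [KD[aa] for aa in seq.upper()]
    half = window // 2
    return [
        (start + half + 1, sum(values[start:start + window]) / window)
        for start in range(len(values) - window + 1)
    ]


def hydrophobic_segments(profile, threshold=1.6):
    """Merge consecutive window centres whose score exceeds `threshold`."""
    segments = []
    for pos, score in profile:
        if score <= threshold:
            continue
        if segments and segments[-1][1] == pos - 1:
            segments[-1][1] = pos  # extend the current segment
        else:
            segments.append([pos, pos])  # start a new segment
    return [tuple(seg) for seg in segments]


if __name__ == "__main__":
    for pos, score in hydropathy_profile(MC4R, window=9)[:5]:
        print(f"{pos:4d}  {score:6.2f}")
    print("Centres above 1.6 (window 19):",
          hydrophobic_segments(hydropathy_profile(MC4R, window=19)))
```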

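Several of the ProtParam values reported in step 1 follow from simple published formulas, so they can be re-derived as a cross-check. The sketch below (standard-library Python) recomputes the amino acid composition, the charged-residue totals, GRAVY (mean Kyte-Doolittle hydropathy), the aliphatic index (Ikai's formula, X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)) with X in mole percent) and the 280 nm extinction coefficients (5500 per Trp, 1490 per Tyr, 125 per cystine). Molecular weight, pI, half-life and the instability index need larger lookup tables and are not reproduced. For the 332-residue MC4R sequence it returns the same GRAVY (0.765), aliphatic index (119.76) and extinction coefficients (39725 and 38850) as the ProtParam output.

```python
from collections import Counter

# Pig MC4R protein sequence from the GenBank /translation qualifier above.
MC4R = (
    "MNSTHHHGMHTSLHFWNRSTYGLHSNASEPLGKGYSEGGCYEQL"
    "FVSPEVFVTLGVISLLENILVIVAIAKNKNLHSPMYFFICSLAVADMLVSVSNGSETI"
    "VITLLNSTDTDAQSFTVNIDNVIDSVICSSLLASICSLLSIAVDRYFTIFYALQYHNI"
    "MTVKRVGIIISCIWAVCTVSGVLFIIYSDSSAVIICLITVFFTMLALMASLYVHMFLM"
    "ARLHIKRIAVLPGTGTIRQGANMKGAITLTILIGVFVVCWAPFFLHLIFYISCPQNPY"
    "CVCFMSHFNLYLILIMCNSIIDPLIYALRSQELRKTFKEIICCYPLGGLCDLSSRY"
)

# Kyte-Doolittle hydropathy values; GRAVY is their mean over the sequence.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Ikai (1980): X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    counts = Counter(seq)
    pct = {aa: 100.0 * counts[aa] / len(seq) for aa in "AVIL"}
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])


def extinction_coefficients(seq):
    """Molar extinction coefficients at 280 nm (M^-1 cm^-1).

    Returns (all Cys pairs form cystines, all Cys reduced).
    """
    counts = Counter(seq)
    reduced = 5500 * counts["W"] + 1490 * counts["Y"]
    return reduced + 125 * (counts["C"] // 2), reduced


if __name__ == "__main__":
    counts = Counter(MC4R)
    n = len(MC4R)
    print("Number of amino acids:", n)
    for aa in sorted(counts):
        print(f"{aa}  {counts[aa]:3d}  {100 * counts[aa] / n:4.1f}%")
    print("Asp + Glu:", counts["D"] + counts["E"], " Arg + Lys:", counts["R"] + counts["K"])
    print(f"GRAVY: {gravy(MC4R):.3f}")
    print(f"Aliphatic index: {aliphatic_index(MC4R):.2f}")
    print("Extinction coefficients:", extinction_coefficients(MC4R))
```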