Amyloidosis Lecture

Genome Resources: LocusLink
A single query interface to:
- Sequences: RefSeqs, GenBank, HomoloGene
- Maps: MapViewer
- Entrez links
LocusLink will be replaced by Entrez Gene on March 1, 2005. Check the Gene FAQ for current information.

Genome Resources: Entrez Gene
A single query interface to:
- Sequences: RefSeqs, GenBank, HomoloGene
- Maps: MapViewer
- Entrez links
Entrez Gene adds:
- More organisms: all RefSeq genomes
- Entrez integration

Example searches:
- gsn[sym] (amyloidosis)
- nadh24726 records, Homo sapiens
- Hemochromatosis: NM_ and NP_ accessions link to sequence
- hfe[gene name] AND human[orgn] 52
- Hemochromatosis: chromosome location, gene location, sequence location

Genome Resources: UniGene
Gene-oriented clusters of expressed sequences
- Automatic clustering using MegaBlast
- Each cluster represents a unique gene
- Informed by genome hits
- Information on tissue types and map locations
- Useful for gene discovery and selection of mapping reagents
[Figure: a query with 5 EST hits and 3 EST hits; UniGene web page and ftp site]

Genome Resources: HomoloGene
Automated detection of homologs among the annotated genes of completely sequenced eukaryotic genomes.
Orthologs and paralogs are the two types of homologous sequences. Orthologs are proteins from different species that evolved by vertical descent (speciation) and typically retain the same function as the original protein.
Paralogs are proteins within a given species that arose by gene duplication and may evolve new functions related to the original one. See the references for more information.
[Figure: an early globin gene undergoes gene duplication into an A-chain gene and a B-chain gene. Frog A, chick A, and mouse A are labeled orthologs, as are mouse B, chick B, and frog B; the A and B genes are labeled paralogs.]

HomoloGene Build 37.2
[Table: Species; Number of genes (input, grouped); groups]
Example: rag1 12 recombination activating gene RAG1, Amniota

Genome Resources: Map Viewer
Example: adar (adenosine deaminase), showing the 3' UTR and 5' UTR
- Sequence maps: Ab initio, Assembly, Repeats, BES_Clone, Clone, NCI_Clone, Contig, Component, CpG island, dbSNP haplotype, Fosmid, GenBank_DNA, Gene, Phenotype, SAGE_Tag, STS, TCAG_RNA, Transcript (RNA), Hs_UniGene, Hs_EST
- Cytogenetic maps: Ideogram, FISH Clone, Gene_Cytogenetic, Mitelman Breakpoint, Morbid/Disease
- Genetic maps: deCODE, Genethon, Marshfield
- RH maps: GeneMap99-G3, GeneMap99-GB4, NCBI RH, Stanford-G3, TNG, Whitehead-RH
- Additional maps: Whitehead-YAC, Mm_UniGene, Mm_EST, Rn_UniGene, Rn_EST, Ssc_UniGene, Ssc_EST, Bt_UniGene, Bt_EST, Gga_UniGene, Gga_EST, Variation
Maps & Options: SNP, UniGene, Component, Repeats, Gene, Phenotype, Variation

Genome Resources: Web Access
- Entrez: text
- BLAST: sequence
- VAST: structure

BLAST: NCBI's tool
- Why use sequence similarity?
- BLAST algorithm
- BLAST statistics
- BLAST output
- Examples

Global alignment vs. local alignment
Seq1: WHEREISWALTERNOW (16 aa)
Seq2: HEWASHEREBUTNOWISHERE (21 aa)

Global:
Seq1: 1 W-HEREISWALTERNOW 16
        W HERE
Seq2: 1 HEWASHEREBUTNOWISHERE 21

Local (two equally good alignments):
Seq1: 1 W--HERE 5        Seq1: 1  W--HERE 5
        W  HERE                   W  HERE
Seq2: 3 WASHERE 9        Seq2: 15 WISHERE 21

Translated searches (N = nucleotide, P = protein)
Program  Query  Database
blastx   N      P
tblastn  P      N
tblastx  N      N
Particularly useful for nucleotide sequences without protein annotations, such as ESTs or genomic DNA.

Nucleotide words
Query: GTACTGGACATGGACCCTACAGGAA
Make a lookup table of words (11-mers):
GTACTGGACAT
 TACTGGACATG
  ACTGGACATGG
   CTGGACATGGA
    TGGACATGGAC
     GGACATGGACC
      GACATGGACCC
       ACATGGACCCT
        . . .

WORD SIZE   minimum  default
blastn      7        11
megablast   8        28

Protein words
Query: GTQITVEDLFYNIATRRKALKN
Make a lookup table of words: GTQ, TQI, QIT, ITV, TVE, VED, EDL, DLF, ...
Neighborhood words: LTV, MTV, ISV, LSV, etc.
- Word size = 3 (default); word size can only be 2 or 3
- -f 11 = blastp default

Minimum requirements for a hit
- Nucleotide BLAST requires one exact match
- Protein BLAST requires two neighboring matches within 40 aa (-A 40 = blastp default)
[Figure: protein query GTQITVEDLFYNI with two matches to neighborhood words (SEI, YYN); nucleotide sequence ATCGCCATGCTTAATTGGGCTT with one exact match to the word CATGCTTAATT]

Extension from the word hits YLS and HFL:
Query 1   IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI 47
Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEI 333
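The word lookup step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the BLAST implementation: the function names `build_lookup_table` and `find_exact_seeds` are hypothetical, and the sketch handles exact word matches only (no neighborhood words and no two-hit rule). The sequences are the ones shown on the slides.

```python
from collections import defaultdict


def build_lookup_table(query: str, word_size: int) -> dict[str, list[int]]:
    """Map each overlapping word of length word_size to its 0-based start positions."""
    if word_size <= 0:
        raise ValueError("word_size must be positive")
    table: dict[str, list[int]] = defaultdict(list)
    for start in range(len(query) - word_size + 1):
        table[query[start:start + word_size]].append(start)
    return dict(table)


def find_exact_seeds(table: dict[str, list[int]], subject: str,
                     word_size: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a subject word equals a query word."""
    seeds = []
    for s_pos in range(len(subject) - word_size + 1):
        word = subject[s_pos:s_pos + word_size]
        for q_pos in table.get(word, []):
            seeds.append((q_pos, s_pos))
    return seeds


# Nucleotide query from the slide, blastn default word size 11.
nt_table = build_lookup_table("GTACTGGACATGGACCCTACAGGAA", 11)
print(list(nt_table)[:3])  # ['GTACTGGACAT', 'TACTGGACATG', 'ACTGGACATGG']

# Protein query from the slide, blastp default word size 3.
aa_table = build_lookup_table("GTQITVEDLFYNIATRRKALKN", 3)
print(list(aa_table)[:4])  # ['GTQ', 'TQI', 'QIT', 'ITV']

# One exact 11-mer match, as in the nucleotide figure.
word_table = build_lookup_table("CATGCTTAATT", 11)
print(find_exact_seeds(word_table, "ATCGCCATGCTTAATTGGGCTT", 11))  # [(0, 5)]
```

A dictionary keyed by word makes each database word a constant-time lookup, which is why the scan over the subject sequence is fast; a smaller word size produces more words and more seeds, trading speed for sensitivity.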
Gapped extension with trace back
Query 1   IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI-LEV 50
          +E  YA YL K    F+YLSL +SP+ +DVNVHP+K  VHFL+   I + +
Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEIATSI 337

Final HSP: +E YA YL K F+ L +SP+ +DVNVHP+K V + I
HSP = high-scoring pair

Neighborhood words
Query: IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILEV
Example query words and their neighborhood words with scores:
- HFL 18, HFV 15, HFS 14, HWL 13, NFL 13, DFL 12, HWV 10, etc.
- YLS 15, YLT 12, YVS 12, YIT 10, etc.
Neighborhood score threshold: T (-f) = 11

Nucleotide scoring: identity matrix
     A   G   C   T
A   +1  -3  -3  -3
G   -3  +1  -3  -3
C   -3  -3  +1  -3
T   -3  -3  -3  +1

CAGGTAGCAAGCTTGCATGTCA
|| ||||||||||||  |||||     raw score = 19 - 9 = 10
CACGTAGCAAGCTTG-GTGTCA
-r 1 -q -3 (match reward 1, mismatch penalty -3)

Position Independent Matrices
PAM Matrices (Percent Accepted Mutation)
- Derived from observation; small dataset of alignments
- Implicit model of evolution
- All calculated from PAM1
- PAM250 widely used
BLOSUM Matrices (BLOck SUbstitution Matrices)
- Derived from observation; large dataset of highly conserved blocks
- Each matrix derived separately from blocks with a defined percent identity cutoff
- BLOSUM62: default matrix for BLAST
Position Specific Score Matrices (PSSMs)
- PSI- and RPS-BLAST

BLOSUM62
A   4
R  -1  5
N  -2  0  6
D  -2 -2  1  6
C   0 -3 -3 -3  9
Q  -1  1  0  0 -3  5
E  -1  0  0  2 -4  2  5
G   0 -2  0 -1 -3 -2 -2  6
H  -2  0  1 -1 -3  0  0 -2  8
I  -1 -3 -3 -3 -1 -3 -3 -4 -3  4
L  -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
K  -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5
M  -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
F  -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
P  -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
S   1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
T   0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
W  -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
Y  -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
V   0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
X   0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1
    A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  X
- D/F: negative for less likely substitutions
- Y/F: positive for more likely substitutions

PSSM scores: DAF-1, Serine/Threonine protein kinases catalytic loop
Pos Res   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
435 K    -1   0   0  -1  -2   3   0   3   0  -2  -2   1  -1  -1  -1  -1  -1  -1  -1  -2
436 E     0   1   0   2  -1   0   2  -1   0  -1  -1   0   0   0  -1   0   0  -1  -1  -1
437 S     0   0  -1   0   1   1   0   1   1   0  -1   0   0   0   2   0  -1  -1   0  -1
438 N    -1   0  -1  -1   1   0  -1   3   3  -1  -1   1  -1   0   0  -1  -1   1   1  -1
439 K    -2   1   1  -1  -2   0  -1  -2  -2  -1  -2   5   1  -2  -2  -1  -1  -2  -2  -1
440 P    -2  -2  -2  -2  -3  -2  -2  -2  -2  -1  -2  -1   0  -3   7  -1  -2  -3  -1  -1
441 A     3  -2   1  -2   0  -1   0   1  -2  -2  -2   0  -1  -2   3   1   0  -3  -3   0
442 M    -3  -4  -4  -4  -3  -4  -4  -5  -4   7   0  -4   1   0  -4  -4  -2  -4  -1   2
443 A     4  -4  -4  -4   0  -4  -4  -3  -4   4  -1  -4  -2  -3  -4  -1  -2  -4  -3   4
444 H    -4  -2  -1  -3  -5  -2  -2  -4  10  -6  -5  -3  -4  -3  -2  -3  -4  -5   0  -5
445 R    -4   8  -3  -4   0  -1  -2  -3  -2  -5  -4   0  -3  -2  -4  -3  -3   0  -4  -5
446 D    -4  -4  -1   8  -6  -2   0  -3  -3  -5  -6  -3  -5  -6  -4  -2  -3  -7  -5  -5
447 I    -4  -5  -6  -6  -3  -4  -5  -6  -5   3   5  -5   1   1  -5  -5  -3  -4  -3   1
448 K     0   0   1  -3  -5  -1  -1  -3  -3  -5  -5   7  -4  -5  -3  -1  -2  -5  -4  -4
449 S     0  -3  -2  -3   0  -2  -2  -3  -3  -4  -4  -2  -4  -5   2   6   2  -5  -4  -4
450 K     0   3   0   1  -5   0   0  -4  -1  -4  -3   4  -3  -2   2   1  -1  -5  -4  -4
451 N    -4  -3   8  -1  -5  -2  -2  -3  -1  -6  -6  -2  -4  -5  -4  -1  -2  -6  -4  -5
452 I    -3  -5  -5  -6   0  -5  -5  -6  -5   6   2  -5   2  -2  -5  -4  -3  -5  -3   3
453 M    -4  -4  -6  -6  -3  -4  -5  -6  -5   0   6  -5   1   0  -5  -4  -3  -4  -3   0
454 V    -3  -3  -5  -6  -3  -4  -5  -6  -5   3   3  -4   2  -2  -5  -4  -3  -5  -3   5
455 K    -2   1   1   4  -5   0  -1  -2   1  -4  -2   4  -3  -2  -3   0  -1  -5  -2  -3
456 N     1   1   3   0  -4  -1   1   0  -3  -4  -4   3  -2  -5  -2   2  -2  -5  -4  -4
457 D    -3  -2   5   5  -1  -1   1  -1   0  -5  -4   0  -2  -5  -1   0  -2  -6  -4  -5
458 L    -3  -1   0  -3   0  -3  -2   3  -4  -2   3   0   1   1  -2  -2  -3   5  -1  -3
The catalytic loop is marked within these rows.

./blastpgp -i NP_499868.2 -d nr -j 3 -Q NP_499868.pssm
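To make the position-specific scoring concrete, here is a minimal Python sketch that scores a short peptide against three rows of the PSSM above (positions 444 to 446, residues H, R, D). The rows are copied from the table; the function name `score_segment` and the choice of test peptides are assumptions made for illustration, and the sketch does not reproduce how PSI-BLAST builds or uses the matrix.

```python
# Column order of the PSSM as shown in the table above.
RESIDUES = "ARNDCQEGHILKMFPSTWYV"

# Three rows copied from the PSSM (positions 444-446 of the query).
PSSM_ROWS = {
    444: [-4, -2, -1, -3, -5, -2, -2, -4, 10, -6, -5, -3, -4, -3, -2, -3, -4, -5, 0, -5],
    445: [-4, 8, -3, -4, 0, -1, -2, -3, -2, -5, -4, 0, -3, -2, -4, -3, -3, 0, -4, -5],
    446: [-4, -4, -1, 8, -6, -2, 0, -3, -3, -5, -6, -3, -5, -6, -4, -2, -3, -7, -5, -5],
}


def score_segment(segment: str, first_position: int) -> int:
    """Sum the PSSM scores of each residue at consecutive positions (no gaps)."""
    total = 0
    for offset, residue in enumerate(segment):
        row = PSSM_ROWS[first_position + offset]
        total += row[RESIDUES.index(residue)]
    return total


print(score_segment("HRD", 444))  # 26: the conserved residues score 10 + 8 + 8
print(score_segment("HKD", 444))  # 18: K at position 445 scores 0 here
print(score_segment("NRD", 444))  # 15: N at position 444 scores -1
```

Unlike a position-independent matrix such as BLOSUM62, the same substitution can score differently at different positions, so conserved positions penalize changes more heavily than variable ones.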
BLAST statistics
High scores of local alignments between two random sequences follow the extreme value distribution (applies to ungapped alignments).
[Figure: distribution of alignments against score (S)]

$E = K m n e^{-\lambda S}$ or $E = m n 2^{-S'}$
- $K$ = scale for search space
- $\lambda$ = scale for scoring system
- $S'$ = bit score = $(\lambda S - \ln K)/\ln 2$

Expect Value
E = number of database hits you expect to find by chance with a score of at least S
[Figure: your score and the expected number of random hits]
More info: /BLAST/tutorial/Altschul-1.html

Example Entrez Queries (nucleotide)
- nucleotide all[Filter] NOT mammalia[Organism]
- green plants[Organism]
- biomol mrna[Properties]
- gbdiv est[Properties] AND rat[organism]
Other Advanced
- -e 10000: expect value
- -v 2000: descriptions
- -b 2000: alignments

Matrix Selection
- PAM30: most stringent
- BLOSUM45: least stringent

Example Entrez Queries (protein)
- proteins all[Filter] NOT mammalia[Organism]
- green plants[Organism]
- srcdb refseq[Properties]
Other Advanced
- -e 10000: expect value
- -v 2000: descriptions
- -b 2000: alignments

Limit by taxon
- Mus musculus[Organism]
- Mammalia[Organism]
- Viridiplantae[Organism]

BLAST output: filtered vs. unfiltered
sp|P27476|NSR1_YEAST NUCLEAR LOCALIZATION SEQUENCE BINDING PROTEIN (P67)
Length = 414
Score = 40.2 bits (92), Expect = 0.013
Identities = 35/131 (26%), Positives = 56/131 (42%), Gaps = 4/131 (3%)
Query: 362 STTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLS-SQPQAIVTEDKTD 418
           S+S SSS+S SS + + +S + + S S S+ + E K
Sbjct: 29 SSSSSESSSSSSSSSESESESESESESSSSSSSSDSESSSSSSSDSESEAETKKEESKDS 88

Other BLAST algorithms
- Megablast
- Discontiguous Megablast
- PSI-BLAST
- PHI-BLAST

Trade-off: sensitivity vs. speed
WORD SIZE   minimum  default
blastn      7        11
megablast   8        28
blastp      2        3

Discontiguous Megablast templates
W = 11, t = 16, coding:     1101101101101101
W = 11, t = 16, non-coding: 1110010110110111
W = 12, t = 16, coding:     1111101101101101
W = 12, t = 16, non-coding: 1110110110110111
W = 11, t = 18, coding:     101101100101101101
W = 11, t = 18, non-coding: 111010010110010111
W = 12, t = 18, coding:     101101101101101101
W = 12, t = 18, non-coding: 111010110010110111
W = 11, t = 21, coding:     100101100101100101101
W = 11, t = 21, non-coding: 111010010100010010111
W = 12, t = 21, coding:     100101101101100101101
W = 12, t = 21, non-coding: 111010010110010010111
W = word size (number of matches in the template); t = template length
Reference: Ma, B, Tromp, J, Li, M. PatternHunter: faster and more sensitive homology search. Bioinformatics March, 2002; 18(3):440-5

Example: NM_017460 Homo sapiens cytochrome P450, family 3, subfamily A, polypeptide 4 (CYP3A4), transcript variant 1, mRNA (2768 letters) vs. Drosophila
MegaBLAST = “No significant similarity found.”
Discontiguous megaBLAST = numerous hits . . .

Query: NM_078651 Drosophila melanogaster CG18582-PA (mbt) mRNA (3244 bp)
/note= mushroom bodies tiny; synonyms: Pak2, STE20, dPAK2
Database: nr (nt), Mammalia[orgn]
MegaBLAST = “No significant similarity found.”

Position-specific Iterated BLAST (PSI-BLAST)
gi|113340|sp|P03958|ADA_MOUSE ADENOSINE DEAMINASE (ADENOSINE
MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLPGFVIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMPWNQTEGDVTPDDVVDEQAFGIKVRSILCCMRHQPSWSLEVLELCKKYNQKTVVAMDLAGDETIEGSSLFPGHVEAYRTVHAGEVGSPEVVREAVDILKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGAVRFKNDKANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKK
- 0.005: E value cutoff for PSSM
- First iteration: same results as protein-protein BLAST; different format
- Other purine nucleotide metabolizing enzymes not found by ordinary BLAST
- Just below threshold: another nucleotide metabolism enzyme; check to add to PSSM
- AMP deaminases . . . .

Pattern search example
gi|231729|sp|P30429|CED4_CAEEL CELL DEATH PROTEIN 4
MLCEIECRALSTAHTRLIHDFEPRDALTYLEGKNIFTEDHSELISKMSTRLERIANFLRIYRRQASELIDFFNYNNQSHLADFLEDYIDFAINEPDLLRPVVIAPQFSRQMLDRKLLLGNVPKQMTCYIREYHVIKKLDEMCDLDSFFLFLHGRAGSGKSVIASQALSKSDQLIGINYDSIVWLKDSGTAPKSTFDLFTDILKSEDDLLNFPSVEHVTSVVLKRMICNALIDRPNTLFVFDDVVQEETIRWAQELRLRCLVTTRDVEIASQTCEFIEVTSLEIDECYDFLEAYGMPMPVGEKEEDVLNKTIELSSGNPATLMMFFKSCEPKTFEK
Pattern: [GA]xxxxGK[ST]

What's New?
- Select lower case
- Select red
- Gray line = same database hit
- HSPs color-coded independently
- Low complexity sequence filtered
- Limit to Organism: protein all[filter] N

Example Entrez Queries
- proteins all[Filter] NOT mammalia[Organism]
- ray finned fishes[Organism]
- srcdb refseq[Properties]
Nucleotide only:
- biomol mrna[Properties]
- biomol genomic[Properties]
Other Advanced
- -e 10000: expect value
- -v 2000: descriptions
- -b 2000: alignments
Combined: -e 10000 -v 2000

[Figure: starting from the gene “hemochromatosis”, text search and sequence search lead to Gene, nucleotide sequence, Genome BLAST, Map Viewer, SNP, Protein, and Domains]

Human EST:
TGCCTCCTTTGGTGAAGGTGACACATCATGTGACCTCTTCAGTGACCACTCTACGGTGTCGGGCCTTGAACTACTACCCCCAGAACATCACCATGAAGTGGCTGAAGGATAAGCAGCCAATGGATGCCAAGGAGTTCGAACCTAAAGACGTATTGCCCAATGGGGATGGGACCTACCAGGGCTGGATAACCTTGGCTGTACCCCCTGGGGAAGAGC

Primer search: forward and reverse primers joined by a run of N
CCATGGCGACCCTGGAAAAGCNNNNNNNNNNCAGCAGCGGCTGTGCCTGCGG
-W 7 -e 1000
[Figure: forward primer and reverse primer positions on the hits] /refse
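The Expect-value relations from the BLAST statistics part of these slides can be checked numerically with a short Python sketch. The function names are hypothetical, and the parameter values (lambda = 0.270, K = 0.047, query length m, database length n) are illustrative assumptions, not values given in the text. With these assumed parameters a raw score of 92 converts to about 40.2 bits, in line with the sample hit shown earlier ("Score = 40.2 bits (92)").

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_from_raw(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def expect_from_bits(bits: float, m: int, n: int) -> float:
    """E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Assumed scoring-system parameters and search-space sizes (hypothetical).
LAMBDA, K = 0.270, 0.047
QUERY_LENGTH, DATABASE_LENGTH = 500, 100_000_000

bits = bit_score(92, LAMBDA, K)
print(round(bits, 1))  # 40.2

# Both forms of the formula give the same E value.
e_raw = expect_from_raw(92, QUERY_LENGTH, DATABASE_LENGTH, LAMBDA, K)
e_bits = expect_from_bits(bits, QUERY_LENGTH, DATABASE_LENGTH)
print(math.isclose(e_raw, e_bits))  # True
```

Because the bit score already absorbs lambda and K, E values computed from bit scores depend only on the search-space size, which is what makes bit scores comparable across scoring systems.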