A Field Guide

Genome resources. Sequence similarity.

Genome Resources: LocusLink, Gene database, UniGene, Trace Archive, Map Viewer, Homologene.

Genomic Biology.... Genome Projects: microb...

LocusLink
A single query interface to…
Sequences
- RefSeqs
- GenBank
- Homologene
Maps
- Map Viewer
Entrez links

LocusLink will be replaced by Entrez Gene on March 1, 2005. Check the Gene FAQ for current information.

Entrez Gene
A single query interface to…
Sequences
- RefSeqs
- GenBank
- Homologene
Maps
- Map Viewer
Entrez links
Entrez Gene
- More organisms: all RefSeq genomes
- Entrez integration

Example query: Gsn[sym] (amyloidosis).
Global Entrez: NADH2 (query: nadh2; 47).
Entrez Gene: NADH2, 26 records.
Gene Record for Pongo NADH2. Homo sapiens.
Display Exons/Introns: Gene Table.

A Record With More Data: Human HFE (hemochromatosis).
Gene Graphic Links: NM_, NP_.
Introns/Exons: Gene Table links to sequence.
Entrez SNP: hfe[gene name] AND human[orgn] (52; hemochromatosis).
Linking to SNP: chromosome position, gene position, sequence position.
SNP in Structure. Link to OMIM. Variants in OMIM.

UniGene
Gene-oriented clusters of expressed sequences
- Automatic clustering using MegaBlast
- Each cluster represents a unique gene
- Informed by genome hits
- Information on tissue types and map locations
- Useful for gene discovery and selection of mapping reagents

UniGene: A Cluster of ESTs (query; 5' EST hits; 3' EST hits).
UniGene Collections.
Example UniGene Cluster.
Histogram of cluster sizes for UniGene Hs build 177.
UniGene Cluster Hs.95351: expression; seqs.
Download sequences: web page, ftp site.

The New Homologene
Automated detection of homologs among the annotated genes of completely sequenced eukaryotic genomes.
- No longer UniGene based
- Protein similarities first
- Guided by taxonomic tree
- Includes orthologs and paralogs
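Orthologs and paralogs are two types of homologous sequences.
Orthologs are proteins from different species that evolved by vertical descent (speciation) and typically retain the same function as the original protein. Paralogs are proteins within a given species that arose by gene duplication and may evolve new functions related to the original one. See the references for more information.

Paralogs vs Orthologs (figure): gene duplication of an early globin gene gives the A-chain gene and the B-chain gene. Branches: frog A, chick A, mouse A; mouse B, chick B, frog B. The A and B chains are paralogs; the same chain in different species are orthologs.

The New Homologene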
Homologene Build 37.2 (table columns: Species; Number of genes: input, grouped; Number of groups).
RAG1 → Homologene: rag1 (12), recombination activating gene.
RAG1 → Homologene: RAG1, Amniota.
Homologene: RAG1.

Map Viewer. List View.
Human Map Viewer: adar (adenosine deaminase).
Map Viewer: Human ADAR (4).
MV Hs ADAR: 3' UTR, 5' UTR.

Maps & Options
-- Sequence maps --
Ab initio, Assembly, Repeats, BES_Clone, Clone, NCI_Clone, Contig, Component, CpG island, dbSNP haplotype, Fosmid, GenBank_DNA, Gene, Phenotype, SAGE_Tag, STS, TCAG_RNA, Transcript (RNA), Hs_UniGene, Hs_EST, Mm_UniGene, Mm_EST, Rn_UniGene, Rn_EST, Ssc_UniGene, Ssc_EST, Bt_UniGene, Bt_EST, Gga_UniGene, Gga_EST, Variation
-- Cytogenetic maps --
Ideogram, FISH Clone, Gene_Cytogenetic, Mitelman Breakpoint, Morbid/Disease
-- Genetic Maps --
deCODE, Genethon, Marshfield
-- RH maps --
GeneMap99-G3, GeneMap99-GB4, NCBI RH, Stanford-G3, TNG, Whitehead-RH, Whitehead-YAC

Maps & Options: Variation = SNP.
Map Viewer: UniGene, Component, Repeats, Gene.
Master map: repeats.
Gene, Phenotype, Variation.

Trace Archive: Strongylocentrotus purpuratus
Traces.

BLAST: Basic Local Alignment Search Tool
Web Access: BLAST, VAST, Entrez (Text, Sequence, Structure).

Basic Local Alignment Search Tool
- Why use sequence similarity?
- BLAST algorithm
- BLAST statistics
- BLAST output
- Examples

Why Do We Need Sequence Similarity Searching?
- To identify and annotate sequences
- To evaluate evolutionary relationships
- Other: model genomic structure (e.g., Spidey); check primer specificity in silico (NCBI's tool)

BLAST Website Stats.

Global vs Local Alignment (figure: Seq 1 and Seq 2 under global alignment and under local alignment).

Global vs Local Alignment
Seq 1: WHEREISWALTERNOW (16 aa)
Seq 2: HEWASHEREBUTNOWISHERE (21 aa)

Global
Seq 1:  1 W--HEREISWALTERNOW 16
          W  HERE
Seq 2:  1 HEWASHEREBUTNOWISHERE 21

Local
Seq 1:  1 W--HERE 5        Seq 1:  1 W--HERE 5
          W  HERE                    W  HERE
Seq 2:  3 WASHERE 9        Seq 2: 15 WISHERE 21

The Flavors of BLAST
Standard BLAST
- traditional "contiguous" word hit
- position independent scoring
- nucleotide, protein and translations (blastn, blastp, blastx, tblastn, tblastx)
Megablast
- optimized for large batch searches
- can use discontiguous words
PSI-BLAST
- constructs PSSMs automatically; uses as query
- very sensitive protein search
RPS BLAST
- searches a database of PSSMs
- tool for conserved domain searches

- Widely used similarity search tool
- Heuristic approach based on Smith-Waterman algorithm
- Finds best local alignments
- Provides statistical significance
- All combinations (DNA/Protein) of query and database:
    DNA vs DNA: blastn
    DNA translation vs Protein: blastx
    Protein vs Protein: blastp
    Protein vs DNA translation: tblastn
    DNA translation vs DNA translation: tblastx
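The slides above describe BLAST as a heuristic based on the Smith-Waterman algorithm that finds the best local alignments. As a point of reference, here is a minimal Smith-Waterman sketch applied to the WHEREISWALTERNOW example. The function name and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration; they are not BLAST's defaults, and BLAST itself does not fill the full dynamic-programming table.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment of a and b: returns (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the best cell until the score drops to 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, top, bottom = smith_waterman("WHEREISWALTERNOW", "HEWASHEREBUTNOWISHERE")
    print(score)
    print(top)
    print(bottom)
```

With other scoring values the reported local alignment can change, which is one reason the choice of scoring system matters for every BLAST program in the list above.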
- www, standalone, and network clients

Translated BLAST
Query   Database   Program
N       P          blastx
P       N          tblastn
N       N          tblastx
(N = nucleotide, P = protein)
Particularly useful for nucleotide sequences without protein annotations, such as ESTs or genomic DNA.

How BLAST Works
- Make lookup table of "words" for query
- Scan database for hits
- Ungapped extensions of hits (initial HSPs)
- Gapped extensions (no traceback)
- Gapped extensions (traceback; alignment details)

Nucleotide Words
Query: GTACTGGACATGGACCCTACAGGAA
Make a lookup table of words (11-mers):
GTACTGGACAT
 TACTGGACATG
  ACTGGACATGG
   CTGGACATGGA
    TGGACATGGAC
     GGACATGGACC
      GACATGGACCC
       ACATGGACCCT
...

WORDSIZE    minimum   default
megablast   8         28
blastn      7         11

Protein Words
Query: GTQITVEDLFYNIATRRKALKN
Words: GTQ, TQI, QIT, ITV, TVE, VED, EDL, DLF, ...
Neighborhood words: LTV, MTV, ISV, LSV, etc.
Make a lookup table of words.
Word size = 3 (default). Word size can only be 2 or 3.
[-f 11 = blastp default]

Minimum Requirements for a Hit
- Nucleotide BLAST requires one exact match
- Protein BLAST requires two neighboring matches within 40 aa
Protein example: GTQITVEDLFYNI with neighborhood words SEI and YYN (two matches).
Nucleotide example: ATCGCCATGCTTAATTGGGCTT with CATGCTTAATT (one exact match).
[-A 40 = blastp default]

BLASTP Summary
Query: IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILEV…
Example query words: YLS, HFL

Neighborhood words (neighborhood score threshold T (-f) = 11):
YLS 15, YLT 12, YVS 12, YIT 10, etc.
HFL 18, HFV 15, HFS 14, HWL 13, NFL 13, DFL 12, HWV 10, etc.

High-scoring pair (HSP):
Query    1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI 47
Sbjct  287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEI 333

Gapped extension with traceback (final HSP):
Query    1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI-LEV… 50
           +E  YA YL K    F+YLSL +SP+ +DVNVHP+K  VHFL+++ I   ++
Sbjct  287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEIATSI… 337

Scoring Systems - Nucleotides
Identity matrix:
     A   G   C   T
A   +1  -3  -3  -3
G   -3  +1  -3  -3
C   -3  -3  +1  -3
T   -3  -3  -3  +1

CAGGTAGCAAGCTTGCATGTCA
|| ||||||||||||  |||||      raw score = 19 - 9 = 10
CACGTAGCAAGCTTG-GTGTCA
[-r 1 -q -3]

Scoring Systems - Proteins
Position Independent Matrices
PAM Matrices (Percent Accepted Mutation)
- Derived from observation; small dataset of alignments
- Implicit model of evolution
- All calculated from PAM1
- PAM250 widely used
BLOSUM Matrices (BLOck SUbstitution Matrices)
- Derived from observation; large dataset of highly conserved blocks
- Each matrix derived separately from blocks with a defined percent identity cutoff
- BLOSUM62: default matrix for BLAST
Position Specific Score Matrices (PSSMs)
- PSI- and RPS-BLAST

BLOSUM62
    A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  X
A   4
R  -1  5
N  -2  0  6
D  -2 -2  1  6
C   0 -3 -3 -3  9
Q  -1  1  0  0 -3  5
E  -1  0  0  2 -4  2  5
G   0 -2  0 -1 -3 -2 -2  6
H  -2  0  1 -1 -3  0  0 -2  8
I  -1 -3 -3 -3 -1 -3 -3 -4 -3  4
L  -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
K  -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5
M  -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
F  -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
P  -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
S   1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
T   0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
W  -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
Y  -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
V   0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
X   0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1

D/F: negative for less likely substitutions.
Y/F: positive for more likely substitutions.

Position-Specific Score Matrix (figure): DAF-1; Serine/Threonine protein kinases catalytic loop; 174; PSSM scores; 54.
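The BLOSUM62 matrix above is what turns a query word into its neighborhood words: every 3-letter word whose BLOSUM62 score against the query word reaches the threshold T is kept. The sketch below rebuilds the matrix from its lower triangle (the X row is omitted) and enumerates the neighborhood of a word. The names `word_score` and `neighborhood` are hypothetical helpers for illustration, not part of any BLAST API, and brute-force enumeration of all 8000 words is an assumption made for clarity rather than a description of how BLAST builds its lookup table.

```python
from itertools import product

AA = "ARNDCQEGHILKMFPSTWYV"

# Lower triangle of BLOSUM62, rows and columns in the order of AA.
TRIANGLE = """
4
-1 5
-2 0 6
-2 -2 1 6
0 -3 -3 -3 9
-1 1 0 0 -3 5
-1 0 0 2 -4 2 5
0 -2 0 -1 -3 -2 -2 6
-2 0 1 -1 -3 0 0 -2 8
-1 -3 -3 -3 -1 -3 -3 -4 -3 4
-1 -2 -3 -4 -1 -2 -3 -4 -3 2 4
-1 2 0 -1 -3 1 1 -2 -1 -3 -2 5
-1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5
-2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7
1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4
0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11
-2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7
0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
"""

# Expand the triangle into a symmetric lookup keyed by residue pairs.
BLOSUM62 = {}
for i, line in enumerate(TRIANGLE.strip().splitlines()):
    for j, value in enumerate(line.split()):
        BLOSUM62[AA[i], AA[j]] = BLOSUM62[AA[j], AA[i]] = int(value)


def word_score(w1, w2):
    """Sum of BLOSUM62 scores over aligned positions of two equal-length words."""
    return sum(BLOSUM62[x, y] for x, y in zip(w1, w2))


def neighborhood(word, threshold=11):
    """All words of the same length scoring >= threshold against `word`."""
    hits = {}
    for candidate in product(AA, repeat=len(word)):
        c = "".join(candidate)
        s = word_score(word, c)
        if s >= threshold:
            hits[c] = s
    return hits


if __name__ == "__main__":
    nb = neighborhood("YLS", threshold=11)
    for w in ("YLS", "YLT", "YVS", "YIT"):
        print(w, word_score("YLS", w), w in nb)
```

Raising the threshold shrinks the neighborhood and speeds up the scan at the cost of sensitivity; lowering it does the opposite.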
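Position-Specific Score Matrix (catalytic loop)
         A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
435 K   -1  0  0 -1 -2  3  0  3  0 -2 -2  1 -1 -1 -1 -1 -1 -1 -1 -2
436 E    0  1  0  2 -1  0  2 -1  0 -1 -1  0  0  0 -1  0  0 -1 -1 -1
437 S    0  0 -1  0  1  1  0  1  1  0 -1  0  0  0  2  0 -1 -1  0 -1
438 N   -1  0 -1 -1  1  0 -1  3  3 -1 -1  1 -1  0  0 -1 -1  1  1 -1
439 K   -2  1  1 -1 -2  0 -1 -2 -2 -1 -2  5  1 -2 -2 -1 -1 -2 -2 -1
440 P   -2 -2 -2 -2 -3 -2 -2 -2 -2 -1 -2 -1  0 -3  7 -1 -2 -3 -1 -1
441 A    3 -2  1 -2  0 -1  0  1 -2 -2 -2  0 -1 -2  3  1  0 -3 -3  0
442 M   -3 -4 -4 -4 -3 -4 -4 -5 -4  7  0 -4  1  0 -4 -4 -2 -4 -1  2
443 A    4 -4 -4 -4  0 -4 -4 -3 -4  4 -1 -4 -2 -3 -4 -1 -2 -4 -3  4
444 H   -4 -2 -1 -3 -5 -2 -2 -4 10 -6 -5 -3 -4 -3 -2 -3 -4 -5  0 -5
445 R   -4  8 -3 -4  0 -1 -2 -3 -2 -5 -4  0 -3 -2 -4 -3 -3  0 -4 -5
446 D   -4 -4 -1  8 -6 -2  0 -3 -3 -5 -6 -3 -5 -6 -4 -2 -3 -7 -5 -5
447 I   -4 -5 -6 -6 -3 -4 -5 -6 -5  3  5 -5  1  1 -5 -5 -3 -4 -3  1
448 K    0  0  1 -3 -5 -1 -1 -3 -3 -5 -5  7 -4 -5 -3 -1 -2 -5 -4 -4
449 S    0 -3 -2 -3  0 -2 -2 -3 -3 -4 -4 -2 -4 -5  2  6  2 -5 -4 -4
450 K    0  3  0  1 -5  0  0 -4 -1 -4 -3  4 -3 -2  2  1 -1 -5 -4 -4
451 N   -4 -3  8 -1 -5 -2 -2 -3 -1 -6 -6 -2 -4 -5 -4 -1 -2 -6 -4 -5
452 I   -3 -5 -5 -6  0 -5 -5 -6 -5  6  2 -5  2 -2 -5 -4 -3 -5 -3  3
453 M   -4 -4 -6 -6 -3 -4 -5 -6 -5  0  6 -5  1  0 -5 -4 -3 -4 -3  0
454 V   -3 -3 -5 -6 -3 -4 -5 -6 -5  3  3 -4  2 -2 -5 -4 -3 -5 -3  5
455 K   -2  1  1  4 -5  0 -1 -2  1 -4 -2  4 -3 -2 -3  0 -1 -5 -2 -3
456 N    1  1  3  0 -4 -1  1  0 -3 -4 -4  3 -2 -5 -2  2 -2 -5 -4 -4
457 D   -3 -2  5  5 -1 -1  1 -1  0 -5 -4  0 -2 -5 -1  0 -2 -6 -4 -5
458 L   -3 -1  0 -3  0 -3 -2  3 -4 -2  3  0  1  1 -2 -2 -3  5 -1 -3
[> ./blastpgp -i NP_499868.2 -d nr -j 3 -Q NP_499868.pssm]

Local Alignment Statistics
High scores of local alignments between two random sequences follow the Extreme Value Distribution (figure: Alignments plotted against Score (S)). This applies to ungapped alignments.

$E = K m n \, e^{-\lambda S}$  or  $E = m n \, 2^{-S'}$

$K$ = scale for search space
$\lambda$ = scale for scoring system
$S'$ = bit score = $(\lambda S - \ln K) / \ln 2$

Expect Value: $E$ = number of database hits you expect to find by chance with score ≥ $S$ (figure labels: your score; expected number of random hits).

More info:
/BLAST/tutorial/Altschul-1.html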
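The two forms of the Expect value above are the same quantity written in raw-score and bit-score units. The sketch below computes both and checks that they agree. The function names are hypothetical, and the numbers used for `lam`, `K`, the query length `m` and the database length `n` are placeholder values picked for illustration only; real values depend on the scoring system and the database searched.

```python
import math


def bit_score(raw_score, lam, K):
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def expect_from_raw(raw_score, m, n, lam, K):
    """E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * raw_score)


def expect_from_bits(bits, m, n):
    """E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


if __name__ == "__main__":
    # Placeholder parameters (assumptions for illustration, not taken from the slides).
    lam, K = 0.267, 0.041
    m, n = 250, 50_000_000  # query length, total database length
    S = 92                  # raw score

    bits = bit_score(S, lam, K)
    print(f"bit score      = {bits:.1f}")
    print(f"E (raw form)   = {expect_from_raw(S, m, n, lam, K):.3g}")
    print(f"E (bit form)   = {expect_from_bits(bits, m, n):.3g}")
```

Because the bit score already absorbs $\lambda$ and $K$, bit scores from searches run with different scoring systems can be compared directly, while raw scores cannot.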
Advanced BLAST Options: Nucleotide
Example Entrez Queries
- nucleotide all[Filter] NOT mammalia[Organism]
- green plants[Organism]
- biomol mrna[Properties]
- gbdiv est[Properties] AND rat[organism]
Other Advanced
- -e 10000: expect value
- -v 2000: descriptions
- -b 2000: alignments

Advanced BLAST Options: Protein
Matrix Selection
- PAM30: most stringent
- BLOSUM45: least stringent
Example Entrez Queries
- proteins all[Filter] NOT mammalia[Organism]
- green plants[Organism]
- srcdb refseq[Properties]
Other Advanced
- -e 10000: expect value
- -v 2000: descriptions
- -b 2000: alignments
Limit by taxon
- Mus musculus[Organism]
- Mammalia[Organism]
- Viridiplantae[Organism]

Low Complexity Filtering (filtered vs unfiltered)
sp|P27476|NSR1_YEAST NUCLEAR LOCALIZATION SEQUENCE BINDING PROTEIN (P67)
Length = 414
Score = 40.2 bits (92), Expect = 0.013
Identities = 35/131 (26%), Positives = 56/131 (42%), Gaps = 4/131 (3%)
Query: 362 STTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTD 418
           S++SSSS+SSS++++S++SSS++EK
Sbjct: 29  SSSSSESSSSSSSSSESESESESESESSSSSSSSDSESSSSSSSDSESEAETKKEESKDS 88

Other BLAST Algorithms
- Megablast
- Discontiguous Megablast
- PSI-BLAST
- PHI-BLAST

Megablast: NCBI's Genome Annotator
- Long alignments of similar DNA sequences
- Greedy algorithm
- Concatenation of query sequences
- Faster than blastn; less sensitive

MegaBLAST & Word Size
Trade-off: sensitivity vs speed
WORDSIZE    minimum   default
blastp      2         3
megablast   8         28
blastn      7         11

Discontiguous Megablast
- Uses discontiguous word matches
- Better for cross-species comparisons

Templates for Discontiguous Words
W = 11, t = 16, coding:      1101101101101101
W = 11, t = 16, non-coding:  1110010110110111
W = 12, t = 16, coding:      1111101101101101
W = 12, t = 16, non-coding:  1110110110110111
W = 11, t = 18, coding:      101101100101101101
W = 11, t = 18, non-coding:  111010010110010111
W = 12, t = 18, coding:      101101101101101101
W = 12, t = 18, non-coding:  111010110010110111
W = 11, t = 21, coding:      100101100101100101101
W = 11, t = 21, non-coding:  111010010100010010111
W = 12, t = 21, coding:      100101101101100101101
W = 12, t = 21, non-coding:  111010010110010010111
W = word size (number of matches in template); t = template length.
Reference: Ma, B., Tromp, J., Li, M. PatternHunter: faster and more sensitive homology search. Bioinformatics, March 2002; 18(3):440-5.

Discontiguous (Cross-species) MegaBLAST. Discontiguous Word Options.

MegaBLAST vs Discontiguous MegaBLAST
Query: NM_017460 Homo sapiens cytochrome P450, family 3, subfamily A, polypeptide 4 (CYP3A4), transcript variant 1, mRNA (2768 letters) vs Drosophila.
MegaBLAST = "No significant similarity found."
Discontiguous MegaBLAST =

Another Example...
Query: NM_078651 Drosophila melanogaster CG18582-PA (mbt) mRNA, (3244 bp) /note=mushroom bodies tiny; synonyms: Pak2, STE20, dPAK2
Database: nr (nt), Mammalia[orgn]
Discontiguous MegaBLAST = numerous hits...
MegaBLAST = "No significant similarity found."

Ex: Discontiguous MegaBLAST. Ex: BLASTN.

PSI-BLAST: Position-specific Iterated BLAST
Example: Confirming relationships of purine nucleotide metabolism proteins

>gi|113340|sp|P03958|ADA_MOUSE ADENOSINE DEAMINASE (ADENOSINE
MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLPGFVIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMPWNQTEGDVTPDDVVDEQAFGIKVRSILCCMRHQPSWSLEVLELCKKYNQKTVVAMDLAGDETIEGSSLFPGHVEAYRTVHAGEVGSPEVVREAVDILKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGAVRFKNDKANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKK

PSI-BLAST: 0.005 E value cutoff for PSSM.

RESULTS: Initial BLASTP. Same results as protein-protein BLAST; different format.
Results of First PSSM Search: other purine nucleotide metabolizing enzymes not found by ordinary BLAST.
Tenth PSSM Search: Convergence. Just below threshold, another nucleotide metabolism enzyme. Check to add to PSSM.

Reverse PSI-BLAST (RPS)-BLAST. Adenosine/AMP Deaminase Domain. AMP Deaminases....

PHI-BLAST
>gi|231729|sp|P30429|CED4_CAEEL CELL DEATH PROTEIN 4
MLCEIECRALSTAHTRLIHDFEPRDALTYLEGKNIFTEDHSELISKMSTRLERIANFLRIYRRQASELIDFFNYNNQSHLADFLEDYIDFAINEPDLLRPVVIAPQFSRQMLDRKLLLGNVPKQMTCYIREYHVIKKLDEMCDLDSFFLFLHGRAGSGKSVIASQALSKSDQLIGINYDSIVWLKDSGTAPKSTFDLFTDILKSEDDLLNFPSVEHVTSVVLKRMICNALIDRPNTLFVFDDVVQEETIRWAQELRLRCLVTTRDVEIASQTCEFIEVTSLEIDECYDFLEAYGMPMPVGEKEEDVLNKTIELSS