
THREADING ALGORITHMS

Jadwiga Bienkowska1,2 and Rick Lathrop3,4

1 Serono Reproductive Biology Institute

One Technology Pl, Rockland, MA 02370

2 Biomedical Engineering Department

Boston University

36 Cummington St, Boston 02215

3 School of Information and Computer Sciences

4 Department of Biomedical Engineering

University of California, Irvine

Irvine, CA 92697-3425

Abstract

This chapter reviews various algorithms that have been developed for protein structure prediction by threading. Threading algorithms depend on many features of protein sequence and structure representations, and not all of them are used by all methods. There is no standard formalism to represent those different features. While providing necessary detail, formal notation sometimes obscures the core idea of an algorithm. To make the core ideas transparent to many readers, we made an effort to adhere to a simple formalism and tried to avoid mathematical formulas. This forced us to omit many details and complete outlines of algorithms; the reader is referred to the original literature for a detailed picture.

Keywords: Structure Prediction, Inverse Folding, Threading, Sequence Similarity, Structure Similarity, Algorithms

Background

The goal of protein structure prediction by threading is to align a protein sequence correctly to a structural model. This requires choosing both the correct structural model from a library of models and the correct alignment from the space of possible sequence-structure alignments. Once chosen, the alignment establishes a correspondence between amino acids in the sequence and spatial positions in the model. Assigning each aligned amino acid to its corresponding spatial position places the sequence into the three-dimensional (3D) protein fold represented by the model. Typically, the model represents only the spatially conserved positions of the fold, often the protein core, so producing a full-atom protein model would require further steps of loop placement and side-chain packing. Protein threading has a role in protein structure prediction that is intermediate between homology modeling and ab initio prediction. Like homology modeling, it uses known protein structures as templates for sequences of unknown structure. Like ab initio prediction, it seeks to optimize a potential function (an objective or score function) measuring goodness of fit of the sequence in a particular spatial configuration. Threading is the protein structure prediction method of choice when (1) the sequence has little or no primary sequence similarity to any sequence with a known structure, and (2) some model from the structure library represents the true fold of the sequence.

Protein threading requires (1) a representation of the sequence, (2) a library of structural models, (3) an objective function that scores sequence-structure alignments, (4) a method of aligning the sequence to the model, and (5) a method of selecting a model from the library. Following the initial conception of the threading approach to protein structure prediction (Bowie, Luthy et al. 1991; Jones, Taylor et al. 1992), there have been very many different approaches to these problems, of which this chapter can present only a few general themes.

Representation of the Query Sequence

It is widely accepted that significantly similar protein sequences also adopt a similar 3D structure. The Paracelsus Challenge demonstrated the design of a protein sequence with 50% sequence identity to a known protein but a different 3D structure (Jones, Moody et al. 1996), but when natural evolution produces similar protein sequences their protein structures generally are similar as well. Thus, in naturally occurring proteins, sequences that are similar to the query sequence carry useful information about its 3D structure. A multiple sequence alignment centered on the query sequence reflects sequence variability within the protein family to which the query sequence belongs. Most modern threading algorithms exploit this fact (Jones 1999; Fischer 2000; Kelley, MacCallum et al. 2000; Panchenko, Marchler-Bauer et al. 2000; Rychlewski, Jaroszewski et al. 2000; Karplus and Hu 2001; Skolnick, Zhang et al. 2003).

The query sequence is often represented by a sequence profile, P, where the element P_j is a vector giving a probability distribution over the 20 amino acids at sequence position j. In this notation a single query sequence has a profile with 1 for the original amino acid at each position and 0 otherwise. The sequence profile is typically constructed from a search of non-redundant protein databases (e.g., at NCBI), with the sequences aligned using multiple-sequence alignment programs like CLUSTAL (Higgins, Thompson et al. 1996) or PSI-BLAST (Altschul, Madden et al. 1997). Some threading methods also include an independent prediction of the secondary structure (SS) or other derived information as part of the sequence representation. In such cases the query is represented as two independent vectors, P and SS, where SS might be helix, strand, or coil, a more detailed set of secondary structure assignments, or other information.
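As a rough illustration of the profile representation described above, the following sketch computes column-wise amino acid frequencies from an already-aligned set of sequences. The function name `sequence_profile`, the gap character `-`, and the optional `pseudocount` are assumptions made for this example; real profile builders such as PSI-BLAST apply sequence weighting and more elaborate pseudocount schemes that are not reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def sequence_profile(aligned_rows, pseudocount=0.0):
    """Return one probability vector P_j per residue of the query (row 0).

    aligned_rows: equal-length strings from a multiple alignment, with the
    query first and '-' marking gaps (an assumed convention). The query is
    assumed to contain only the 20 standard amino acids.
    """
    query = aligned_rows[0]
    profile = []
    for j, q in enumerate(query):
        if q == "-":
            continue  # the profile is centered on the query: skip its gap columns
        counts = Counter(row[j] for row in aligned_rows if row[j] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append([(counts[a] + pseudocount) / total for a in AMINO_ACIDS])
    return profile
```

Called on a single sequence, for example `sequence_profile(["ACD"])`, the function returns vectors holding 1 for the original amino acid and 0 elsewhere, which is the single-sequence case mentioned in the text.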

Representation of Protein Structure Models

What is a model of protein structure? Protein structure is fully determined by the 3D coordinates of all non-hydrogen atoms. For threading, the 3D coordinates are reduced to more abstract representations of protein structure. Typically, structural core elements are defined by the secondary structure elements, alpha helices and beta strands, usually with side-chains removed. Among proteins with similar structures, large variations occur in the loop regions connecting the structural elements. In consequence, loop lengths, loop conformations, and loop residue interactions are rarely conserved, and often the loop residues are not represented explicitly in the structural models.

The main distinction among threading approaches is the choice of the structure model representation. Threading algorithms fall into two main categories that depend on the protein structure representation they use:

In the first category, a protein structure is represented as a linear model.

In the second, a protein structure is represented as a higher-order model.

In a linear representation, protein structure is modeled as a chain of residue positions that do not interact. In a second-order representation the model also includes interacting pairs of residue positions, for example, to account for hydrophobic packing, salt bridges, or hydrogen bonding. Still higher order models have been considered to represent triples and higher multiples of interacting residue positions, but are less common.
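The difference between a linear and a second-order model can be made concrete with a small scoring sketch. Everything here is illustrative: `alignment_score`, the `singleton` and `pairwise` callables, and the convention that higher scores are better are assumptions of this example, not the objective function of any particular threading method.

```python
def alignment_score(sequence, alignment, singleton, pair_contacts=(), pairwise=None):
    """Score a sequence-structure alignment.

    alignment[s] is the index in `sequence` of the residue placed at model
    position s. With no pairwise term the score is a sum of independent
    per-position terms (linear model). Supplying `pair_contacts` and
    `pairwise` adds one term per interacting pair (second-order model).
    """
    score = sum(singleton(s, sequence[i]) for s, i in enumerate(alignment))
    if pairwise is not None:
        for s, t in pair_contacts:
            score += pairwise(s, t, sequence[alignment[s]], sequence[alignment[t]])
    return score


# Toy potentials (hypothetical): reward hydrophobic residues at buried
# positions, and reward hydrophobic-hydrophobic contacts.
HYDROPHOBIC = set("AVLIMF")
BURIED = {1, 2}


def toy_singleton(s, aa):
    return 1.0 if s in BURIED and aa in HYDROPHOBIC else 0.0


def toy_pairwise(s, t, a, b):
    return 0.5 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0
```

In the linear case each term depends on a single aligned residue, so moving one residue changes exactly one term. In the second-order case a pair term depends on two alignment choices at once, which is what couples distant parts of the alignment.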

Approaches that represent protein structure as a linear model consider each structural position in the model independently, neglecting spatial interactions between amino acids in the sequence. This allows very fast alignment algorithms, but loses whatever structural information may be present in amino acid interactions. Approaches that use higher-order models explicitly consider spatial interactions between amino acids that are distant in the sequence but brought into close proximity in the model. This potentially allows for more realistic and informative structural models, but results in an NP-complete alignment problem (Lathrop 1994). It is known that the information content in higher-order amino acid interactions is modest, but non-zero (Cline, Karplus et al. 2002). What effect this has in practice, and whether the increased information content compensates for the increased complexity, is a subject of some debate within the protein threading community.
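To show why a linear model admits fast alignment, here is a minimal dynamic-programming sketch in the style of global sequence alignment. It assumes a per-position score `singleton(s, aa)` and a constant per-gap penalty `gap`; both are placeholders chosen for this example, and published methods use their own gap schemes and scoring terms.

```python
def thread_linear(n_positions, sequence, singleton, gap=-1.0):
    """Best global alignment score of `sequence` against a linear model.

    Runs in O(n_positions * len(sequence)) time, which is possible because
    each model position is scored independently of all the others.
    """
    m = len(sequence)
    # best[s][i]: best score aligning the first s model positions
    # to the first i residues of the sequence.
    best = [[0.0] * (m + 1) for _ in range(n_positions + 1)]
    for s in range(1, n_positions + 1):
        best[s][0] = best[s - 1][0] + gap  # model position left unfilled
    for i in range(1, m + 1):
        best[0][i] = best[0][i - 1] + gap  # residue left unaligned
    for s in range(1, n_positions + 1):
        for i in range(1, m + 1):
            best[s][i] = max(
                best[s - 1][i - 1] + singleton(s - 1, sequence[i - 1]),  # align
                best[s - 1][i] + gap,   # skip model position s-1
                best[s][i - 1] + gap,   # skip residue i-1
            )
    return best[n_positions][m]
```

The recurrence is valid only because the score of placing a residue at position s does not depend on what was placed elsewhere. Once pairwise terms are added, that independence is lost, which is consistent with the NP-completeness result cited above.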

1D models of the protein structure.

A 1D model of a protein structure is a sequence of states representing the residue as if embedded in a 3D structural environment. There are two distinct types of features frequently used to characterize a state, structural features and amino acid sequence features. The structural features include the solvent exposure of a given residue, the secondary structure of the residue, and so on. The structural features may be representations of a single specific structure or (weighted) averages of structural features from multiple structures in the same family. The sequence features may include the original amino acids observed in the structure or a sequence profile representing the multiple alignment of sequences from the protein family of the structure’s native sequence.

If we denote by s a residue position in the structure (or a position from the alignment of multiple structures), then a vector of features F(s) describes each position. Thus a structure model is an ordered chain of feature vectors {F(s)}. The dimensionality of the feature vector depends on the specific threading approach.
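One possible way to hold such an ordered chain of feature vectors in code is sketched below. The class name `PositionFeatures`, its three fields, and the one-letter secondary structure codes are hypothetical choices for illustration; as noted above, the actual content and dimensionality of F(s) vary between threading approaches.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class PositionFeatures:
    """Hypothetical feature vector F(s) for one structural position s."""
    solvent_exposure: float               # assumed scale: 0.0 buried to 1.0 exposed
    secondary_structure: str              # assumed codes: "H" helix, "E" strand, "C" coil
    native_residue: Optional[str] = None  # amino acid observed in the structure, if kept


def build_model(exposures: Sequence[float],
                ss_string: str,
                native_sequence: Optional[str] = None) -> List[PositionFeatures]:
    """Assemble the ordered chain {F(s)} from per-position annotations."""
    if len(exposures) != len(ss_string):
        raise ValueError("exposures and ss_string must have the same length")
    if native_sequence is not None and len(native_sequence) != len(ss_string):
        raise ValueError("native_sequence must match the model length")
    return [
        PositionFeatures(
            solvent_exposure=exposures[s],
            secondary_structure=ss_string[s],
            native_residue=native_sequence[s] if native_sequence else None,
        )
        for s in range(len(ss_string))
    ]
```

A model built this way is just a list indexed by s, so `model[s]` plays the role of F(s).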

The original 1D threading papers represented the feature vector as solvent exposure states, where the solvent exposure was calculated from the exposure of amino acids present in the native stru
