Validation of bioinformatic approaches for predicting allergen cross-reactivity (foreign-language literature)

Food and Chemical Toxicology 132 (2019) 110656

Validation of bioinformatic approaches for predicting allergen cross-reactivity

Rod A. Herman, Ping Song
Corteva Agriscience, 9330 Zionsville Road, Indianapolis, IN 46268, USA

https://doi.org/10.1016/j.fct.2019.110656
Received 29 March 2019; Received in revised form 24 June 2019; Accepted 1 July 2019; Available online 03 July 2019
0278-6915/© 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/BY-NC-ND/4.0/).
Corresponding author. E-mail addresses: rod herman (R.A. Herman), ping song (P. Song).

ARTICLE INFO

Keywords: Allergen; Cross-reactivity; Bioinformatics; Validation; Sensitivity; Selectivity

ABSTRACT

Part of the allergenicity assessment of newly expressed proteins in genetically engineered food crops involves an assessment of potential cross-reactivity with known allergens. Bioinformatic approaches are used to evaluate the amino acid sequence identity or similarity between newly expressed proteins and the sequences of known allergens. To be useful, such approaches must be sensitive to detecting cross-reactive potential, but also capable of excluding low-risk sequences. One difficulty in comparing the effectiveness of different bioinformatic approaches has been the lack of a standardized validation and evaluation method. Here, we propose a standardized method for evaluating the sensitivity of different bioinformatic algorithms using a comprehensive database of known allergen sequences. We combine this with a previously described method for evaluating selectivity, using sequences from a crop not known to commonly cause food allergy (e.g. maize), to compare the standard >35% identity criterion over sliding window of 80 amino acids bioinformatic approach with the previously described one-to-one (1:1) FASTA similarity approach using an E-value threshold of 1E-9. Results confirm the superiority of the 1:1 FASTA approach for selectively detecting cross-reactive allergens. The validation methods described here can be applied to other algorithms to select even better fit-for-purpose approaches for evaluating cross-reactive risk.

1. Introduction

One element of the weight-of-evidence assessment of newly expressed proteins in genetically engineered (GE) crops is a bioinformatic investigation for potential cross-reactivity with known allergens (Ladics et al., 2011). Historically, the algorithms developed and required by the regulatory agencies that oversee the safety of GE crops were not formally validated as being fit for purpose (Ladics et al., 2007). This may stem from the formulators of the initial criteria being experts in clinical allergy rather than in bioinformatics or formal method validation, especially in relation to risk assessment. The sensitivity of the bioinformatic methods was intended to be controlled based on identification of disparate amino acid sequences among cross-reactive allergens (selected through expert knowledge), followed by identification of the minimum amino acid identity between pairs of these sequences (Goodman et al., 2008). The most commonly used criterion developed in this manner is >35% identity over a sliding window of 80 amino acids using an alignment tool such as FASTA (Codex Alimentarius Commission, 2007; FAO/WHO, 2001). Such criteria can be useful if their selectivity for filtering out false positives is acceptable (minimal false identification of non-cross-reactive sequences). Unfortunately, the previously mentioned identity criterion over sliding window approach has poor selectivity, and alternative criteria based on sequence similarity measures, rather than identity, have been found to be more selective and equally sensitive for detection of known cross-reactive allergens (Cressman and Ladics, 2009; Herman et al., 2015; Hileman et al., 2002; Ladics et al., 2007; Silvanovich et al., 2009; Song et al., 2014).

A variety of suitable algorithms and tools based on advanced similarity searches have been described, but a common validation approach to identify the best fit-for-purpose method has not been formalized. Previous evaluations of sensitivity have mimicked the initial selection of disparate amino acid sequences from cross-reactive allergens (identified based on expert knowledge), followed by selection of similarity thresholds that favor detection. Selectivity was then evaluated using a set of protein sequences from crop plants not known to commonly cause allergy, e.g. maize (Song et al., 2014).

Here we propose a complementary and standardized method for evaluating sensitivity using full-length amino acid sequences contained in the COMPARE allergen database (http://comparedatabase.org). This approach makes use of a full suite of known allergen sequences as query proteins to examine how well a given criterion would have detected each sequence if it was yet to be identified as an allergen. We used the previously described approach for evaluating selectivity, based on querying an array of proteins from a crop not known to commonly cause allergy. We exemplified this validation approach by comparing a previously described one-to-one (1:1) FASTA approach with the commonly used regulatory approach based on >35% identity over an 80-amino-acid sliding window (Song et al., 2014, 2015). Note that these two bioinformatic methods have established preexisting thresholds of similarity and identity, respectively, and are used here to exemplify our proposed approach for comparing candidate methods for sensitivity.

2. Methods and materials

Bioinformatic approaches. Identity over a sliding window of 80 amino acids was compared with a 1:1 FASTA approach using the amino acid sequences in the COMPARE 2018 database. The COMPARE database was initially constructed using sequences in the AllergenOnline database (Goodman et al., 2016). These two bioinformatic approaches have been previously described (Codex Alimentarius Commission, 2007; FAO/WHO, 2001; Song et al., 2015; Song et al., 2014). Briefly, the first method parses each query protein into sliding windows of 80 amino acids, each of which is then aligned with known allergen sequences, followed by identification of matches with >35% identity. An adjustment was made for alignments under 80 amino acids, where the number of identical amino acid matches was divided by 80 to calculate percent identity over 80 amino acids (Song et al., 2014). The second method uses the FASTA algorithm to search for local alignments between the query protein and each allergen placed singly into a database, ensuring that the significance of the similarity (E-value) does not vary as the database size changes over time when sequences are added or removed from the allergen database (not controlled using the conventional FASTA approach). It is noteworthy that the 1:1 FASTA approach is not equivalent to setting the database size to a fixed value, because the 1:1 FASTA approach has a database size that varies with the length of the single sequence in the database during each query. Furthermore, the statistical methods used to generate the E-value are different compared with those typically used on a full database (Pearson, 2016). The previously proposed threshold E-value of 1E-9 was used to indicate cross-reactive potential.

Sensitivity. Full-length amino acid sequences in the COMPARE allergen database were putatively identified by searching the definition field of each entry (GenBank format) within the database for the word "partial" and eliminating these, and also eliminating additional sequences of [...] 35% identity over 80 amino acids and also likely not to be full-length sequences, from the query sequence pool (but not the searched database). The current version of the COMPARE database does not consistently identify sequences as full-length or partial. Only putative full-length sequences were selected as the query set because this mirrors the situation for proteins expressed in GE crops, which all have the complete sequences known.
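The identity criterion described under "Bioinformatic approaches" can be illustrated with a minimal Python sketch. It only shows how a query is cut into 80-residue windows and how the short-alignment adjustment changes the percent identity; it does not perform the FASTA alignment itself. The function names, the handling of queries shorter than 80 residues, and the strict ">" comparison are assumptions made for illustration and are not taken from the authors' software.

```python
from typing import Iterator

WINDOW = 80        # window length named in the text
THRESHOLD = 35.0   # percent identity; a strict ">" comparison is assumed


def sliding_windows(seq: str, size: int = WINDOW) -> Iterator[str]:
    """Yield every overlapping window of `size` residues (step of 1)."""
    if len(seq) <= size:
        # Assumption: a query no longer than the window is one window.
        yield seq
        return
    for start in range(len(seq) - size + 1):
        yield seq[start:start + size]


def adjusted_identity(n_identical: int, aln_len: int, size: int = WINDOW) -> float:
    """Percent identity with the short-alignment adjustment.

    For alignments shorter than `size`, identical matches are divided
    by `size` rather than by the alignment length.
    """
    denominator = max(aln_len, size)
    return 100.0 * n_identical / denominator


def meets_identity_criterion(n_identical: int, aln_len: int) -> bool:
    """True when the adjusted identity exceeds the 35% threshold."""
    return adjusted_identity(n_identical, aln_len) > THRESHOLD


if __name__ == "__main__":
    # 28 identical residues in a 40-residue alignment: 70% raw identity,
    # but 28 / 80 = 35% after adjustment, which does not exceed 35%.
    print(adjusted_identity(28, 40), meets_identity_criterion(28, 40))
```

The adjustment matters most for short local alignments: without it, a short stretch of high identity would pass the criterion even though it covers far fewer than 80 residues.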
These full-length sequences were used singly to query the sequences in the COMPARE database, and the best match was identified, excluding the identical entry in the database (equivalent to removing the identical entry in the database before conducting the query) (Fig. 1). Note that identical sequences from different source organisms were not removed from the database, simulating a situation where the query sequence was newly identified from a previously unknown source organism. Different best-match protein pairs were then compared with one another to find those pairs with matches not meeting the threshold of >35% for the identity criterion over sliding window approach or the E-value threshold of 1E-9 from the 1:1 FASTA comparison (Fig. 2). Both methods missed the same 42 sequences, suggesting their uniqueness in the database (Tables 1 and 2).

3.2. Data cleansing and cross-reactivity

Data cleansing. Data cleansing, or scrubbing, is the process of correcting datasets. A subset of sequences was selected in an automated manner (removal of those tagged "partial" and those [...] 35% identity [...] sliding window criteria. These two source organisms are reported to show cross-reactivity (Gaig et al., 1999), and the query sequence in the COMPARE database appears to be full-length, 63 amino acids long (Tuppo et al., 2013). However, the subject sequence from pomegranate is only 20 amino acids long and represents approximately 30% of the putative full-length protein (Tuppo et al., 2017). In addition, query accession P82946.2 from orchard grass and subject accession cad54671.2 from timothy grass did not meet the 1:1 FASTA threshold (E-value = 1.50E-7) or satisfy the sliding window criteria, and their source organisms have known cross-reactivity (Chakrabarty et al., 1981). However, the query protein is only 55 amino acids long (the shortest of the 52 query proteins missed by the 1:1 FASTA approach), while the subject protein is 508 amino acids long. Upon investigation, it was found that the 55-amino-acid orchard grass sequence was partial, representing approximately 10% of the full-length sequence, and thus is not representative of newly expressed proteins in GE crops (Leduc-Brodard et al., 1996). Four other query proteins returned best-match subjects from the same source organism as the query protein (accessions BAV90601.1 from the dust mite Dermatophagoides farinae, AGL34967.1 from coffee Coffea arabica, NP_776953.1 from cow's milk Bos taurus, and P06886.1 from the bacterium Staphylococcus aureus), which precludes an analysis of source organism cross-reactivity. For the 36 remaining protein pairs detected by neither approach, no literature documenting cross-reactivity between the source organisms was identified.

Overall sensitivity. Both bioinformatic approaches performed similarly in terms of sensitivity, and neither uniquely identified known cross-reactive allergens. Both methods appeared to detect any relevant amino acid homology that might confer allergenic cross-reactivity.

3.3. Selectivity

The selectivity of the sliding window and 1:1 FASTA bioinformatic approaches was compared using the in silico translated gene sequences for maize as query proteins, since maize is a rarely allergenic crop (58,286 sequences). However, the known allergen amino acid sequences were first removed from the COMPARE allergen database and sequences tagged with several text terms related to these sequences [...]

Fig. 2. Results comparing different bioinformatic approaches for detection of potential allergen cross-reactivity. Sequences in the COMPARE allergen database were used to evaluate sensitivity, and maize protein sequences were used to evaluate selectivity. Sections in the Venn diagram are not proportional to the number of sequences in each section.

Table 1. Protein sequences missed by one bioinformatic approach as a cross-reactive risk.

Detected by 1:1 FASTA only

| Query accession | Query length | Query species | Query common name | 1:1 FASTA subject accession | Subject length | Subject species | Subject common name | 1:1 FASTA E-value | 1:1 FASTA overlap | Sliding window identity (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| P16312.1 | 30P | Dermatophagoides microceras | dust mite | ABA39436.1 | 276 | Dermatophagoides farinae | dust mite | 6.50E-19 | 30 | <35 |
| CAA26038.1 | 70 | Apis mellifera | honey bee | P01502.1 | 26 | Apis dorsata | giant honey bee | 2.00E-14 | 26 | <35 |
| CCW27997.1 | 70 | Hevea brasiliensis | rubber tree (latex) | P82977.2 | 84 | Triticum aestivum | wheat | 9.90E-14 | 63 | <35 |
| AHF71027.1 | 237 | Betula pendula | birch | ACE82289.1 | 222 | Triticum aestivum | wheat | 9.70E-13 | 209 | <35 |
| P33556.1 | 38P | Vitis sp. | grape | P80274.1 | 37 | Vitis sp. | grape | 2.50E-12 | 37 | <35 |
| BAG93480.1 | 476 | Oryza sativa | Asian rice | AAA32708.1 | 499 | Aspergillus oryzae | fungus | 4.30E-12 | 370 | <35 |
| P80274.1 | 37P | Vitis sp. | grape | P33556.1 | 38 | Vitis sp. | grape | 5.00E-12 | 37 | <35 |
| P81216.1 | 29P | Equus caballus | horse | P81217.1 | 19 | Equus caballus | horse | 2.30E-10 | 18 | <35 |
| CAK50389.1 | 115 | Anisakis simplex | human parasitic nematode | AAR92223.1 | 116 | Actinidia deliciosa | kiwi | 8.10E-10 | 87 | <35 |
| P85524.1 | 150 | Actinidia deliciosa | kiwi | ABZ81045.1 | 159 | Quercus alba | white oak | 1.00E-09 | 143 | <35 |
| AAR92223.1 | 116 | Actinidia deliciosa | kiwi | CAK50389.1 | 115 | Anisakis simplex | parasitic fish worm | 3.50E-09 | 87 | <35 |

Detected by sliding window only

| Query accession | Query length | Query species | Query common name | 1:1 FASTA subject accession | Subject length | Subject species | Subject common name | 1:1 FASTA E-value | 1:1 FASTA overlap | Sliding window identity (%) | Sliding window subject accession | Subject length | Subject species | Subject common name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AAP06493.1 | 129 | Schistosoma japonicum | human blood fluke | CAA75506.1 | 133 | Helianthus annuus | sunflower | 1.20E-08 | 134 | 35.30 | AIO08866.1 | 130 | Dermatophagoides farinae | dust mite |
| ABA42918.1 | 274 | Cladosporium herbarum | fungus | AAB26195.1 | 68 | Ascaris suum | pig roundworm | 8.00E-04 | 22 | 35.70 | P56166 | 294 | Phalaris aquatica | canary grass |
| AAN73248.1 | 450 | Fusarium culmorum | fungus | AAA28303.1 | 203P | Dolichovespula arenaria | wasp | 9.80E-04 | 95 | 35.80 | CAA11266.1 | 302 | Fusarium culmorum | fungus |
| XP_003030591.1 | 576 | Schizophyllum commune | mushroom | BAF45320.1 | 65 | Cryptomeria japonica | Japanese cedar | 7.30E-04 | 20 | 36.60 | AAC25998.1 | 82 | Phleum pratense | timothy grass |
| BAI94503.1 | 165 | Cryptomeria japonica | Japanese cedar | ABX56711.1 | 116 | Arachis hypogaea | peanut | 1.60E-07 | 119 | 37.00 | ABX56711.1 | 116 | Arachis hypogaea | peanut |
| CAA55854.1 | 205 | Betula pendula | birch | BAA09634.1 | 79 | Brassica rapa | brassica | 3.10E-08 | 57 | 38.00 | AAX77686.1 | 160 | Ambrosia artemisiifolia | ragweed |
| BAA06905.1 | 731 | Cucumis melo | muskmelon | P29600.1 | 269 | Bacillus lentus | bacteria | 5.60E-07 | 159 | 41.20 | ADE74975.1 | 403 | Aspergillus versicolor | fungus |
| NP_776945.1 | 1364 | Bos taurus | cattle (beef) | AAX77383.1 | 510 | Sinapis alba | brassica | 1.10E-05 | 75 | 47.50 | AKF12278.1 | 156 | Parthenium hysterophorus | aster |
| AAC49447.1 | 151 | Hevea brasiliensis | rubber tree (latex) | BAB15802.1 | 517 | Glycine max | soybean | 4.40E-04 | 93 | 47.55 | AAN73248.1 | 177 | Manihot esculenta | cassava |
| P81729.1 | 91 | Brassica rapa | brassica | 1WKX_A | 43 | Hevea brasiliensis | rubber tree (latex) | 3.00E-08 | 34 | 57.60 | CAA05978.1 | 187 | Hevea brasiliensis | rubber tree (latex) |

P: partial sequence. Bolded species entries do not exclude probable cross-reactivity. Note that sliding window software does not report matches of <35% identity.

Table 2. Protein sequences not detected by either bioinformatic approach as a cross-reactive risk.

| Query accession | Query length | Query species | Query common name | 1:1 FASTA subject accession | Subject length | Subject species | Subject common name | 1:1 FASTA E-value | 1:1 FASTA overlap | Sliding window identity (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Q6R4B4.1 | 231 | Alternaria alternata | fungus | AAX33729.1 | 216 | Periplaneta americana | cockroach | 1.50E-08 | 119 | <35 |
| BAG88472.1 | 221 | Oryza sativa | Asian rice | AAL73404.1 | 515 | Corylus avellana | hazelnut | 2.10E-08 | 138 | <35 |
| P13080.1 | 579 | Aedes aegypti | mosquito | AAD38942.1 | 496P | Dermatophagoides pteronyssinus | dust mite | 2.30E-08 | 225 | <35 |
| L7UZ85.1 | 885 | Dermatophagoides farinae | dust mite | AAF31151.1 | 171 | Olea europaea | olive | 3.20E-08 | 160 | <35 |
| AAB22817.1 | 273 | Arachis hypogaea | peanut | AK068307.1 | 764 | Oryza sativa | Asian rice | 8.10E-08 | 276 | <35 |
| P86888.1 | 63 | Prunus persica | peach | C0HKC0.1 | 20P | Punica granatum | pomegranate | 1.30E-07 | 20 | <35 |
| AAL49391.1 | 98 | Felis catus | house cat | CAK50389.1 | 115 | Anisakis simplex | human parasitic nematode | 1.50E-07 | 60 | <35 |
| P82946.1 | 55P | Dactylis glomerata | orchard grass | cad54671.2 | 508 | Phleum pratense | timothy grass | 1.50E-07 | 20 | <35 |
| AAF07903.2 | 169 | Triatoma protracta | kissing bug | ACF53837.1 | 190 | Blattella germanica | cockroach | 3.40E-07 | 171 | <35 |
| CAD56944.1 | 1770 | Apis mellifera | honey bee (vitellogenin) | M | 284 | Gallus gallus | red junglefowl | 5.00E-07 | 323 | <35 |
| AAC67308.1 | 191 | Schistosoma japonicum | human blood fluke | AAT45383.1 | 109 | Lates calcarifer | seabass | 1.10E-06 | 58 | <35 |
| P81943.3 | 86 | Apium graveolens | celery | CAH92637.1 | 423 | Lolium perenne | perennial ryegrass | 1.70E-06 | 35 | <35 |
| BAJ04354.1 | 472 | Cryptomeria japonica | Japanese cedar | P00791.3 | 385 | Sus scrofa | pig (pepsin) | 1.80E-06 | 362 | <35 |
| ADK47876.1 | 126 | Thaumetopoea pityocampa | moth | P02224.2 | 162 | Chironomus thummi thummi | midge | 6.40E-06 | 129 | <35 |
| P24337.1 | 80 | Glycine max | soybean | ACE07189.1 | 117 | Artemisia vulgaris | mugwort | 3.40E-05 | 79 | <35 |
| ACD65081.1 | 325 | Forcipomyia taiwana | midge | P14947.1 | 97 | Lolium perenne | perennial ryegrass | 4.60E-05 | 31 | <35 |
| P06886.1 | 234 | Staphylococcus aureus | bacteria | P20723.1 | 258 | Staphylococcus aureus | bacteria | 5.80E-05 | 186 | <35 |
| Q28050.1 | 101 | Bos taurus | cattle (amniotic fluid) | ADD19989.1 | 222 | Glossina morsitans morsitans | tsetse fly | 5.90E-05 | 51 | <35 |
| AGL34968.1 | 65 | Coffea arabica | Arabian coffee | CCW27997.1 | 70 | Hevea brasiliensis | rubber tree (latex) | 8.00E-05 | 43 | <35 |
| AAR17475.1 | 228 | Penicillium citrinum | Penicillium fungus | AAT95010.1 | 227 | Polistes dominula | wasp | 8.20E-05 | 131 | <35 |
| ABI26088.1 | 169 | Alternaria alternata | Alternaria fungus | P80207.1 | 129 | Brassica juncea | brassica | 9.30E-05 | 19 | <35 |
| AAK67492.1 | 108 | Curvularia lunata | fungi | AAC48795.1 | 180 | Canis lupus familiaris | dog | 1.00E-04 | 59 | <35 |
| AKJ77985.1 | 89 | Triticum aestivum | wheat | AHF71027.1 | 237 | Betula pendula | European white birch | 1.40E-04 | 23 | <35 |
| CAM54066.1 | 185 | Aspergillus fumigatus | fungus | P86745.1 | 108 | Merluccius australis australis | southern hake (fish) | 1.50E-04 | 98 | <35 |
| CAA57342.1 | 350 | Candida albicans | yeast | CAA52194.1 | 607 | Equus caballus | horse | 1.70E-04 | 240 | <35 |
| NP_776953.1 | 222 | Bos taurus | cattle (milk) | AAA30429.1 | 214 | Bos taurus | cattle (milk) | 2.10E-04 | 158 | <35 |
| CAA65313.1 | 137 | Triticum aestivum | wheat | AAT37679.1 | 342P | Rhodotorula mucilaginosa | yeast | 2.30E-04 | 82 | <35 |
| ABB89950.1 | 733 | Penicillium citrinum | fungus | P81729.1 | 91 | Brassica rapa | brassica | 2.80E-04 | 49 | <35 |
| NP_001036878.1 | 227 | Bombyx mori | silkworm | P49148.1 | 110 | Alternaria alternata | fungus | 2.80E-04 | 56 | <35 |
| AGL34967.1 | 80 | Coffea arabica | Arabian coffee | AGL34968.1 | 65 | Coffea arabica | Arabian coffee | 2.90E-04 | 79 | <35 |
| P18153.2 | 321 | Aedes aegypti | mosquito (saliva) | ABX26138.1 | 152 | Olea europaea | olive | 3.80E-04 | 25 | <35 |
| AAN11300.1 | 236 | Candida albicans | yeast | AAW29810.1 | 507 | Juglans regia | English walnut | 4.50E-04 | 147 | <35 |
| P00304.2 | 101 | Ambrosia artemisiifolia | ragweed | P84296.1 | 161 | Chironomus thummi thummi | midge | 5.10E-04 | 42 | <35 |
| CAA09886.2 | 179 | Malassezia sympodialis | Malassezia | P02221.2 | 158 | Chironomu | | | | |
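The grouping used in Tables 1 and 2 (detected by one approach, the other, or neither) can be sketched in Python as follows. This is an illustrative sketch, not the authors' software: the class and function names are hypothetical, and the comparison operators (E-value at or below 1E-9, identity strictly above 35%) are assumptions, because the exact comparison and any rounding applied to the thresholds are not recoverable from the text. The three example rows use values taken from Tables 1 and 2.

```python
from dataclasses import dataclass
from typing import Optional

E_VALUE_THRESHOLD = 1e-9    # 1:1 FASTA threshold named in the text
IDENTITY_THRESHOLD = 35.0   # percent identity over the 80-residue window


@dataclass(frozen=True)
class BestMatch:
    """Best-match result for one query (hypothetical container)."""
    query: str
    e_value: float
    # None stands for "<35": the sliding window software reports no match.
    window_identity: Optional[float]


def classify(match: BestMatch) -> str:
    """Name the approach(es) that flag the query as a cross-reactive risk."""
    by_fasta = match.e_value <= E_VALUE_THRESHOLD  # assumed "<="
    by_window = (match.window_identity is not None
                 and match.window_identity > IDENTITY_THRESHOLD)  # assumed ">"
    if by_fasta and by_window:
        return "both"
    if by_fasta:
        return "1:1 FASTA only"
    if by_window:
        return "sliding window only"
    return "neither"


# Example rows: query accession, 1:1 FASTA E-value, sliding window identity.
rows = [
    BestMatch("P16312.1", 6.50e-19, None),     # Table 1, 1:1 FASTA only
    BestMatch("AAP06493.1", 1.20e-08, 35.30),  # Table 1, sliding window only
    BestMatch("Q6R4B4.1", 1.50e-08, None),     # Table 2, neither
]

if __name__ == "__main__":
    for row in rows:
        print(row.query, "->", classify(row))
```

Counting the queries that fall into each group gives the sections of a Venn diagram such as the one described in the Fig. 2 caption.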
