
A Personally Compiled Guide to QIIME Script Command Usage
By peterrjp

add_alpha_to_mapping_file.py - Add alpha diversity data to a metadata mapping file

Description:
Add alpha diversity data to a mapping file for use with other QIIME scripts, e.g. make_3d_plots.py. The resulting mapping file will contain three new columns per metric in the alpha diversity data: the first column holds the raw value, the second a normalized raw value, and the third a label classifying the bin where the value fits based on the normalized value.

Usage: add_alpha_to_mapping_file.py [options]

Input Arguments:

[REQUIRED]

-i, --alpha_fps
    Alpha diversity data with one or multiple metrics, i.e. the output of alpha_diversity.py. This can also be a comma-separated list of collated alpha diversity file paths, i.e. the output of collate_alpha.py; when using collated alpha diversity data the --depth option is required
-m, --mapping_fp
    Mapping file to modify by adding the alpha diversity data

[OPTIONAL]

-o, --output_mapping_fp
    Filepath for the modified mapping file [default: mapping_file_with_alpha.txt]
-b, --number_of_bins
    Number of bins [default: 4]
-x, --missing_value_name
    Bin prefix name for the sample identifiers that exist in the mapping file (mapping_fp) but not in the alpha diversity file (alpha_fp) [default: N/A]
--binning_method
    Select the method used to create the bins; the options are equal and quantile. Both methods work over the normalized alpha diversity values. equal assigns the bins on equally spaced limits that depend on the value of number_of_bins, i.e. if you select 4 the limits will be 0.25, 0.50, 0.75. quantile selects the limits based on number_of_bins, i.e. the limits will be the quartiles if 4 is selected [default: equal]
--depth
    Select the rarefaction depth to use when alpha_fps refers to collated alpha diversity file(s), i.e. the output of collate_alpha.py.
    All the iterations contained at this depth will be averaged to form a single mean value [default: highest depth available]
--collated_input
    Use to specify that the -i option is composed of collated alpha diversity data.

Output:
The result of running this script is a metadata mapping file that will include 3 new columns per alpha diversity metric included in the alpha diversity file. For example, with an alpha diversity file containing only PD_whole_tree, the new columns will be PD_whole_tree_alpha, PD_whole_tree_normalized and PD_whole_tree_bin.

Adding alpha diversity data:
Add the alpha diversity values to a mapping file and classify the normalized values into 4 bins, where the limits will be 0 < x <= 0.25 for the first bin, 0.25 < x <= 0.5 for the second bin, 0.5 < x <= 0.75 for the third bin and 0.75 < x

[...]

    >FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_
    AACAGATTAGACCAGATTAAGCCGAGATTTACCCGA

and in the output combined fasta file would be written like this

    >Sample.1_0 FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_
    AACAGATTAGACCAGATTAAGCCGAGATTTACCCGA

No changes are made to the sequences.

Usage: add_qiime_labels.py [options]

Input Arguments:

[REQUIRED]

-m, --mapping_fp
    SampleID to fasta file name mapping file filepath
-i, --fasta_dir
    Directory of fasta files to combine and label.
-c, --filename_column
    Specify column used in metadata mapping file for fasta file names.

[OPTIONAL]

-o, --output_dir
    Output directory for the log file, corrected mapping file, and html file [default: .]
-n, --count_start
    Specify the number to start enumerating sequence labels with.
    [default: 0]

Output:
A combined_seqs.fasta file will be created in the output directory, with the sequences assigned to the SampleID given in the metadata mapping file.

Example:
Specify fasta_dir as the input directory of fasta files, use the metadata mapping file example_mapping.txt, with the metadata fasta file name column specified as InputFileName, start enumerating with 1000000, and output the data to the directory combined_fasta:

    add_qiime_labels.py -i fasta_dir -m example_mapping.txt -c InputFileName -n 1000000 -o combined_fasta

adjust_seq_orientation.py - Get the reverse complement of all sequences

Description:
Write the reverse complement of all seqs in seqs.fasta (-i) to seqs_rc.fasta (default; change output_fp with -o). Each sequence description line will have RC appended to the end of it (default; leave sequence description lines untouched by passing -r).

Usage: adjust_seq_orientation.py [options]

Input Arguments:

[REQUIRED]

-i, --input_fasta_fp
    Path to the input fasta file

[OPTIONAL]

-o, --output_fp
    The output filepath
-r, --retain_seq_id
    Leave seq description lines untouched [default: append “ RC” to seq description lines]

Output:

Example:
Reverse complement all sequences in seqs.fna and write result to seqs_rc.fna:

    adjust_seq_orientation.py -i seqs.fna

align_seqs.py - Align sequences using a variety of alignment methods

Description:
This script aligns the sequences in a FASTA file to each other or to a template sequence alignment, depending on the method chosen. Currently, there are three methods which can be used by the user:

1. PyNAST (Caporaso et al., 2009) - The default alignment method is PyNAST, a python implementation of the NAST alignment algorithm. The NAST algorithm aligns each provided sequence (the “candidate” sequence) to the best-matching sequence in a pre-aligned database of sequences (the “template” sequence).
   Candidate sequences are not permitted to introduce new gap characters into the template database, so the algorithm introduces local mis-alignments to preserve the existing template sequence.
2. MUSCLE (Edgar, 2004) - MUSCLE is an alignment method which stands for MUltiple Sequence Comparison by Log-Expectation.
3. INFERNAL (Nawrocki, Kolbe, & Eddy, 2009) - Infernal (“INFERence of RNA ALignment”) is an alignment method that uses RNA structure and sequence similarities.

Usage: align_seqs.py [options]

Input Arguments:

[REQUIRED]

-i, --input_fasta_fp
    Path to the input fasta file

[OPTIONAL]

-m, --alignment_method
    Method for aligning sequences. Valid choices are: pynast, infernal, clustalw, muscle, mafft [default: pynast]
-a, --pairwise_alignment_method
    Method for performing pairwise alignment in PyNAST. Valid choices are: muscle, pair_hmm, clustal, blast, uclust, mafft [default: uclust]
-t, --template_fp
    Filepath for the template alignment to align against [default: /Users/caporaso/data/greengenes_core_sets/core_set_aligned_imputed.fasta_11_8_07.no_dots]
-e, --min_length
    Minimum sequence length to include in alignment [default: 75% of the median input sequence length]
-p, --min_percent_id
    Minimum percent sequence identity to closest blast hit to include sequence in alignment [default: 0.75]
-d, --blast_db
    Database to blast against when -m pynast [default: created on-the-fly from template_alignment]
--muscle_max_memory
    Maximum memory allocation for the muscle alignment method (MB) [default: 80% of available memory, as detected by MUSCLE]
-o, --output_dir
    Path to store result file [default: _aligned]

Output:
All aligners will output a fasta file containing the alignment and a log file in the directory specified by --output_dir (default _aligned). PyNAST additionally outputs a failures file, containing the sequences which failed to align. So the result of align_seqs.py will be up to three files, where the prefix of each file depends on the user supplied FASTA file:

1. “..._aligned.fasta” - This is a FASTA file containing all aligned sequences.
2. “..._failures.fasta” - This is a FASTA file containing all sequences which did not meet all the criteria specified. (PyNAST only)
3. “..._log.txt” - This is a log file containing information pertaining to the results obtained from a particular method (e.g. BLAST percent identity, etc.).

Alignment with PyNAST:
The default alignment method is PyNAST, which aligns each candidate sequence to the best-matching template sequence as described above. The quality thresholds are the minimum requirements for matching between a candidate sequence and a template sequence. The set of matching template sequences will be searched for a match that meets these requirements, with preference given to the sequence length. By default, the minimum sequence length is 150 and the minimum percent id is 75%. The minimum sequence length is much too long for typical pyrosequencing reads, but was chosen for compatibility with the original NAST tool.

The following command can be used for aligning sequences using the PyNAST method, where we supply the program with a FASTA file of unaligned sequences (i.e. the resulting FASTA file from pick_rep_set.py) and a FASTA file of pre-aligned sequences (this is the template file, which is typically the Greengenes core set - available from /), and the results will be written to the directory “pynast_aligned_defaults/”:

    align_seqs.py -i $PWD/unaligned.fna -t $PWD/core_set_aligned.fasta.imputed -o $PWD/pynast_aligned_defaults/

Alternatively, one could change the minimum sequence length (“-e”) requirement and minimum sequence identity (“-p”), using the following command:

    align_seqs.py -i $PWD/unaligned.fna -t core_set_aligned.fasta.imputed -o $PWD/pynast_aligned/ -e 500 -p 95.0

Alignment with MUSCLE:
One could also use the MUSCLE algorithm. The following command can be used to align sequences (i.e. the resulting FASTA file from pick_rep_set.py), where the output is written to the directory “muscle_alignment/”:

    align_seqs.py -i $PWD/unaligned.fna -m muscle -o $PWD/muscle_alignment/

Alignment with Infernal:
An alternative alignment method is to use Infernal. Infernal is similar to the PyNAST method in that you supply a template alignment, although it differs in several ways. Infernal takes a multiple sequence alignment with a corresponding secondary structure annotation. This input file must be in Stockholm alignment format. There is a fairly good description of the Stockholm format rules at: /wiki/Stockholm_format. Infernal will use the sequence and secondary structural information to align the candidate sequences to the full reference alignment. Similar to PyNAST, Infernal will not allow for gaps to be inserted into the reference alignment.
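To make the template requirement more concrete, the following is a minimal sketch of what a Stockholm-format alignment with a consensus secondary-structure line looks like. The sequence names, sequences, and structure string are invented for illustration and are not taken from the QIIME seed model; the helper name build_stockholm is hypothetical. Only the overall layout (the "# STOCKHOLM 1.0" header, the "#=GC SS_cons" annotation line, and the closing "//") follows the Stockholm format.

```python
def build_stockholm(seqs, ss_cons):
    """Return a minimal Stockholm alignment as a string.

    seqs    -- dict mapping sequence name to its aligned (gapped) sequence
    ss_cons -- consensus secondary structure, one character per column
    """
    # Every row, including the structure line, must span the same columns.
    widths = {len(s) for s in seqs.values()} | {len(ss_cons)}
    if len(widths) != 1:
        raise ValueError("all rows must have the same number of columns")

    tag = "#=GC SS_cons"
    pad = max([len(tag)] + [len(name) for name in seqs]) + 2

    lines = ["# STOCKHOLM 1.0", ""]
    for name, aligned in seqs.items():
        lines.append(name.ljust(pad) + aligned)
    lines.append(tag.ljust(pad) + ss_cons)
    lines.append("//")
    return "\n".join(lines) + "\n"


# Hypothetical toy data: a 3 bp stem closing a 6-column loop.
toy_seqs = {"seq1": "GGCAUUCGUGCC", "seq2": "GGC-UUCG-GCC"}
toy_structure = "<<<......>>>"

stockholm_text = build_stockholm(toy_seqs, toy_structure)
print(stockholm_text)
```

A real template, such as the seed model mentioned in this section, is far larger and would be passed to align_seqs.py with -t rather than built by hand.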
Using Infernal is slower than other methods, and therefore is best used with sequences that do not align well using PyNAST.

The following command can be used for aligning sequences using the Infernal method, where we supply the program with a FASTA file of unaligned sequences and a STOCKHOLM file of pre-aligned sequences and secondary structure (this is the template file - an example file can be obtained from: /QIIME/seed.16s.reference_model.sto.zip), and the results will be written to the directory “infernal_aligned/”:

    align_seqs.py -m infernal -i $PWD/unaligned.fna -t $PWD/seed.16s.reference_model.sto -o $PWD/infernal_aligned/

alpha_diversity.py - Calculate alpha diversity on each sample in an otu table, using a variety of alpha diversity metrics

Description:
This script calculates alpha diversity, or within-sample diversity, using an otu table. The QIIME pipeline allows users to conveniently calculate more than two dozen different diversity metrics. The full list of available metrics can be shown by passing the option -s to alpha_diversity.py, and documentation of those metrics can be found at /scripts/alpha_diversity_metrics.html. Every metric has different strengths and limitations - technical discussion of each metric is readily available online and in ecology textbooks, but is beyond the scope of this document.

Usage: alpha_diversity.py [options]

Input Arguments:

[OPTIONAL]

-i, --input_path
    Input OTU table filepath or input directory containing OTU tables for batch processing. [default: None]
-o, --output_path
    Output filepath for the alpha diversity results, or output directory to store the results when batch processing. [default: None]
-m, --metrics
    Alpha-diversity metric(s) to use. A comma-separated list should be provided when multiple metrics are specified. [default: PD_whole_tree,chao1,observed_species]
-s, --show_metrics
    Show the available alpha-diversity metrics and exit.
-t, --tree_path
    Input newick tree filepath.
    [default: None; REQUIRED for phylogenetic metrics]

Output:
The resulting file(s) is a tab-delimited text file, where the columns correspond to alpha diversity metrics and the rows correspond to samples and their calculated diversity measurements. When a folder is given as input (-i), the script processes every otu table file in the given folder, and creates a corresponding file in the output directory.

Example Output:

              simpson    PD_whole_tree    observed_species
    PC.354    0.925      2.83739          16.0
    PC.355    0.915      3.06609          14.0
    PC.356    0.945      3.10489          19.0
    PC.481    0.945      3.65695          19.0
    PC.593    0.91       3.3776           15.0
    PC.607    0.92       4.13397          16.0
    PC.634    0.9        3.71369          14.0
    PC.635    0.94       4.20239          18.0
    PC.636    0.925      3.78882          16.0

Single File Alpha Diversity Example (non-phylogenetic):
To perform alpha diversity (e.g. chao1) on a single OTU table, where the results are output to “adiv_chao1.txt”, you can use the following command:

    alpha_diversity.py -i otu_table.biom -m chao1 -o adiv_chao1.txt

Single File Alpha Diversity Example (phylogenetic):
In the case that you would like to perform alpha diversity using a phylogenetic metric (e.g. PD_whole_tree), you can use the following command:

    alpha_diversity.py -i otu_table.biom -m PD_whole_tree -o adiv_pd.txt -t rep_set.tre

Single File Alpha Diversity Example with multiple metrics:
You can use the following idiom to run multiple metrics at once (comma-separated):

    alpha_diversity.py -i otu_table.biom -m chao1,PD_whole_tree -o adiv_chao1_pd.txt -t rep_set.tre

Multiple File (batch) Alpha Diversity:
To perform alpha diversity on multiple OTU tables (e.g. rarefied otu tables resulting from multiple_rarefactions.py), specify an input directory instead of a single otu table, and an output directory (e.g. “adiv_chao1_pd/”) as shown by the following command:

    alpha_diversity.py -i otu_tables/ -m chao1,PD_whole_tree -o adiv_chao1_pd/ -t rep_set.tre

alpha_diversity_metrics - List of available metrics

Non-phylogeny based metrics:
    berger_parker_d
    brillouin_d
    chao1
    chao1_confidence
    dominance
    doubles (# otus with exactly two individuals in sample)
    equitability
    fisher_alpha
    gini_index
    goods_coverage
    heip_e (note, using heip_e at low (<5) individuals may cause errors)
    kempton_taylor_q
    margalef
    mcintosh_d
    mcintosh_e
    menhinick
    michaelis_menten_fit
    observed_species
    osd (observed # otus, singleton OTUs, doubleton OTUs)
    robbins
    shannon (base 2 is used in the logarithms)
    simpson (1 - Dominance)
    simpson_reciprocal (1 / Dominance)
    simpson_e
    singles (# OTUs with exactly one individual present in sample)
    strong

Phylogeny based metrics:
    PD_whole_tree

alpha_rarefaction.py - A workflow script for performing alpha rarefaction

Description:
The steps performed by this script are: generate rarefied OTU tables; compute alpha diversity metrics for each rarefied OTU table; collate alpha diversity results; and generate alpha rarefaction plots.

Usage: alpha_rarefaction.py [options]

Input Arguments:

[REQUIRED]

-i, --otu_table_fp
    The input otu table [REQUIRED]
-m, --mapping_fp
    Path to the mapping file [REQUIRED]
-o, --output_dir
    The output directory [REQUIRED]

[OPTIONAL]

-p, --parameter_fp
    Path to the parameter file, which specifies changes to the default behavior. See /documentation/file_formats.html#qii
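
The parenthetical definitions in the alpha_diversity_metrics list above (singles, doubles, simpson as 1 - Dominance, simpson_reciprocal as 1 / Dominance, shannon with base-2 logarithms) can be illustrated with a short sketch. This is not QIIME's implementation. The function name alpha_metrics is hypothetical, and Dominance is assumed here to be the sum of squared OTU proportions, which the list itself does not state.

```python
import math


def alpha_metrics(counts):
    """Compute a few non-phylogenetic metrics from one sample's OTU counts."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    if total == 0:
        raise ValueError("sample has no observations")

    props = [c / total for c in present]
    # Assumption: Dominance = sum of squared proportions.
    dominance = sum(p * p for p in props)

    return {
        "observed_species": len(present),
        "singles": sum(1 for c in present if c == 1),
        "doubles": sum(1 for c in present if c == 2),
        "dominance": dominance,
        "simpson": 1 - dominance,             # 1 - Dominance
        "simpson_reciprocal": 1 / dominance,  # 1 / Dominance
        "shannon": -sum(p * math.log2(p) for p in props),  # base 2 logarithms
    }


# Toy sample: four OTU columns, one of them absent from the sample.
result = alpha_metrics([1, 2, 1, 0])
for name, value in result.items():
    print(name, value)
```

For real data, alpha_diversity.py reads the counts from the OTU table and writes one row per sample, as in the example output shown earlier.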

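The equal and quantile binning methods described under add_alpha_to_mapping_file.py can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the text does not define the normalization, so min-max scaling to the range 0 to 1 is assumed; the quantile limits are computed with Python's statistics.quantiles; and bins are reported as integer indices rather than the labels QIIME writes. All function names are hypothetical. The input values are the PD_whole_tree column from the example output above.

```python
from bisect import bisect_left
from statistics import quantiles


def normalize(values):
    """Min-max scale values into [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bin_limits(normalized, number_of_bins=4, method="equal"):
    """Return the interior bin limits for the chosen binning method."""
    if method == "equal":
        # Equally spaced limits, e.g. 0.25, 0.50, 0.75 for 4 bins.
        return [i / number_of_bins for i in range(1, number_of_bins)]
    if method == "quantile":
        # Limits at the quantiles of the data, e.g. quartiles for 4 bins.
        return quantiles(normalized, n=number_of_bins, method="inclusive")
    raise ValueError("method must be 'equal' or 'quantile'")


def assign_bins(normalized, limits):
    """Map each value to a bin index using lower < x <= upper intervals."""
    # bisect_left places a value equal to a limit into the lower bin.
    return [bisect_left(limits, x) for x in normalized]


pd_values = [2.83739, 3.06609, 3.10489, 3.65695, 3.3776,
             4.13397, 3.71369, 4.20239, 3.78882]
scaled = normalize(pd_values)
for method in ("equal", "quantile"):
    limits = bin_limits(scaled, number_of_bins=4, method=method)
    print(method, limits, assign_bins(scaled, limits))
```

With equal, bin sizes depend on how the values are spread, so some bins may be empty; with quantile, the samples are split into groups of roughly equal size.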