ChIP-seq analysis with MACS2: Tips and tricks
Sami Heikkinen, PhD
Docent in Molecular Bioinformatics
Institute of Biomedicine, UEF

ChIP-Seq simplified
Where?
Schmidt et al, Methods, 2009
Park, Nat Rev Genetics, 2009

From binding to binding sites
Typically millions of reads per sample
Park, Nat Rev Genetics, 2009

ChIP-seq
[Figure labels: 200 bp, 36-50 bp]
Control sample: "Input" or "IgG"
    Input: sonicated chromatin without immunoprecipitation
    IgG: "unspecific" IP

MACS2
Model-based Analysis of ChIP-Seq
    Original version published by Yong Zhang and Tao Liu from the lab of X. Shirley Liu at the Dana-Farber Cancer Institute, Boston (Genome Biology 2008, 9:R137)
    Now at version 2.1.0.20140616, developed and maintained by Tao Liu
    Package of command line programs to call peaks in ChIP-seq data
    Much improved since v1.x!

MACS2 program(s)
[Diagram]
    INPUT DATA: aligned sequence reads
    Programs: callpeak, refinepeak, pileup, predictd, filterdup, randsample, bdgpeakcall, bdgbroadcall, bdgcmp, bdgdiff, diffpeak
    OUTPUT FILEs: peaks.narrowPeak, summits.bed, peaks.xls, model.r, model.pdf, treat_pileup.bdg, control_lambda.bdg, refinepeak.bed, pileup.bdg

callpeak - Options
Various options to indicate/control input, output, peak modelling and peak calling
macs2 callpeak
usage: macs2 callpeak [-h] -t TFILE [TFILE ...] [-c [CFILE [CFILE ...]]]
    [-f {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE}]
    [-g GSIZE] [--keep-dup KEEPDUPLICATES] [--buffer-size BUFFER_SIZE]
    [--outdir OUTDIR] [-n NAME] [-B] [--verbose VERBOSE] [--trackline] [--SPMR]
    [-s TSIZE] [--bw BW] [-m MFOLD MFOLD] [--fix-bimodal] [--nomodel]
    [--shift SHIFT] [--extsize EXTSIZE] [-q QVALUE | -p PVALUE] [--to-large]
    [--ratio RATIO] [--down-sample] [--seed SEED] [--nolambda]
    [--slocal SMALLLOCAL] [--llocal LARGELOCAL] [--broad]
    [--broad-cutoff BROADCUTOFF] [--call-summits]

-t/--treatment FILENAME
    This is the only REQUIRED parameter for MACS.

Using MACS - connect to server
Open the SSH client at Win > All programs > SSH Secure Shell > Secure Shell Client
"Quick connect"
    connection: intron.uef.fi
    username:
    password:

Unix 101
pwd
    show Present Working Directory
cd
    Change Directory
    e.g. cd /home/work/public to get to the folder we use today (from wherever you are)
    or, to get back to your home directory: cd $HOME
    or, back one step: cd ..
    or, two steps: cd ../../
    Usage tip: use up/down arrow keys to move in command history
ls
    LiSt files in directory
    e.g. ls -l to show file and folder names AND other info (Long format)
head / tail
    show first/last lines of a (text) file
    e.g. head -20 ref_hg19.txt
    Usage tip: use the TAB key to fill in available file/folder names

Using MACS - setup
cd /home/work/public
mkdir macsout_
    (folder name suffix: e.g. spheikki for me)
    each student MUST have their own folder, to avoid overlapping MACS outputs!
checks on seq files:
    ls -l seq
    head seq/*
check that macs2 works:
    macs2 callpeak

callpeak Options - Input
Input files arguments:
-t TFILE [TFILE ...], --treatment TFILE [TFILE ...]
    ChIP-seq treatment file. If multiple files are given as '-t A B C', they will all be read and combined. REQUIRED.
-c [CFILE [CFILE ...]], --control [CFILE [CFILE ...]]
    Control file. If multiple files are given as '-c A B C', they will all be read and combined.
-f {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE}, --format {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE}
    Format of the tag file: AUTO, BED, ELAND, ELANDMULTI, ELANDEXPORT, SAM, BAM, BOWTIE or BAMPE. The default AUTO option lets MACS decide which format the file is in. Please check the definition in the README file if you choose ELAND/ELANDMULTI/ELANDEXPORT/SAM/BAM/BOWTIE. DEFAULT: AUTO
-g GSIZE, --gsize GSIZE
    Effective genome size. It can be 1.0e+9 or 1000000000, or one of the shortcuts: hs for human (2.7e9), mm for mouse (1.87e9), ce for C. elegans (9e7) and dm for fruitfly (1.2e8). Default: hs
--keep-dup KEEPDUPLICATES
    Controls how MACS treats duplicate tags at the exact same location (the same coordinates and the same strand). The 'auto' option makes MACS calculate the maximum number of tags allowed at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff; the 'all' option keeps every tag. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location. Default: 1
--buffer-size BUFFER_SIZE
    Buffer size for incrementally increasing the internal array that stores read alignment information. In most cases, you don't have to change this parameter. However, if there is a large number of chromosomes/contigs/scaffolds in your alignment, it's recommended to specify a smaller buffer size in order to decrease memory usage (but reading the alignment files will take longer). The minimum memory requested for reading an alignment file is about # of CHROMOSOME * BUFFER_SIZE * 2 Bytes. DEFAULT: 100000

callpeak Options - Output
Output arguments:
--outdir OUTDIR
    If specified, all output files will be written to that directory. Default: the present working directory
-n NAME, --name NAME
    Experiment name, which will be used to generate output file names. DEFAULT: NA
-B, --bdg
    Whether or not to save the extended fragment pileup and local lambda tracks (two files) at every bp into bedGraph files. DEFAULT: False
--verbose VERBOSE
    Set the verbosity level of runtime messages. 0: only show critical messages, 1: show additional warning messages, 2: show process information, 3: show debug messages. DEFAULT: 2
--trackline
    Tells MACS to include a trackline in the bedGraph files. With the trackline included, the UCSC genome browser can also show the name and description of the file when displaying the bedGraph. However, my suggestion is to convert bedGraph to bigWig, then use the smaller and faster binary bigWig file in the UCSC genome browser as well as in downstream analysis. Requires -B to be set. Default: Not include trackline.
--SPMR
    If True, MACS will save signal per million reads for fragment pileup profiles. Requires -B to be set. Default: False

Using MACS - test different settings
Run 1: Using default settings
Run 2: Call summits
Run 3: Adjust model band width
Run 4: Adjust mfold limits

Run 1:
    macs2 callpeak -t seq/treat_chr3.sam -c seq/input_chr3.sam --outdir macsout_ -n defaults
    ls -l macsout_
    head -40 macsout_/*

callpeak Options - Peak calling 1
Peak calling arguments 2:
--nolambda
    If True (= set), MACS will use the fixed background lambda as the local lambda for every peak region. Normally, MACS calculates a dynamic local lambda to reflect the local bias due to potential chromatin structure.
--slocal SMALLLOCAL
    The small nearby region, in basepairs, used to calculate the dynamic lambda. This is used to capture the bias near the peak summit region. Invalid if there is no control data. If you set this to 0, MACS will skip the slocal lambda calculation. *Note* that MACS will always perform a d-size local lambda calculation. The final local bias is the maximum of the lambda values from the d, slocal, and llocal size windows. DEFAULT: 1000
--llocal LARGELOCAL
    The large nearby region, in basepairs, used to calculate the dynamic lambda. This is used to capture the surrounding bias. If you set this to 0, MACS will skip the llocal lambda calculation. *Note* that MACS will always perform a d-size local lambda calculation. The final local bias is the maximum of the lambda values from the d, slocal, and llocal size windows. DEFAULT: 10000.
--broad
    If set, MACS will try to call broad peaks by linking nearby highly enriched regions. The linking region is controlled by another cutoff through --linking-cutoff. The maximum linking region length is 4 times d from MACS. DEFAULT: False
--broad-cutoff BROADCUTOFF
    Cutoff for the broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it's a q-value cutoff. DEFAULT: 0.1
--call-summits
    If set, MACS will use a more sophisticated signal processing approach to find subpeak summits in each enriched peak region. DEFAULT: False

Using MACS - test different settings
Run 2: Call summits
From command history, find the previous macs2 command and edit the red parts (here: --call-summits added, new name after -n):
    macs2 callpeak -t seq/treat_chr3.sam -c seq/input_chr3.sam --outdir macsout_ --call-summits -n cs.defaults

callpeak Options - Peak calling 2
Peak calling arguments 1:
-q QVALUE, --qvalue QVALUE
    Minimum FDR (q-value) cutoff for peak detection. DEFAULT: 0.05. -q and -p are mutually exclusive.
-p PVALUE, --pvalue PVALUE
    P-value cutoff for peak detection. DEFAULT: not set. -q and -p are mutually exclusive. If a p-value cutoff is set, the q-value will not be calculated and is reported as -1 in the final .xls file.
--to-large
    When set, scale the small sample up to the bigger sample. By default, the bigger dataset will be scaled down towards the smaller dataset, which will lead to smaller p/q-values and more specific results. Keep in mind that scaling down will bring down background noise more.
    DEFAULT: False
--ratio RATIO
    When set, use a custom scaling ratio of ChIP/control (e.g. calculated using NCIS) for linear scaling. DEFAULT: ignore
--down-sample
    When set, a random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. Warning: This option will make your results unstable and irreproducible, since different random reads are selected each time. Consider using the randsample script instead. If used together with --SPMR, 1 million unique reads will be randomly picked. Caution: due to the implementation, the final number of selected reads may not be what you expected! DEFAULT: False
--seed SEED
    Set the random seed used while down-sampling data. Must be a non-negative integer in order to be effective. DEFAULT: not set

callpeak Options - The Model
Shifting model arguments:
-s TSIZE, --tsize TSIZE
    Tag size (= read length). This will override the auto-detected tag size. DEFAULT: Not set
--bw BW
    Band width for picking regions to compute the fragment size. This value is only used while building the shifting model. DEFAULT: 300
-m MFOLD MFOLD, --mfold MFOLD MFOLD
    Select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. Fold-enrichment in the regions must be lower than the upper limit and higher than the lower limit. Use as "-m 10 30". DEFAULT: 5 50
--fix-bimodal
    Whether to turn on the auto pair model process. If set, when MACS fails to build the paired model, it will use the nomodel settings, i.e. the --extsize parameter, to extend each tag towards the 3' direction. Not using this automatic fix is the default behavior now. DEFAULT: False
--nomodel
    Whether or not to build the shifting model. If True, MACS will not build the model. By default this means shifting size = 100; set --extsize to change it. DEFAULT: False
--shift SHIFT
    (NOT the legacy --shiftsize option!) The arbitrary shift in bp. Use discretion when setting it to anything other than the default value. When NOMODEL is set, MACS will use this value to move cutting ends (5') in the 5'->3' direction, then apply EXTSIZE to extend them to fragments. When this value is negative, ends will be moved in the 3'->5' direction. It is recommended to keep the default 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the EXTSIZE option for detecting enriched cutting loci, such as in certain DNAseI-Seq datasets. Note: you can't set values other than 0 if the format is BAMPE for paired-end data. DEFAULT: 0.
--extsize EXTSIZE
    The arbitrary extension size in bp. When nomodel is true, MACS will use this value as the fragment size to extend each read towards the 3' end, then pile them up. It's exactly twice the number of the obsolete SHIFTSIZE. In previous language, each read is moved in the 5'->3' direction to the middle of the fragment by 1/2 d, then extended to
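
The arithmetic linking --shift, --extsize and the obsolete SHIFTSIZE, as described in the option text above, can be sketched in shell. This is only an illustration under stated assumptions: the function names and the input file name cuts.bam are hypothetical, and the script prints a candidate command line rather than running macs2.

```shell
#!/bin/sh
# Sketch only: helper names and the file name below are hypothetical,
# and nothing here executes macs2; the command line is only printed.

# --shift suggested in the option text for cutting-site data:
# -1 * half of EXTSIZE (integer division, so odd sizes truncate toward zero)
shift_for_cut_sites() {
    echo $(( -1 * $1 / 2 ))
}

# EXTSIZE is exactly twice the obsolete SHIFTSIZE
extsize_from_shiftsize() {
    echo $(( 2 * $1 ))
}

# Print a callpeak command that skips model building (--nomodel)
# and applies the derived --shift together with --extsize.
# $1 = treatment file, $2 = EXTSIZE in bp
print_cut_site_command() {
    echo "macs2 callpeak -t $1 --nomodel --shift $(shift_for_cut_sites "$2") --extsize $2"
}

print_cut_site_command cuts.bam 200
```

For ordinary ChIP-Seq data the option text recommends leaving --shift at its default of 0, so a helper like shift_for_cut_sites would only matter for cutting-site data such as certain DNAseI-Seq datasets.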