Workflow 3: sample level variant calling
This workflow calls variants per sample. It includes:
- sample level recalibration
- sample level realignment
N.B. no sample-level MarkDuplicates is needed, as each lane corresponds to one library.
Workflow inputs:
- lane.chr.recal.sorted.bam - for all lanes of the sample: deduplicated, recalibrated, realigned, sorted and indexed BAMs (3)
- sample.chip.vcf - genotypes called from genotype chip
Reference:
- genome.chr.fasta - reference genome split on chromosome
- genome.chr.realign.intervals - targets for realignment per chromosome
- genome.chr.dbsnpXYZ.rod - known SNP variants, here from dbSNP
- genome.chr.indelsXYZ.vcf - known indels, here from 1000 Genomes (1KG)
Workflow outputs:
- sample.chr.bam - merged bam files per sample
- sample.chr.realign.intervals - realignment target intervals
- sample.chr.realigned.bam - realigned
- sample.chr.matesfixed.bam - fixed pairs in realignment
- sample.chr.indels.vcf - raw indels called
- sample.chr.indels.bed - raw indels annotations
- sample.chr.indels.txt - output from the indel calling
- sample.chr.indels.filtered.bed - indels filtered
- sample.chr.snps.vcf - raw snps called
- sample.chr.snps.filtered.vcf - snps filtered
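The naming scheme of the outputs can be captured in a small helper. This is only a sketch: the function name `sample_outputs` and the dictionary keys are hypothetical, while the suffixes are taken from the list above (using the `.intervals` spelling from the RealignerTargetCreator step).

```python
def sample_outputs(sample: str, chrom: str) -> dict:
    """Return the expected per-sample, per-chromosome output file names."""
    prefix = f"{sample}.{chrom}"
    # Keys are illustrative labels; values are the suffixes from the outputs list.
    suffixes = {
        "merged": "bam",
        "intervals": "realign.intervals",
        "realigned": "realigned.bam",
        "matesfixed": "matesfixed.bam",
        "indels_vcf": "indels.vcf",
        "indels_bed": "indels.bed",
        "indels_txt": "indels.txt",
        "indels_filtered": "indels.filtered.bed",
        "snps_vcf": "snps.vcf",
        "snps_filtered": "snps.filtered.vcf",
    }
    return {key: f"{prefix}.{suffix}" for key, suffix in suffixes.items()}


if __name__ == "__main__":
    # "sampleA" and "chr1" are placeholder names.
    for key, name in sample_outputs("sampleA", "chr1").items():
        print(f"{key:16s} {name}")
```

Keeping the names in one place makes it easier to check that each step consumes exactly what an earlier step produced.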
merge-lanes
Merge lanes into one sample bam
| tool: | samtools merge | 
| inputs: | lane.chr.recal.sorted.bam | 
| outputs: | sample.chr.bam | 
| docs: | http://samtools.sourceforge.net/samtools.shtml | 
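As an illustration of the merge step, the sketch below assembles a `samtools merge` command line, which takes the output BAM first and the input BAMs after it. The helper name `merge_lanes_cmd` and the example file names are hypothetical, and the check for at least two inputs is an assumption based on the usage line of older samtools releases.

```python
import shlex


def merge_lanes_cmd(sample_bam, lane_bams):
    """Build a `samtools merge <out.bam> <in1.bam> <in2.bam> ...` command."""
    # Assumption: older samtools releases expect at least two input BAMs.
    if len(lane_bams) < 2:
        raise ValueError("expected at least two lane BAMs to merge")
    return ["samtools", "merge", sample_bam, *lane_bams]


if __name__ == "__main__":
    cmd = merge_lanes_cmd(
        "sampleA.chr1.bam",
        ["lane1.chr1.recal.sorted.bam", "lane2.chr1.recal.sorted.bam"],
    )
    # Print a shell-safe version; pass `cmd` to subprocess.run to execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The merged BAM will probably need to be indexed (for example with `samtools index`) before the GATK steps can read it.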
RealignerTargetCreator
Create realignment targets based on the data (so not only on known sites)
| tool: | GenomeAnalysisTK.jar -T RealignerTargetCreator | 
| inputs: | sample.chr.bam genome.chr.fasta genome.chr.dbsnpXYZ.rod genome.chr.indelsXYZ.vcf | 
| outputs: | sample.chr.realign.intervals | 
| doc: | http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels#Creating_Intervals | 
IndelRealigner
Realign based on the realignment targets created in the previous step
| tool: | GenomeAnalysisTK.jar -T IndelRealigner | 
| inputs: | sample.chr.bam sample.chr.realign.intervals genome.chr.dbsnpXYZ.rod genome.chr.indelsXYZ.vcf | 
| outputs: | sample.chr.realigned.bam | 
| doc: | http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels#Realigning | 
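The two realignment steps can be sketched as a pair of command lines, assuming the intervals written by RealignerTargetCreator are the ones passed to IndelRealigner, as the step description says. The helper names `gatk_cmd` and `realignment_cmds` are hypothetical. Only the general GATK arguments `-T`, `-R`, `-I`, `-o` and the IndelRealigner argument `-targetIntervals` are used; the bindings for the known dbSNP and indel files are left out on purpose, because their syntax differs between GATK releases (see the linked documentation).

```python
def gatk_cmd(walker, reference, bam, output, extra=()):
    """Assemble a GATK command line for a single walker."""
    return [
        "java", "-jar", "GenomeAnalysisTK.jar",
        "-T", walker,     # walker (analysis type) to run
        "-R", reference,  # reference FASTA
        "-I", bam,        # input BAM
        "-o", output,     # output file
        *extra,
    ]


def realignment_cmds(sample, chrom, reference):
    """Return the target-creation and realignment commands, in order."""
    prefix = f"{sample}.{chrom}"
    intervals = f"{prefix}.realign.intervals"
    create = gatk_cmd("RealignerTargetCreator", reference,
                      f"{prefix}.bam", intervals)
    realign = gatk_cmd("IndelRealigner", reference,
                       f"{prefix}.bam", f"{prefix}.realigned.bam",
                       extra=["-targetIntervals", intervals])
    return [create, realign]


if __name__ == "__main__":
    # Placeholder sample, chromosome and reference names.
    for cmd in realignment_cmds("sampleA", "chr1", "genome.chr1.fasta"):
        print(" ".join(cmd))
```

Both commands read the same merged BAM; only the second one writes a new BAM.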
FixMateInformation
See the description in workflow 2; here it is applied per sample
| inputs: | sample.chr.realigned.bam | 
| outputs: | sample.chr.matesfixed.bam | 
IndelGenotyperV2
Call indels
| tool: | GenomeAnalysisTK.jar -T IndelGenotyperV2 | 
| inputs: | sample.chr.matesfixed.bam genome.chr.fasta | 
| outputs: | sample.chr.indels.vcf sample.chr.indels.bed sample.chr.indels.txt | 
| doc: | http://www.broadinstitute.org/gsa/wiki/index.php/Indel_Genotyper_V2.0 | 
filterSingleSampleCalls
Filter indels
| tool: | filterSingleSampleCalls.pl | 
| inputs: | sample.chr.indels.bed | 
| outputs: | sample.chr.indels.filtered.bed | 
| doc: | http://www.broadinstitute.org/gsa/wiki/index.php/Firehose_Parameters#SampleIndelGenotyper | 
UnifiedGenotyper
Call SNPs
| tool: | GenomeAnalysisTK.jar -T UnifiedGenotyper | 
| inputs: | sample.chr.matesfixed.bam genome.chr.fasta genome.chr.dbsnpXYZ.rod | 
| outputs: | sample.chr.snps.vcf | 
| doc: | http://www.broadinstitute.org/gsa/wiki/index.php/Firehose_Parameters#SetUnifiedGenotypertoEval | 
makeIndelMask
Make indel mask
| tool: | makeIndelMask.py | 
| inputs: | sample.chr.indels.bed | 
| outputs: | sample.chr.indels.mask.bed | 
| doc: | http://www.broadinstitute.org/gsa/wiki/index.php/Indel_Genotyper_V2.0#Creating_a_indel_mask_file | 
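To illustrate what an indel mask is, the sketch below widens every interval of a BED file by a fixed window, so that SNP calls close to an indel can later be flagged. This is not the code of makeIndelMask.py: the function names and the `window` parameter are assumptions, and overlapping intervals are not merged.

```python
def make_indel_mask(bed_lines, window):
    """Expand each BED interval by `window` bases on both sides."""
    mask = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines, comments and BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Clamp the start at 0 so the interval stays on the chromosome.
        mask.append((chrom, max(0, start - window), end + window))
    return mask


def to_bed(intervals):
    """Format (chrom, start, end) tuples as tab-separated BED lines."""
    return "\n".join(f"{c}\t{s}\t{e}" for c, s, e in intervals)


if __name__ == "__main__":
    # Made-up indel calls; only the first three columns are used.
    raw = ["chr1\t100\t101\t+A:3/7", "chr1\t5\t6\t-T:4/9"]
    print(to_bed(make_indel_mask(raw, 10)))
```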
VariantFiltration
Filter variants to get the best calls possible
| tool: | GenomeAnalysisTK.jar -T VariantFiltration | 
| inputs: | sample.chr.snps.vcf genome.chr.fasta genome.chr.dbsnpXYZ.rod | 
| outputs: | sample.chr.snps.filtered.vcf | 
| doc: | http://www.broadinstitute.org/gsa/wiki/index.php/Best_Practice_Variant_Detection_with_the_GATK_v2#Integrating_analyses:_getting_the_best_call_set_possible | 
MergeVcfs
ChipVcf
Produce vcf for the chips
VariantEval
Create summary statistics on the called variants for evaluation. Run per sample, on sample.snps.filtered.vcf against the chip genotypes.
| tool: | GenomeAnalysisTK.jar -T VariantEval | 
| inputs: | sample.snps.vcf sample.chip.vcf genome.chr.fasta genome.chr.dbsnpXYZ.rod | 
| outputs: | sample.snps.eval | 
| doc: | http://www.broadinstitute.org/gsa/wiki/index.php/VariantEval | 
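One of the things a comparison against the chip can report is genotype concordance. The sketch below shows the idea on single-sample VCF lines; it is not how VariantEval is implemented and covers only a fraction of what that tool reports. The function names are hypothetical, and it assumes both files use the same REF and ALT alleles at shared sites, so that the genotype indices are comparable.

```python
def read_genotypes(vcf_lines):
    """Map (chrom, pos) to an unordered genotype from single-sample VCF lines."""
    genotypes = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        fmt, sample = fields[8], fields[9]
        # Locate the GT entry via the FORMAT column.
        gt = sample.split(":")[fmt.split(":").index("GT")]
        # Ignore phasing and allele order: 1/0 and 0|1 count as the same.
        genotypes[(chrom, pos)] = tuple(sorted(gt.replace("|", "/").split("/")))
    return genotypes


def concordance(call_lines, chip_lines):
    """Return (matching genotypes, shared sites) between calls and chip."""
    calls = read_genotypes(call_lines)
    chip = read_genotypes(chip_lines)
    shared = calls.keys() & chip.keys()
    matching = sum(calls[site] == chip[site] for site in shared)
    return matching, len(shared)
```

Sites present in only one of the two files are ignored here; a fuller evaluation would also count chip sites that were missed by the caller.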
Discussion:
- Do we call SNPs based on the filtered indels or the raw indels?
- Should we realign again after merging the lanes?
- BAQ?
- MINDEL/PINDEL?
Dindel
Dindel has finally been released.
The article can be found here: http://www.ncbi.nlm.nih.gov/pubmed/20980555
The tool can be downloaded from: http://www.sanger.ac.uk/resources/software/dindel/
Questions:
- Are we going to implement this tool as well, in addition to Pindel?
- If so, where will this tool be implemented?

