Pipeline todos
This page tracks planned modifications to the pipeline for the full run.
Timeline
- October 1st will be the target date to start implementing these features into the final pipeline. Issues should be filed before this date.
 
The full run will start when:
- Issues on this page are implemented
- Metadatabase issues are resolved
- All FASTQ (FQ) files are merged and available

- Aim to start the run once the final plans for the second paper are clear
 
Full run implementation list
- Run two alignments so that each downstream analysis (QTL, ASE) can be used to its full potential:
  - Unmasked for QTL (and expression quantification)
  - Masked for ASE (check with Dasha for the masked index)
    - Mask with GoNL, 1KG and UMCG ASE study SNPs
  - Keep separate mapping statistics in the analysis database
- Change the STAR settings to the ENCODE settings (below)
- Variant calling on unmasked BAM/mpileup (ASE)
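
The masking step for the ASE alignment can be illustrated with a small sketch. It assumes that "masking" means replacing every known SNP position in the reference with `N` before building the index, and that the SNP sets from the different sources are simply merged; the function `mask_snps` and the toy positions are hypothetical and not part of the pipeline.

```python
def mask_snps(sequence, snp_positions, mask_char="N"):
    """Return `sequence` with each 1-based position in `snp_positions` replaced by `mask_char`."""
    masked = list(sequence)
    for pos in snp_positions:
        if not 1 <= pos <= len(masked):
            raise ValueError(
                f"SNP position {pos} is outside the sequence (length {len(masked)})"
            )
        masked[pos - 1] = mask_char  # convert 1-based position to 0-based index
    return "".join(masked)


# Toy SNP sets standing in for the GoNL, 1KG and UMCG ASE study call sets
# (made-up positions on a made-up 10 bp reference).
gonl = {3, 7}
kg = {7, 10}
ase_study = {1}

# Assumption: a position is masked if it appears in any of the sources.
all_snps = gonl | kg | ase_study

reference = "ACGTACGTAC"
masked = mask_snps(reference, all_snps)
print(masked)  # NCNTACNTAN
```

A real implementation would stream a FASTA file and a VCF per chromosome instead of holding toy strings in memory, but the position arithmetic stays the same.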
 
Discussion points
- STAR 2-pass?
 
Suggested STAR ENCODE settings
ENCODE settings (sent to me by Alexander Dobin, who did the alignment for some of the ENCODE samples):

    /home/dzhernakova/tools/STAR_2.3.0e.Linux_x86_64/STAR \
      --runThreadN 8 \
      --genomeDir /home/dzhernakova/resources/STARindex_GoNL/ \
      --genomeLoad NoSharedMemory \
      --readFilesIn /home/dzhernakova/data/rawData/LL-557-130804_R1.fq.gz ~/data/rawData/LL-557-130804_R2.fq.gz \
      --readFilesCommand zcat \
      --outFileNamePrefix ~/data/mappedData/LL-557-130804.encode/LL-557-130804.encode. \
      --outSAMstrandField intronMotif \
      --outSAMunmapped Within \
      --outFilterType BySJout \
      --outFilterMultimapNmax 20 \
      --outFilterMismatchNmax 999 \
      --outFilterMismatchNoverLmax 0.04 \
      --alignIntronMin 20 \
      --alignIntronMax 1000000 \
      --alignMatesGapMax 1000000 \
      --alignSJoverhangMin 8 \
      --alignSJDBoverhangMin 1

Notes on the filtering and alignment options:
- --outFilterType BySJout: reduces the number of "spurious" junctions
- --outFilterMultimapNmax 20: max multiple alignments per read; if exceeded, the read is considered unmapped
- --outFilterMismatchNmax 999: max number of mismatches per pair (absolute)
- --outFilterMismatchNoverLmax 0.04: max mismatches per pair relative to length (0.04*(2*50)=4)
- --alignIntronMin 20: min intron size (default: 21)
- --alignIntronMax 1000000: max intron size (default: specified by the size of bins)
- --alignMatesGapMax 1000000: max genomic distance between mates (default: specified by the size of bins)
- --alignSJoverhangMin 8: min overhang for unannotated junctions (default: 5)
- --alignSJDBoverhangMin 1: min overhang for annotated junctions (default: 3)
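
For running the same settings over many samples, the command can be assembled programmatically. The sketch below is one possible way to do that, not the pipeline's actual code: `build_star_command`, `max_mismatches` and the sample paths are hypothetical, while the STAR options and values are copied from the settings above. The helper also reproduces the relative mismatch arithmetic (0.04 over a 2x50 bp pair gives 4).

```python
import math
import shlex

# ENCODE-style options, copied from the settings listed on this page.
ENCODE_OPTIONS = [
    ("--outSAMstrandField", "intronMotif"),
    ("--outSAMunmapped", "Within"),
    ("--outFilterType", "BySJout"),
    ("--outFilterMultimapNmax", "20"),
    ("--outFilterMismatchNmax", "999"),
    ("--outFilterMismatchNoverLmax", "0.04"),
    ("--alignIntronMin", "20"),
    ("--alignIntronMax", "1000000"),
    ("--alignMatesGapMax", "1000000"),
    ("--alignSJoverhangMin", "8"),
    ("--alignSJDBoverhangMin", "1"),
]


def build_star_command(star_bin, genome_dir, read1, read2, out_prefix, threads=8):
    """Return the STAR invocation as an argument list (no shell quoting needed)."""
    cmd = [
        star_bin,
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeLoad", "NoSharedMemory",
        "--readFilesIn", read1, read2,
        "--readFilesCommand", "zcat",
        "--outFileNamePrefix", out_prefix,
    ]
    for flag, value in ENCODE_OPTIONS:
        cmd += [flag, value]
    return cmd


def max_mismatches(ratio, read_length, paired=True):
    """Mismatches allowed by a relative limit, e.g. 0.04 * (2 * 50) = 4."""
    total_length = read_length * (2 if paired else 1)
    # The small epsilon guards against floating-point results such as 3.9999999.
    return math.floor(ratio * total_length + 1e-9)


# Placeholder paths for illustration only.
cmd = build_star_command(
    "STAR", "genome_index/", "sample_R1.fq.gz", "sample_R2.fq.gz", "mapped/sample."
)
print(" ".join(shlex.quote(arg) for arg in cmd))
print(max_mismatches(0.04, 50))  # 4
```

To actually run the alignment, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where STAR and the genome index are available. Keeping the options in one list makes it easy to reuse them for both the masked and the unmasked index by changing only `genome_dir` and `out_prefix`.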
