Changes between Initial Version and Version 1 of BIOS_Pipeline/pipeline_todo

Timestamp: Sep 19, 2016 4:55:41 PM
Author: jamverlouw
Comment: --

+= Pipeline todos =
+This page is reserved to track planned modifications to the pipeline for the full run.
+== Timeline ==
+- '''October 1st is the target date to start implementing these features in the final pipeline. Issues should be filed before this date.'''
+The full run will start when:
+- Issues on this page are implemented
+- Metadatabase issues are resolved
+- All FQ files are merged and available
+- '''Final plans for the second paper are clear (aim to start running only after this)'''
+== Full run implementation list ==
+- Run two alignments so that the downstream analyses (QTL, ASE) can each be used to their full potential
+ - Unmasked for QTL (and expression quantification)
+ - Masked for ASE (check with Dasha for the masked index)
+  - Mask with GoNL, 1KG and UMCG ASE study SNPs.
+  - Store mapping statistics separately in the analysis database
+- Change the STAR settings to the ENCODE settings (below)
+- Variant calling on the unmasked BAM/mpileup (ASE)
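The masking step in the list above could be sketched roughly as follows. This is a minimal illustration only: the function name `mask_fasta`, the tab-separated SNP list format (chromosome, 1-based position) and the file names are assumptions, not part of the pipeline. It replaces the reference base at each listed position with `N`, so that reads carrying either allele align equally well. For a whole genome a dedicated tool such as `bedtools maskfasta` would likely be a better fit.

```shell
# Sketch: hard-mask SNP positions in a FASTA file with 'N'.
# Assumptions (hypothetical, not taken from the pipeline):
#   $1 = SNP list, tab-separated: chromosome <TAB> 1-based position
#   $2 = reference FASTA (sequence may be wrapped over several lines)
mask_fasta() {
  local snps="$1" fasta="$2"
  awk -F'\t' '
    # SNP list (first file): remember every (chromosome, position) to mask.
    FILENAME == ARGV[1] { mask[$1, $2] = 1; next }
    # FASTA header: sequence name is the text up to the first space.
    /^>/ {
      chrom = substr($1, 2)
      sub(/ .*/, "", chrom)
      offset = 0          # bases of this sequence seen so far
      print
      next
    }
    # Sequence line: replace masked positions, then advance the offset.
    {
      line = $0
      n = length(line)
      for (i = 1; i <= n; i++)
        if ((chrom, offset + i) in mask)
          line = substr(line, 1, i - 1) "N" substr(line, i + 1)
      offset += n
      print line
    }
  ' "$snps" "$fasta"
}
```

Usage would look like `mask_fasta snps.tsv reference.fa > reference.masked.fa`, after which the masked index would be built from the masked FASTA.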
+== Discussion points ==
+- STAR 2-pass?
+=== Suggested STAR ENCODE settings ===
+{{{
+# ENCODE settings (settings sent to me by Alexander Dobin, who did the alignment for some of the ENCODE samples).
+# The options are collected in a bash array so that each one can carry a comment.
+star_args=(
+  --runThreadN 8
+  --genomeDir /home/dzhernakova/resources/STARindex_GoNL/
+  --genomeLoad NoSharedMemory
+  --readFilesIn /home/dzhernakova/data/rawData/LL-557-130804_R1.fq.gz ~/data/rawData/LL-557-130804_R2.fq.gz
+  --readFilesCommand zcat
+  --outFileNamePrefix ~/data/mappedData/LL-557-130804.encode/LL-557-130804.encode.
+  --outSAMstrandField intronMotif
+  --outSAMunmapped Within
+  --outFilterType BySJout             # reduces the number of "spurious" junctions
+  --outFilterMultimapNmax 20          # max multiple alignments per read: if exceeded, the read is considered unmapped
+  --outFilterMismatchNmax 999         # max number of mismatches per pair (absolute); 999 effectively disables this filter
+  --outFilterMismatchNoverLmax 0.04   # max mismatches per pair relative to length (0.04*(2*50)=4)
+  --alignIntronMin 20                 # min intron size (default: 21)
+  --alignIntronMax 1000000            # max intron size (default: specified by the size of bins)
+  --alignMatesGapMax 1000000          # max genomic distance between mates (default: specified by the size of bins)
+  --alignSJoverhangMin 8              # min overhang for unannotated junctions (default: 5)
+  --alignSJDBoverhangMin 1            # min overhang for annotated junctions (default: 3)
+)
+/home/dzhernakova/tools/STAR_2.3.0e.Linux_x86_64/STAR "${star_args[@]}"
+}}}
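The relative mismatch limit in the settings above depends on read length, so it may help to check what it allows for a given library. The sketch below reproduces the arithmetic from the comment (0.04*(2*50)=4); the helper name `max_mismatches` is hypothetical, and it assumes the limit is the ratio times the full pair length, rounded down. STAR applies the ratio to the mapped length, so the effective limit can be lower when reads are soft-clipped.

```shell
# Sketch: mismatches allowed per pair by a relative limit such as
# --outFilterMismatchNoverLmax. Hypothetical helper, not part of STAR.
#   $1 = ratio (e.g. 0.04), $2 = read length, $3 = number of mates (1 or 2)
max_mismatches() {
  awk -v ratio="$1" -v len="$2" -v mates="$3" \
    'BEGIN { print int(ratio * len * mates + 1e-9) }'  # small epsilon guards against float rounding
}

max_mismatches 0.04 50 2    # 2 x 50 bp pair, prints 4
```

With 2 x 100 bp reads the same ratio would allow 8 mismatches per pair, which is worth keeping in mind if read lengths differ between batches.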