= Pipeline todos =
                  
This page tracks the planned modifications to the pipeline for the full run.
                  
== Timeline ==
                  
- '''October 1st is the target date for starting to implement these features in the final pipeline. Issues should be filed before this date.'''
                  
The full run will start when:
                  
- The issues on this page are implemented
- Metadatabase issues are resolved
- All FQ files are merged and available
                  
- '''Aim to start running once the final plans for the second paper are clear.'''
                  
== Full run implementation list ==
                  
- Two alignments, so that both downstream analyses (QTL, ASE) can be used to their full potential:
 1. Unmasked, for QTL (and expression quantification)
 2. Masked, for ASE (check with Dasha about the masked index)
 - Mask with GoNL, 1KG and UMCG ASE study SNPs.
 - Keep separate mapping statistics for each alignment in the analysis database.
- Change the STAR settings to the ENCODE settings (below)
- Variant calling on the unmasked BAM/mpileup (for ASE)
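
The masking item above could start from one combined set of SNP positions. A minimal sketch follows, assuming that masking means replacing known SNP positions in the reference with N and that the GoNL, 1KG and UMCG ASE study SNPs are available as VCF files (plain or gzipped); the helper names `vcf_to_bed` and `merge_snp_beds` are hypothetical. It keeps only single-base substitutions and prints one 0-based BED interval per position, reporting positions shared by several sources only once.

{{{
# Hypothetical helper: convert VCF records on stdin into 0-based, half-open
# BED intervals, one per single-base SNP. Header lines and records whose REF
# or ALT alleles are longer than one base (indels, MNPs) are skipped.
vcf_to_bed() {
  awk 'BEGIN { OFS = "\t" }
       /^#/ { next }
       length($4) == 1 && $5 ~ /^[ACGT](,[ACGT])*$/ { print $1, $2 - 1, $2 }'
}

# Hypothetical helper: merge the SNP positions of several VCF files (plain or
# gzipped) into one BED stream, sorted by chromosome and position; positions
# present in more than one file are printed once (sort -u on the key fields).
merge_snp_beds() {
  local vcf
  for vcf in "$@"; do
    case "$vcf" in
      *.gz) gzip -dc -- "$vcf" ;;
      *)    cat -- "$vcf" ;;
    esac
  done | vcf_to_bed | sort -k1,1 -k2,2n -u
}

# Example (placeholder file names):
# merge_snp_beds gonl.vcf.gz 1kg.vcf.gz umcg_ase.vcf.gz > snps_to_mask.bed
}}}

The resulting BED could then be applied with `bedtools maskfasta -fi genome.fa -bed snps_to_mask.bed -fo genome.masked.fa` before building the masked STAR index with `--runMode genomeGenerate`; file names are placeholders, and the existing masked index may have been built differently.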
                  
== Discussion points ==
                  
- STAR 2-pass mapping?
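
For the 2-pass question: a 2-pass run can be done manually by mapping once, taking the novel junctions from the first pass's `SJ.out.tab`, regenerating the genome index with them via `--sjdbFileChrStartEnd`, and mapping again (newer STAR releases can also do this automatically with `--twopassMode Basic`). A minimal sketch of the junction-filtering step follows; the helper name `sj_for_second_pass`, the default minimum of 2 uniquely mapped reads and the removal of unannotated non-canonical junctions are assumptions for illustration, not agreed settings.

{{{
# Hypothetical helper for a manual STAR 2-pass run: read a first-pass
# SJ.out.tab on stdin and print a 4-column junction list
# (chromosome, first intron base, last intron base, strand) for
# --sjdbFileChrStartEnd. SJ.out.tab strand codes 0/1/2 become "."/"+"/"-".
# Assumed filters: drop unannotated non-canonical junctions (column 5 == 0
# and column 6 == 0) and junctions with fewer than MIN_UNIQUE uniquely
# mapped reads (column 7).
sj_for_second_pass() {
  local min_unique=${1:-2}
  awk -v min="$min_unique" '
    BEGIN { OFS = "\t"; strand[0] = "."; strand[1] = "+"; strand[2] = "-" }
    $5 == 0 && $6 == 0 { next }
    $7 >= min + 0 { print $1, $2, $3, strand[$4] }'
}

# Example (placeholder paths): collect first-pass junctions, then regenerate
# the genome with them and map again. --sjdbOverhang 49 assumes 50 bp mates
# (read length minus 1).
# sj_for_second_pass 2 < pass1/SJ.out.tab > pass1.sj.tsv
# STAR --runMode genomeGenerate --genomeDir index_pass2/ \
#      --genomeFastaFiles genome.fa --sjdbFileChrStartEnd pass1.sj.tsv \
#      --sjdbOverhang 49
}}}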
                  
=== Suggested STAR ENCODE settings ===
                  
ENCODE settings (sent to me by Alexander Dobin, who did the alignment for some of the ENCODE samples):
{{{
# Options are kept in a bash array so that each one can carry a comment
# (text after a trailing backslash would break the command).
STAR_OPTS=(
  --runThreadN 8
  --genomeDir /home/dzhernakova/resources/STARindex_GoNL/
  --genomeLoad NoSharedMemory
  --readFilesIn /home/dzhernakova/data/rawData/LL-557-130804_R1.fq.gz ~/data/rawData/LL-557-130804_R2.fq.gz
  --readFilesCommand zcat                 # input FASTQ files are gzipped
  --outFileNamePrefix ~/data/mappedData/LL-557-130804.encode/LL-557-130804.encode.
  --outSAMstrandField intronMotif         # add XS strand attribute to spliced alignments
  --outSAMunmapped Within                 # keep unmapped reads in the output SAM
  --outFilterType BySJout                 # reduces the number of "spurious" junctions
  --outFilterMultimapNmax 20              # max number of loci a read may map to; if exceeded, the read is considered unmapped
  --outFilterMismatchNmax 999             # max number of mismatches per pair (absolute); 999 effectively disables this limit
  --outFilterMismatchNoverLmax 0.04       # max mismatches per pair relative to mapped length (0.04*(2*50)=4 for 2x50 bp reads)
  --alignIntronMin 20                     # min intron size (default: 21)
  --alignIntronMax 1000000                # max intron size (default: determined by the bin size)
  --alignMatesGapMax 1000000              # max genomic distance between mates (default: determined by the bin size)
  --alignSJoverhangMin 8                  # min overhang for unannotated junctions (default: 5)
  --alignSJDBoverhangMin 1                # min overhang for annotated junctions (default: 3)
)
/home/dzhernakova/tools/STAR_2.3.0e.Linux_x86_64/STAR "${STAR_OPTS[@]}"
}}}
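
The comment on `--outFilterMismatchNoverLmax` works out the mismatch limit for 2x50 bp reads. The sketch below repeats that back-of-the-envelope calculation for other read lengths, in case the full-run data differs; the helper name `max_mismatches` is hypothetical, the ratio is passed in hundredths so that bash integer arithmetic can be used, and it assumes the whole read (pair) is mapped. STAR applies the ratio to the mapped length, so its internal rounding may differ slightly.

{{{
# Hypothetical helper: approximate maximum number of mismatches allowed by
# --outFilterMismatchNoverLmax for a fully mapped read, i.e.
# floor(ratio * mate_length * number_of_mates).
# The ratio is given in hundredths (0.04 -> 4) so bash integer arithmetic works.
max_mismatches() {
  local mate_len=$1
  local n_mates=${2:-2}     # 2 for paired-end, 1 for single-end
  local ratio_pct=${3:-4}   # 4 corresponds to --outFilterMismatchNoverLmax 0.04
  echo $(( ratio_pct * mate_len * n_mates / 100 ))
}

max_mismatches 50      # 2x50 bp, as in the comment above: prints 4
max_mismatches 100     # 2x100 bp: prints 8
max_mismatches 50 1    # single-end 50 bp: prints 2
}}}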