Changes between Version 59 and Version 60 of SnpCallingPipeline


Timestamp:
Jan 24, 2011 5:45:54 PM (14 years ago)
Author:
laurent
Comment:

--

  • SnpCallingPipeline

    v59 v60  
    106106== List of steps ==
    107107[[TOC(SnpCallingPipeline/ReferencePreparation,SnpCallingPipeline/AlignmentAndCleaning,SnpCallingPipeline/VariantCalling,inline,noheading)]]
     108
     109== Quality Control ==
     110The quality-control metrics currently under discussion, with their thresholds, are:
     111* RawData
     112** FastQC report (per mate of the pair)
     113*** Manual look at files and check:
     114**** Average quality per read > 30
     115**** Number of sequences ~60 million
     116**** Sequence quality should look OK
     117* Alignment (per lane)
     118** Picard Alignment Summary Metrics
     119*** % PF (passing filter) reads aligned > 90%
     120*** PF high-quality error rate < 1%
     121*** PF reads aligned > 150 million
     122** Picard GC Bias Metrics
     123*** GC Curve should look OK
     124*** Median GC% windows between 30 and 40
     125*** Avg Mean Base Quality should be OK
     126** Picard Insert Size Metrics
     127*** Peak should be ~500
     128*** Peak should be narrow
     129*** Should have few outliers
     130** Picard BAM Index Stats
     131*** Should be uniform by Chromosome
     132** GATK or Picard (currently testing) Coverage Metrics
     133*** Should correspond to a Poisson curve with peak at 12x
     134** Picard Mark Duplicates
     135*** %duplicates between 5% and 8%
     136* Recalibration
     137** GATK AnalyzeCovariates
     138*** No output currently; should revisit when working
     139** Picard Quality by Cycle
     140*** To be determined once data is produced
     141** Picard Quality Distribution
     142*** To be determined once data is produced
     143* Initial SNP Calling
     144** To be determined once data is produced and analyzed. A first basis should be derived from the difference between chip data and sequence data, and from the % of SNPs found in dbSNP.
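The numeric thresholds listed above could be checked automatically once the metrics have been collected per lane. The following is only a minimal sketch: the metric names (`avg_read_quality`, `pct_reads_aligned`, and so on) are hypothetical keys chosen for illustration, not the column names of any Picard or FastQC output, and percentages are assumed to be expressed as fractions. The "should look OK" checks still require manual inspection.

```python
# Thresholds copied from the Quality Control list above.
# Metric names are hypothetical; map them to the real report columns as needed.
THRESHOLDS = {
    "avg_read_quality": lambda v: v > 30,          # FastQC, per mate
    "pct_reads_aligned": lambda v: v > 0.90,       # alignment summary, per lane
    "hq_error_rate": lambda v: v < 0.01,           # alignment summary, per lane
    "reads_aligned": lambda v: v > 150e6,          # alignment summary, per lane
    "median_gc_pct": lambda v: 30 <= v <= 40,      # GC bias metrics
    "pct_duplicates": lambda v: 0.05 <= v <= 0.08, # mark duplicates
}


def failed_checks(metrics):
    """Return the sorted names of metrics that are present and violate their threshold."""
    return sorted(
        name
        for name, passes in THRESHOLDS.items()
        if name in metrics and not passes(metrics[name])
    )


# Example lane: the alignment rate is too low and the duplicate rate too high.
example_lane = {
    "avg_read_quality": 33.5,
    "pct_reads_aligned": 0.87,
    "hq_error_rate": 0.004,
    "reads_aligned": 162e6,
    "median_gc_pct": 36,
    "pct_duplicates": 0.11,
}
print(failed_checks(example_lane))
```

Metrics that are absent from the input are skipped rather than reported as failures, so the same function can be used on partial results while some reports are still being produced.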
     145
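For the coverage check, the expected shape can be written down explicitly. The sketch below assumes an ideal Poisson model with mean depth 12 (the peak quoted above) and computes the expected fraction of positions at or below a given depth. The function names are illustrative only, and real coverage is usually somewhat more dispersed than this model predicts.

```python
import math


def poisson_pmf(depth, mean_depth):
    """Probability of observing exactly `depth` reads under a Poisson model."""
    # Computed in log space to avoid overflow for large depths.
    log_p = -mean_depth + depth * math.log(mean_depth) - math.lgamma(depth + 1)
    return math.exp(log_p)


def expected_fraction_at_most(depth, mean_depth=12):
    """Expected fraction of positions covered by `depth` reads or fewer."""
    return sum(poisson_pmf(k, mean_depth) for k in range(depth + 1))


# Expected share of positions with very low coverage (3x or less) at mean 12x.
low_coverage = expected_fraction_at_most(3)
print(f"{low_coverage:.4%}")
```

An observed low-coverage fraction far above this expectation would suggest that the coverage histogram does not follow the expected curve and deserves a closer look.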