Changes between Version 1 and Version 2 of MendelianQcPipeline


Timestamp:
Sep 24, 2010 9:35:04 PM (14 years ago)
Author:
Yurii Aulchenko
Comment:

few edits + moving TrioAwarePhasingPipeline to separate page

  • MendelianQcPipeline

    v1  v2
     3   3  Check for Mendelian inconsistencies by exploiting the trio structure of the sample.
     4   4
         5  This is a ''phase 1'' project.
         6
     5   7  This work package ''aims to'':
     6   8   * Establish a custom pipeline for Mendelian-check QC.
     7   9   * Check the quality of the sequence data.
     8       * Confirm the factors affecting sequencing quality (established in the “'''Chip QC project'''”).
     9       * Confirm and possibly fine-tune the thresholds of quality metrics established in the “'''Chip QC project'''”.
    10       * Confirm the false-positive and false-negative rates for variants discovered in our study (established in the “'''Chip QC project'''”).
    11       * Explore the potential for improving calls by exploiting information from the sequencing of relatives.
        10   * Confirm the factors affecting sequencing quality (established in ChipBasedQcPipeline).
        11   * Confirm and possibly fine-tune the thresholds of quality metrics established in ChipBasedQcPipeline.
        12   * Confirm the false-positive and false-negative rates for variants discovered in our study (established in ChipBasedQcPipeline).
        13   * Explore the potential for improving calls by exploiting information from the sequencing of relatives (see TrioAwarePhasingPipeline).
        14   * Estimate the potential for ''de-novo'' variant discovery based on phasing information (see DeNovoVariationPipeline).
    12  15
    13      ''Detailed workflow'' is summarized in a separate document.
        16  ''Detailed workflow'' is summarized in a separate (protocol) document (a link will be added here when available).
    14  17
    15  18  ''Estimated costs'' for checking the pilot data and establishing the pipeline: 3 months of a BI/data manager/programmer at 1.0 fte + an experienced supervisor at 0.1 fte.
     …   …
    23  26  '''Major deliverables of phase 1'''
    24  27
    25       * Custom pipelines for chip-based and Mendelian-error-based QC
    26       * QC’ed sequence data (input for call improvement, phasing, novel variants, population genetics (variants) and functional variants projects)
        28   * Custom pipeline for Mendelian-error-based QC
        29   * QC’ed sequence data (also depends on ChipBasedQcPipeline); these data will serve as input for the call improvement, phasing, novel variants, population genetics (variants) and functional variants projects
    27  30
    28  31  Note that after completion of phase 1, preliminary phasing can be done using available population-based software (e.g. IMPUTE, MACH, BIMBAM), with the resulting haplotypes serving as input for the imputations project.
    29  32
    30      ''__Plan for phase 2.__''
    31
    32      '''Call improvement and phasing project.'''
    33
    34      '''Background.''' We plan to use phased genotypes from GvNL for further imputations. For this, both the genotypes and the phasing need to be of high quality.
    35
    36      '''Problems.''' It is well recognized that at 12x coverage there is a considerable chance (roughly estimated at ~1%) that a heterozygous genotype will not be called. Furthermore, for a given individual a certain proportion of the genome will be poorly covered; genotypes in these regions cannot be called or will be called with low quality. Such errors and missing data may have a large effect on subsequent imputations. Another factor affecting the quality of subsequent imputations is the quality of the phasing.
    37
    38      '''Proposed solution.''' All of the above problems can be addressed in the same framework. Phasing information provides the means to fill in missing genotypes and to correct erroneously called ones. For example, if a person has low coverage in a certain region, we can use information from a first-degree relative to infer the genotypes there. Sequencing errors can be detected in much the same way. Thus, phasing and imputation offer an attractive opportunity to minimize both sequencing errors and the proportion of missing data.
    39
    40      This work package ''aims to'':
    41
    42       * Improve the quality of sequence genotype data by fixing errors and filling in missing values
    43
    44       * Phase the genotypes
    45
    46      ''Detailed workflow'' is summarized in a separate document.
    47
    48      ''Estimated costs'': 6 months of an experienced !PostDoc at 1.0 fte + a BI/programmer at 0.5 fte + a supervisor at 0.1 fte.
    49
    50      ''Suggested timeline:'' January 2011 – July 2011 (???)
    51
    52      ''Depends on:'' availability of QC’ed genotypes from phase 1
    53
    54      ''Other projects depending on this:'' imputations (hard), population genetics (LD, hard), functional variants (final catalogue, soft), novel variant discovery (final catalogue, soft).
    55
    56      '''Major deliverables'''
    57
    58       * Novel methods and software
    59       * Improved genotypes
    60       * Phasing information
    61
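As an illustration only, the two trio-based ideas on this page (checking for Mendelian inconsistencies, and filling in a poorly covered genotype from relatives) can be sketched in Python. This is a minimal sketch under simplifying assumptions, not the pipeline itself: biallelic autosomal sites, hard genotype calls coded as 0/1/2 copies of the alternate allele, missing calls as None, and a single child with both parents sequenced. All names (`compatible_child_genotypes`, `count_mendelian_errors`, `fill_missing_child`, etc.) are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Set

# Hypothetical encoding (an assumption, not taken from the project documents):
# a genotype at a biallelic site is the number of alternate alleles,
# 0 = ref/ref, 1 = ref/alt, 2 = alt/alt; None marks a missing call.
Genotype = Optional[int]

# Alleles (0 = ref, 1 = alt) that a parent with a given genotype can transmit.
TRANSMISSIBLE = {0: {0}, 1: {0, 1}, 2: {1}}


def compatible_child_genotypes(father: int, mother: int) -> Set[int]:
    """All child genotypes allowed by Mendelian inheritance."""
    return {a + b for a, b in product(TRANSMISSIBLE[father], TRANSMISSIBLE[mother])}


def is_mendelian_consistent(child: int, father: int, mother: int) -> bool:
    """True if the child's genotype could be inherited from these parents."""
    for gt in (child, father, mother):
        if gt not in TRANSMISSIBLE:
            raise ValueError(f"genotype must be 0, 1 or 2, got {gt!r}")
    return child in compatible_child_genotypes(father, mother)


def count_mendelian_errors(
    child: Sequence[Genotype], father: Sequence[Genotype], mother: Sequence[Genotype]
) -> int:
    """Count sites where all three calls are present but inconsistent."""
    errors = 0
    for c, f, m in zip(child, father, mother):
        if c is None or f is None or m is None:
            continue  # a site with a missing call cannot be checked
        if not is_mendelian_consistent(c, f, m):
            errors += 1
    return errors


def fill_missing_child(
    child: Sequence[Genotype], father: Sequence[Genotype], mother: Sequence[Genotype]
) -> List[Genotype]:
    """Fill missing child calls that the parents' genotypes determine uniquely.

    Sites where several child genotypes remain possible (e.g. both parents
    heterozygous) stay missing; resolving those would need additional
    information, such as phasing or the child's own reads.
    """
    filled: List[Genotype] = []
    for c, f, m in zip(child, father, mother):
        if c is None and f is not None and m is not None:
            options = compatible_child_genotypes(f, m)
            if len(options) == 1:
                c = next(iter(options))
        filled.append(c)
    return filled
```

For example, with child calls `[0, 2, 1, None]`, father `[0, 0, 1, 0]` and mother `[1, 0, 1, 2]`, the second site is flagged as a Mendelian error (two ref/ref parents cannot have an alt/alt child), and the missing fourth call is filled in as 1, because a ref/ref father and an alt/alt mother can only produce a heterozygous child. A check like this only catches errors that happen to violate inheritance; a miscalled genotype in the child of two heterozygous parents, for instance, passes unnoticed.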