Changes between Version 1 and Version 2 of WorkPlan


Timestamp: Sep 26, 2010 10:15:04 PM
Author: Yurii Aulchenko
Comment: --

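Legend: lines numbered only under v1 were removed, lines numbered only under v2 were added, and lines numbered under both are unmodified; a modified line appears as its removed version followed by its added version.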
  • WorkPlan

    v1  v2
    1   1   [[TOC()]]
    2       = Work plan for quality control, call improvement and phasing (VCF to haplotypes) =
        2   = Analysis work plan =
    3   3
    4       Compiled: Yurii Aulchenko, September 12, 2010
        4   We have to run our project in two phases: phase 1 (from now until the end of 2010), run with the minimal personnel available at present, and phase 2 (starting ~January 2011), based on a proper resource plan. We assume that pilot VCF data will be available at the end of September and expect all data to be available by January 2011.
    5   5
    6       As noted by Morris, we have to run our project in two phases: phase 1 (from now until the end of 2010), run with the minimal personnel available at present, and phase 2 (starting ~January 2011), based on a proper resource plan. We assume that pilot VCF data will be available at the end of September and expect all data to be available by January 2011.
    7
    8       This document aims to provide an overview of the “VCF to haplotypes” work package.
        6   This document aims to provide an overview of the “VCF to haplotypes” line of work, summarizing ChipBasedQcPipeline, MendelianQcPipeline and TrioAwarePhasingPipeline.
    9   7
    10  8   == Plan for phase 1. ==
    11  9
    12      Starting at the end of September, when pilot data become available, we need to build a pipeline for basic post-VCF quality control. This will include two independent sub-projects, which may be run in parallel:
        10  Starting at the end of September, when pilot data become available, we need to build a pipeline for basic post-VCF quality control. This will include a number of sub-projects, which may be run in parallel in a semi-independent manner:
    13  11
    14      '''Chip QC project:''' cross-check of the results obtained from BGI against already available GWA scan data.
        12  '''Chip QC (ChipBasedQcPipeline):''' establish an infrastructure and cross-check the genotypic data generated by BGI against the data from already available GWA scans (DNA chips).
    15  13
    16      This work package ''aims to'':
        14  '''Mendelian QC (MendelianQcPipeline):''' establish an infrastructure and perform QC of the genotypic data generated by BGI using Mendelian error checks.
    17  15
    18       * Establish a custom pipeline for chip-based QC.
    19       * Check the quality of the sequence data.
    20       * Identify factors affecting the quality of sequencing (e.g. batch effects).
    21       * Establish (preliminary) thresholds for quality metrics that maximize sensitivity and specificity.
    22       * Using the above thresholds, estimate the false-positive and false-negative rates for variants discovered in our study (without taking the trio structure into account).
    23       * Check whether these rates agree with theoretical expectations, to make sure we do not miss any important experimental factor.
        16  '''New variant discovery (TrioAwareVariantDiscoveryPipeline) phase 1:''' establish an infrastructure and provide a preliminary list of variants discovered by GvNL.
    24  17
    25      The ''detailed workflow'' is summarized in a separate document.
        18  '''De-novo variant discovery (DeNovoVariationPipeline) phase 1:''' establish an infrastructure and provide a preliminary list of ''de-novo'' mutations.
    26  19
    27       * ''Estimated costs'' for checking the pilot data and establishing the pipeline:
    28       * 3 months of a BI/data manager/programmer at 1.0 FTE plus an experienced supervisor at 0.1 FTE.
    29       * ''Suggested timeline:'' end of September – end of December.
    30       * ''Depends on:'' availability of VCF pilot data.
    31       * ''Other projects depending on this:'' MendelianQcPipeline (soft), QC’ed data (hard).
        20  ...
        21
        22  Major outcomes of phase 1:
        23   * Established custom pipelines for chip-based and Mendelian-check QC.
        24   * A list of factors affecting the quality of sequencing.
        25   * Thresholds for quality metrics that yield the highest data quality.
        26   * False-positive and false-negative rates for the discovered variants.
        27   * An estimate of the potential for improving calls by exploiting information from the sequencing of relatives (see TrioAwareVariantDiscoveryPipeline, TrioAwarePhasingPipeline).
        28   * An estimate of the potential for ''de-novo'' variant discovery based on phasing information (see DeNovoVariationPipeline), and a preliminary list of such mutations.
        29   * QC'ed sequence data.
        30
        31  == Plan for phase 2. ==
        32
        33  '''Genotype improvement and phasing (TrioAwarePhasingPipeline):''' establish an infrastructure and perform phasing of the sequence data.
        34
        35  '''New variant discovery (TrioAwareVariantDiscoveryPipeline) phase 2:''' provide the list of variants discovered by GvNL.
        36
        37  '''De-novo variant discovery (DeNovoVariationPipeline) phase 2:''' provide the list of ''de-novo'' mutations.
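The two phase 1 QC checks named above (cross-checking BGI calls against GWA chip genotypes, and checking genotypes for Mendelian errors) can be illustrated with a minimal Python sketch. It assumes biallelic autosomal SNPs whose genotypes have already been extracted from the VCF and the chip data as alternate-allele dosages (0, 1, 2, or None for a missing call) and aligned by site. The function names, the dosage coding and the choice of treating chip genotypes as the truth are illustrative assumptions, not the design of ChipBasedQcPipeline or MendelianQcPipeline.

```python
from typing import Optional, Sequence

# Assumed (hypothetical) coding: genotypes at a biallelic autosomal site as
# alternate-allele dosage (0 = hom-ref, 1 = het, 2 = hom-alt); None = missing call.
Genotype = Optional[int]

# Alleles (0 = ref, 1 = alt) that a parent with a given dosage can transmit.
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def _rate(hits: int, total: int) -> float:
    """Return hits / total, or 0.0 when there is nothing to count."""
    return hits / total if total else 0.0


def is_mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """True if the child's dosage can be formed from one allele of each parent."""
    possible = {a + b for a in TRANSMITTABLE[father] for b in TRANSMITTABLE[mother]}
    return child in possible


def mendelian_error_rate(trios: Sequence[tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of fully called (father, mother, child) genotypes that are inconsistent.

    Trios with any missing call are skipped, since they cannot be checked.
    """
    complete = [t for t in trios if None not in t]
    errors = sum(not is_mendelian_consistent(f, m, c) for f, m, c in complete)
    return _rate(errors, len(complete))


def chip_concordance(seq: Sequence[Genotype], chip: Sequence[Genotype]) -> dict[str, float]:
    """Compare sequencing calls with chip calls at the same sites, chip taken as truth.

    Sites with a missing call on either side are skipped. Variant presence means
    dosage > 0; the false-positive rate is computed over chip hom-ref sites and the
    false-negative rate over chip carrier sites.
    """
    pairs = [(s, c) for s, c in zip(seq, chip) if s is not None and c is not None]
    seq_at_chip_ref = [s for s, c in pairs if c == 0]
    seq_at_chip_alt = [s for s, c in pairs if c > 0]
    return {
        "concordance": _rate(sum(s == c for s, c in pairs), len(pairs)),
        "false_positive_rate": _rate(sum(s > 0 for s in seq_at_chip_ref), len(seq_at_chip_ref)),
        "false_negative_rate": _rate(sum(s == 0 for s in seq_at_chip_alt), len(seq_at_chip_alt)),
    }


if __name__ == "__main__":
    trios = [(0, 0, 0), (0, 0, 1), (2, 2, 1), (1, 2, 2), (None, 1, 1)]
    print(mendelian_error_rate(trios))  # 2 of 4 checkable trios fail -> 0.5
    print(chip_concordance([0, 1, 2, 0, None, 1], [0, 1, 1, 1, 2, 0]))
```

A Mendelian check catches only part of the genotyping errors: an error that turns one compatible genotype into another (for example, a heterozygous child of two heterozygous parents called as homozygous) goes unnoticed, so the observed Mendelian error rate underestimates the true error rate.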