[[TOC()]]

= Work plan =

SEEMS TO BE RATHER INCOMPLETE -- I HAVE INCLUDED AN OVERVIEW OF ONLY SOME PARTS OF THE PLAN, PLEASE REVISE

We have to run our project in two phases: phase 1 (from now until the end of 2010), running with the minimal personnel available at present, and phase 2 (starting ~January 2011), based on a proper resource plan. It is assumed that pilot VCF data will be available at the end of September; we expect all data to be available by January 2011.

This document aims to provide an overview of the “VCF to haplotypes” line, summarizing ChipBasedQcPipeline, MendelianQcPipeline and TrioAwarePhasingPipeline.

== Plan for phase 1. ==

Starting at the end of September, when the pilot data become available, we need to build a pipeline for basic post-VCF quality control. This will include a number of sub-projects which may be run in parallel in a semi-independent manner:

'''Chip QC (ChipBasedQcPipeline):''' establish an infrastructure and cross-check the genotypic data generated by BGI against the data already available from GWA scans (DNA chips).

'''Mendelian QC (MendelianQcPipeline)''': establish an infrastructure and perform QC of the genotypic data generated by BGI by checking for Mendelian errors.

'''New variants discovery (TrioAwareVariantDiscoveryPipeline) phase 1''': establish an infrastructure and provide a preliminary list of variants discovered by GvNL.

'''De-novo variants discovery (DeNovoVariationPipeline) phase 1''': establish an infrastructure and provide a preliminary list of 'de-novo' mutations.

...

Major outcomes of phase 1:

 * Established custom pipelines for chip-based and Mendelian-check QC.
 * A list of factors affecting the quality of sequencing.
 * Thresholds on quality metrics that lead to the highest quality.
 * False-positive and false-negative rates for the variants discovered.
 * An estimate of the potential for improving calls by exploiting information from the sequencing of relatives (see TrioAwareVariantDiscoveryPipeline, TrioAwarePhasingPipeline).
 * An estimate of the potential for ''de-novo'' variant discovery based on phasing information (see DeNovoVariationPipeline), together with a preliminary list of such mutations.
 * QC'ed sequence data.

== Plan for phase 2. ==

'''Genotype improvement and phasing (TrioAwarePhasingPipeline)''': establish an infrastructure and perform phasing of the sequence data.

'''New variants discovery (TrioAwareVariantDiscoveryPipeline) phase 2''': provide the list of variants discovered by GvNL.

'''De-novo variants discovery (DeNovoVariationPipeline) phase 2''': provide the list of 'de-novo' mutations.
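
To make the two phase 1 QC checks more concrete, here is a minimal sketch of what a Mendelian-error check and a chip concordance check could compute. It is an illustration under stated assumptions, not the design of MendelianQcPipeline or ChipBasedQcPipeline: genotypes are assumed to be already extracted from the VCF and chip files as unordered pairs of alleles at autosomal sites, and all function names and site identifiers are hypothetical. Note that an apparent Mendelian error may be either a genotyping error or a genuine de-novo mutation, so this check alone cannot tell the two apart.

```python
from itertools import product


def is_mendelian_consistent(child, father, mother):
    """Return True if the child's genotype can be formed by taking one
    allele from the father and one from the mother.

    Genotypes are pairs of alleles, e.g. ("A", "G"); allele order is ignored.
    Assumption: autosomal site, all three genotypes are called.
    """
    child_sorted = sorted(child)
    return any(
        sorted((f_allele, m_allele)) == child_sorted
        for f_allele, m_allele in product(father, mother)
    )


def mendelian_error_rate(trios):
    """Fraction of fully called trio genotypes that violate Mendelian
    inheritance. `trios` is an iterable of (child, father, mother) tuples;
    trios with a missing genotype (None) are skipped.
    """
    checked = 0
    errors = 0
    for child, father, mother in trios:
        if child is None or father is None or mother is None:
            continue
        checked += 1
        if not is_mendelian_consistent(child, father, mother):
            errors += 1
    return errors / checked if checked else float("nan")


def concordance(seq_calls, chip_calls):
    """Fraction of sites called on both platforms where the sequencing
    genotype equals the chip genotype (allele order ignored).

    Both arguments map a site identifier to a genotype pair or None.
    """
    shared = [
        site
        for site in seq_calls
        if seq_calls[site] is not None and chip_calls.get(site) is not None
    ]
    if not shared:
        return float("nan")
    matches = sum(
        sorted(seq_calls[site]) == sorted(chip_calls[site]) for site in shared
    )
    return matches / len(shared)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    trios = [
        (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
        (("G", "G"), ("A", "A"), ("G", "G")),  # father cannot transmit G
    ]
    print("Mendelian error rate:", mendelian_error_rate(trios))

    seq = {"site1": ("A", "G"), "site2": ("C", "C")}
    chip = {"site1": ("G", "A"), "site2": ("C", "T")}
    print("Chip concordance:", concordance(seq, chip))
```

Stratifying these two rates by candidate quality metrics (for example read depth or genotype quality) would be one way to approach the thresholds listed among the phase 1 outcomes.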