[[TOC()]]

= Protocol for comparison between sequencing (VCF) and chip data =

In this project, we will establish an infrastructure for cross-checking the genotypic data generated by BGI against the data already available from GWA scans (DNA chips), and carry out that cross-check.

== Summary ==

'''Status''': under development

'''Contributors''': Yurii, Lennart, Elisa

'''Timeline''': end Sep 2010 - end Dec 2010

'''Resources''': BI/data manager/programmer at 0.75 FTE (the same as the one on MendelianQcPipeline) + experienced supervisor at 0.1 FTE (the same as the one on MendelianQcPipeline)

'''Depends on''': availability of VCF data

'''Other projects depending on this''': MendelianQcPipeline (soft), all projects which start with QC'ed data (e.g. all WP2 projects)

== Aims and Deliverables ==

 * Establish a custom pipeline for chip-based QC.
 * Verify the identity of the DNA samples.
 * Check the quality of the sequence data.
 * Identify factors affecting the quality of sequencing (e.g. batch effects).
 * Establish (preliminary) thresholds of quality metrics that maximize the sensitivity and specificity of genotype calling.
 * Using the above thresholds, establish the false-positive and false-negative rates for variants discovered in our study (if we do not take trio structure into account).
 * Check whether these rates agree with those theoretically expected (to make sure we are not missing any important experimental factor).
 * In accordance with MendelianQcPipeline, provide QC'ed data.

== Idea ==

The principal idea of which questions should be addressed (without saying how) is summarized in ChipBasedQcPipelineIdea.

A number of 'burning' questions need to be addressed before the idea can be considered finished. These include:

 * What format will be used for/by the chip data (ChipGtDataFormat)?
 * What factors, potentially affecting or indicating the quality of sequencing data (FactorsRelatedToSeqDataQuality), are to be addressed in the QC?

== Workflow ==

An automated workflow will be provided on the ChipBasedQcPipelineWorkflow page.
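
As a rough illustration of the comparison named in the aims (concordance, sensitivity and specificity of sequencing calls against chip genotypes), a minimal Python sketch might look as follows. Everything here is an assumption made for illustration and not the project's actual pipeline: genotypes are coded as alternate-allele counts (0, 1, 2), the chip is treated as the truth, and the names `compare_genotypes`, `summarize`, and the `(sample, site)` keys are hypothetical. The real input formats are still open (see ChipGtDataFormat).

```python
from collections import Counter


def compare_genotypes(seq_calls, chip_calls):
    """Tally agreement between sequencing and chip genotypes.

    Both arguments map a hypothetical (sample, site) key to a genotype coded
    as the alternate-allele count (0, 1 or 2); None marks a missing call.
    Chip entries with no usable sequencing call are counted as "missing".
    """
    counts = Counter()
    for key, chip_gt in chip_calls.items():
        seq_gt = seq_calls.get(key)
        if seq_gt is None or chip_gt is None:
            counts["missing"] += 1
            continue
        counts["compared"] += 1
        if seq_gt == chip_gt:
            counts["concordant"] += 1
        # Assumption: the chip is the truth, and a "positive" is any
        # non-reference genotype (variant discovery, not exact genotype).
        if chip_gt > 0 and seq_gt > 0:
            counts["tp"] += 1
        elif chip_gt > 0:
            counts["fn"] += 1
        elif seq_gt > 0:
            counts["fp"] += 1
        else:
            counts["tn"] += 1
    return counts


def summarize(counts):
    """Turn the tallies into rates; NaN when a denominator is zero."""
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "concordance": ratio(counts["concordant"], counts["compared"]),
        "sensitivity": ratio(counts["tp"], counts["tp"] + counts["fn"]),
        "specificity": ratio(counts["tn"], counts["tn"] + counts["fp"]),
    }


# Made-up toy data, for illustration only.
chip_calls = {
    ("s1", "rs1"): 0, ("s1", "rs2"): 1, ("s1", "rs3"): 2,
    ("s2", "rs1"): 1, ("s2", "rs2"): 0, ("s2", "rs3"): None,
}
seq_calls = {
    ("s1", "rs1"): 0, ("s1", "rs2"): 1, ("s1", "rs3"): 1,
    ("s2", "rs1"): 0, ("s2", "rs2"): 1,
}

counts = compare_genotypes(seq_calls, chip_calls)
summary = summarize(counts)
```

Running the same tally per sample, rather than over all samples pooled, would give a per-sample concordance that could support the identity check, and repeating it over a grid of quality-metric cut-offs would be one way to explore the preliminary thresholds. Both extensions are suggestions, not decisions recorded on this page.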