Changes between Version 2 and Version 3 of ChipBasedQcPipelineIdea


Timestamp: Sep 26, 2010 7:35:41 PM
Author: Yurii Aulchenko
Comment: --

  • ChipBasedQcPipelineIdea

    v2  v3
    66  66  Genotypic table (name: VCF_genotypes_yyyy.mm.dd.txt). Tab-delimited file containing the following information. Header line:
    67  67
    68      ID      SNPV    GTVCF   GQ      DP      BATCH   ????
        68  ID      SNPV    GTVCF   GQ      DP      ...
    69  69
    70  70  All subsequent lines should contain XXX tab-delimited values. Use “.” (dot) for missing.
    …   …
    73  73   * GTVCF: genotype with alleles in alphabetic order, <two characters, each either “A”, “C”, “G” or “T”>. This can be done by mapping the allele indices in the VCF GT field to the REF and ALT alleles and then sorting them.
    74  74   * GQ, DP: directly from the VCF file
    75       * BATCH …
    76
    77      HERE WE NEED TO DECIDE WHAT POTENTIALLY QUALITY-AFFECTING VARIABLES (SUCH AS BATCH) WE NEED TO TAKE INTO ACCOUNT
        75  * …: factors potentially associated with quality of the sequencing data, summarized in FactorsRelatedToSeqDataQuality.
    78  76
    79  77  Merge the chip and VCF genotypic tables (“chip_genotypes_yyyy.mm.dd.txt” and “VCF_genotypes_yyyy.mm.dd.txt”) using ID and SNPV as key variables. Keep all chip genotypes, substituting missing (“.”) when no information is available from the VCF. Name the merged table “merged_chip_and_VCF_genotypes_yyyy.mm.dd.txt”.
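The GTVCF rule on this page (map the allele indices in the VCF GT field to REF and ALT, then put the two alleles in alphabetic order) could be implemented roughly as below. This is a minimal Python sketch, not part of the pipeline description: the function name `gt_to_gtvcf` is hypothetical, and it assumes diploid calls and returns "." for missing calls and for alleles that are not a single A/C/G/T base (for example indels).

```python
VALID_BASES = {"A", "C", "G", "T"}


def gt_to_gtvcf(gt: str, ref: str, alt: str) -> str:
    """Convert a VCF GT value such as "0/1" or "1|1" into a GTVCF string.

    Hypothetical helper; assumes diploid calls. Returns "." (the table's
    missing-value code) for missing calls and for non-SNV alleles.
    """
    # Index 0 is REF; indices 1, 2, ... are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")
    # Treat phased ("|") and unphased ("/") separators the same way.
    indices = gt.replace("|", "/").split("/")
    if len(indices) != 2 or not all(i.isdigit() for i in indices):
        return "."  # missing call ("./.") or not a diploid call
    if any(int(i) >= len(alleles) for i in indices):
        return "."  # index points past the listed alleles
    bases = [alleles[int(i)] for i in indices]
    if any(b not in VALID_BASES for b in bases):
        return "."  # indel or symbolic allele: not representable in GTVCF
    # Alphabetic order makes "GA" and "AG" the same genotype.
    return "".join(sorted(bases))
```

For example, `gt_to_gtvcf("0/1", "G", "A")` gives `"AG"`, and a phased `"1|1"` with REF `C` and ALT `T` gives `"TT"`. Sorting is what lets a heterozygote be compared with the chip genotype regardless of which allele the VCF lists as REF.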
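The merge step described on this page amounts to a left join of the VCF genotypic table onto the chip table on (ID, SNPV), filling "." where the VCF has no matching row. A minimal standard-library Python sketch follows. The function name `merge_chip_and_vcf` is hypothetical, and it assumes both files are tab-delimited with a header row, that (ID, SNPV) pairs are unique in the VCF table, and that the two tables share no column names other than ID and SNPV.

```python
import csv

KEY = ("ID", "SNPV")  # join keys named in the pipeline description


def merge_chip_and_vcf(chip_path: str, vcf_path: str, out_path: str) -> None:
    """Left-join the VCF genotypic table onto the chip table on (ID, SNPV).

    Hypothetical helper. Assumes tab-delimited files with a header row,
    unique (ID, SNPV) pairs in the VCF table, and no shared column names
    besides ID and SNPV.
    """
    # Index the VCF table by its key so each chip row needs one lookup.
    with open(vcf_path, newline="") as f:
        vcf_reader = csv.DictReader(f, delimiter="\t")
        vcf_cols = [c for c in vcf_reader.fieldnames if c not in KEY]
        vcf_rows = {(r["ID"], r["SNPV"]): r for r in vcf_reader}

    with open(chip_path, newline="") as f_in, open(out_path, "w", newline="") as f_out:
        chip_reader = csv.DictReader(f_in, delimiter="\t")
        writer = csv.DictWriter(
            f_out,
            fieldnames=list(chip_reader.fieldnames) + vcf_cols,
            delimiter="\t",
            lineterminator="\n",
        )
        writer.writeheader()
        # Every chip row is kept; VCF-only rows are dropped (left join).
        for row in chip_reader:
            match = vcf_rows.get((row["ID"], row["SNPV"]))
            for col in vcf_cols:
                # "." marks a missing value, as elsewhere in the pipeline.
                row[col] = match[col] if match else "."
            writer.writerow(row)
```

Usage, with the date placeholders filled in: `merge_chip_and_vcf("chip_genotypes_yyyy.mm.dd.txt", "VCF_genotypes_yyyy.mm.dd.txt", "merged_chip_and_VCF_genotypes_yyyy.mm.dd.txt")`. Indexing the VCF table in a dictionary keeps the join linear in the number of rows; for very large VCF tables a sorted merge or a dataframe library may be preferable.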