Changes between Version 6 and Version 7 of GwasQcPipeline


Ignore:
Timestamp:
Feb 10, 2011 5:32:47 PM (13 years ago)
Author:
Morris Swertz
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • GwasQcPipeline

    v6 v7  
    11= GWAS QC pipeline =
    22
    3 Removed all bad samples (taken from Mathieu).
     3Removed all bad samples (taken from Mathieu).
     4 
    45{{{p-link --bfile GvNL_250111 --remove GvNL_bad_samples.txt --make-bed --out GvNL_good_samples}}}
    56
    67Update family information
     8
    79{{{p-link --bfile GvNL_samples --update-ids GvNL_family_update.txt  --make-bed --out GvNL_good_fam}}}
    810
    911Set trio relationships
     12
    1013{{{p-link --bfile GvNL_good_fam --update-parents GvNL_parents_update.txt  --make-bed --out GvNL_good_fam_par}}}
    1114
    1215Manual check for suspicious individuals (not in trios) and remove them
     16
    1317{{{<pre>p-link --bfile GvNL_good_fam_par --remove GvNL_suspicious.txt  --make-bed --out GvNL_good_fam_par_susp}}}
    1418
     
    1620
    1721Check and update sex information
     22
    1823{{{p-link --bfile GvNL_good_fam_par_susp --check-sex --make-bed}}}
    1924
     
    2429
    2530Update sex information for children and swapped individuals
     31
    2632{{{<pre>p-link --bfile GvNL_good_fam_par_susp --update-sex GvNL_sex_update.txt --make-bed --out GvNL_raw_final}}}
    2733
    2834Identification of individuals with elevated missing data rates or outlying heterozygosity rate (See Anderson, NP2010, p.1569) => How does this compare with the PLINK suggested approach  [[http://pngu.mgh.harvard.edu/~purcell/plink/thresh.shtml here]]
    2935
    30 Get missing data information: {{{p-link --bfile GvNL_raw_final --missing --out GvNL_raw_final}}}
     36Get missing data information:
     37
     38 {{{p-link --bfile GvNL_raw_final --missing --out GvNL_raw_final}}}
    3139 
    32 Get heterozygocity information: {{{p-link --bfile GvNL_raw_final --het --out GvNL_raw_final}}}
     40Get heterozygocity information:
     41
     42{{{p-link --bfile GvNL_raw_final --het --out GvNL_raw_final}}}
    3343
    3444Calculate the observed heterozygosity rate per individual using the formula (N(NM)  −  O(Hom))/N(NM) and plot the missing SNPs vs heterozygocity rate for eyeball inspection.
    35 * In R: {{{het <- read.table("GvNL_raw_final.het", head=TRUE)&#10;mis <- read.table("GvNL_raw_final.imiss", head=TRUE)&#10;het$HET_RATE = (het$"N.NM." - het$"O.HOM.")/het$"N.NM."&#10;plot (mis$F_MISS,het$HET_RATE, xlab="Missing SNPs", ylab="Heterozygocity Rate")}}}
     45* In R:
     46
     47{{{het <- read.table("GvNL_raw_final.het", head=TRUE)&#10;mis <- read.table("GvNL_raw_final.imiss", head=TRUE)&#10;het$HET_RATE = (het$"N.NM." - het$"O.HOM.")/het$"N.NM."&#10;plot (mis$F_MISS,het$HET_RATE, xlab="Missing SNPs", ylab="Heterozygocity Rate")}}}
    3648
    3749No samples with callrate < 95% (Filtered in lab)
     
    3951Put samples where heterozygocity rate is outside 3 standard dev from the mean (Anderson, NP2010, p.1569) in fail-het-3sd-qc.txt, along with their deviation from the mean
    4052
    41 In R: {{{het_fail = subset(het, (het$HET_RATE < mean(het$HET_RATE)-3*sd(het$HET_RATE)) | (het$HET_RATE > mean(het$HET_RATE)+3*sd(het$HET_RATE)))&#10;het_fail$HET_DST = (het_fail$HET_RATE-mean(het$HET_RATE))/sd(het$HET_RATE)&#10;write.table(het_fail, "fail-het-qc.txt", row.names=FALSE)}}}
     53In R:
     54
     55{{{het_fail = subset(het, (het$HET_RATE < mean(het$HET_RATE)-3*sd(het$HET_RATE)) | (het$HET_RATE > mean(het$HET_RATE)+3*sd(het$HET_RATE)))&#10;het_fail$HET_DST = (het_fail$HET_RATE-mean(het$HET_RATE))/sd(het$HET_RATE)&#10;write.table(het_fail, "fail-het-qc.txt", row.names=FALSE)}}}
    4256
    4357Result:
     
    4660Check inheritance and duplicates
    4761* Prune SNPs for LD
     62
    4863{{{p-link --bfile GvNL_raw_final --indep-pairwise 50 5 0.2 --out GvNL_raw_final}}}
     64
    4965* Generate pairwise Identity-By-State (IBS) metrics using pruned SNPs only
     66
    5067{{{<pre>p-link --bfile GvNL_raw_final --extract GvNL_raw_final.prune.in --genome --out GvNL_raw_final}}}
     68
    5169* Check trio inheritance based on IBS: Check the Pi_HAT value for the individuals (unrelated individuals =~ 0, parent <-> child =~ 0.5, siblings =~0.5, twins =~ 1.0). Done with homemade perl script.
    5270* 2 Families are not real families (G34c (child) is wrong and A56b (mother) is wrong)
    5371* Duplicated family: R24 == R25
     72
    5473Check trio inheritance based on Mendelian segregation
     74
    5575{{{p-link --file GvNL_raw_final --me 0.05 0.1 --make-bed --out GvNL_inheritance}}}
    5676