= Data QC based on mix-up mapping and concordance of imputed genotypes with genotypes called from RNAseq data =
We used three QC checks:

 1. mix-up mapper: matching the genotypes of each sample with its expression data;
 1. genotype concordance: calculating the concordance of imputed genotypes with genotypes called from RNA-seq data;
 1. heterozygosity rate: the fraction of heterozygous genotype calls per sample.
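
The genotype concordance and heterozygosity rate checks can be illustrated with a minimal Python sketch. This is not the code used for this QC. It assumes genotypes are coded as the number of alternative alleles (0, 1 or 2) per SNP, with `None` for a missing call. The function names are hypothetical.

```python
from typing import Optional, Sequence

# Assumed coding: number of alternative alleles (0, 1 or 2) per SNP; None = no call.
Genotype = Optional[int]


def genotype_concordance(imputed: Sequence[Genotype], rnaseq: Sequence[Genotype]) -> float:
    """Fraction of SNPs called in both data sets at which the two genotypes agree."""
    if len(imputed) != len(rnaseq):
        raise ValueError("both genotype vectors must cover the same SNPs, in the same order")
    pairs = [(a, b) for a, b in zip(imputed, rnaseq) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no SNP is called in both data sets")
    return sum(a == b for a, b in pairs) / len(pairs)


def heterozygosity_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of non-missing calls that are heterozygous (coded 1)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no genotype calls")
    return sum(g == 1 for g in called) / len(called)


if __name__ == "__main__":
    imputed = [0, 1, 2, 1, None, 0]
    rnaseq = [0, 1, 1, 1, 2, None]
    print(genotype_concordance(imputed, rnaseq))  # 3 of the 4 jointly called SNPs agree: 0.75
    print(heterozygosity_rate(rnaseq))  # 3 of the 5 calls are heterozygous: 0.6
```

Only SNPs called in both data sets count towards concordance, so a sample with many missing RNA-seq calls is judged on fewer SNPs.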
                  
The blacklist of samples that did not pass these quality checks can be found in the attachment.

=== LLS ===
MixupMapper detected 5 sample swaps (each involving two RNA-seq runs) and 7 samples with a wrong genotype. The swaps will be performed and the 7 genotypes replaced. This leaves 7 samples (pers_id: 2014, 3142, 3144, 2634, 2890, 3126 and 3150) without a genotype; these should be removed.

||= geno_id =||= run_id =||= Best match (geno_id) =||= Best match (run_id) =||= Action =||
|| 561 || BD2CPRACXX-1-12 || 563 || BD2CPRACXX-1-21 || swap ||
|| 563 || BD2CPRACXX-1-21 || 561 || BD2CPRACXX-1-12 || swap ||
|| 974 || BD24PGACXX-8-25 || 978 || BD24PGACXX-7-8 || swap ||
|| 978 || BD24PGACXX-7-8 || 974 || BD24PGACXX-8-25 || swap ||
|| 1841 || AD2CJPACXX-6-9 || 1842 || AD2CJPACXX-5-1 || swap ||
|| 1842 || AD2CJPACXX-5-1 || 1841 || AD2CJPACXX-6-9 || swap ||
|| 2585 || AD2DATACXX-3-21 || 3273 || AD2DATACXX-3-22 || swap ||
|| 3273 || AD2DATACXX-3-22 || 2585 || AD2DATACXX-3-21 || swap ||
|| 3411 || BD2D5MACXX-3-7 || 3413 || BD2D5MACXX-4-15 || swap ||
|| 3413 || BD2D5MACXX-4-15 || 3411 || BD2D5MACXX-3-7 || swap ||
|| 2928 || AD2DATACXX-8-1 || 2014 || BD2CPRACXX-1-22 || replace genotype ||
|| 3126 || AD1NFNACXX-8-25 || 3142 || BD1NYRACXX-2-15 || replace genotype ||
|| 3142 || BD1NYRACXX-2-15 || 3144 || AD1NAMACXX-7-19 || replace genotype ||
|| 3194 || AD2DATACXX-4-5 || 2634 || AD2DATACXX-4-9 || replace genotype ||
|| 311 || BD1NW4ACXX-7-13 || 2890 || BD1NYRACXX-5-23 || replace genotype ||
|| 905 || AD1NFNACXX-8-27 || 3126 || AD1NFNACXX-8-25 || replace genotype ||
|| 6039 || AD1NE2ACXX-5-22 || 3150 || BD24PGACXX-5-5 || replace genotype ||
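
The Action column in the LLS table follows a simple rule. If two runs each best match the genotype linked to the other, they form a swap. Any other mismatch is a genotype replacement. In both cases the run ends up linked to its best-matching genotype. The following Python sketch applies that rule. It assumes the table rows are available as (geno_id, run_id, best-match geno_id) tuples. It is not MixupMapper's own code, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# One row of the table above: (geno_id linked to the run, run_id, best-matching geno_id).
Row = Tuple[int, str, int]


def classify_actions(rows: List[Row]) -> Dict[str, str]:
    """Label each flagged run 'swap' (reciprocal best match) or 'replace genotype'."""
    linked = {run: geno for geno, run, _ in rows}  # run -> genotype it is linked to now
    run_of = {geno: run for geno, run, _ in rows}  # genotype -> run it is linked to now
    best = {run: match for _, run, match in rows}  # run -> best-matching genotype
    actions = {}
    for run, match in best.items():
        other = run_of.get(match)
        # A swap: the run holding our best match in turn best matches our genotype.
        if other is not None and best.get(other) == linked[run]:
            actions[run] = "swap"
        else:
            actions[run] = "replace genotype"
    return actions


def relink(rows: List[Row]) -> Dict[str, int]:
    """New run -> genotype links: every flagged run takes its best-matching genotype."""
    return {run: match for _, run, match in rows}


if __name__ == "__main__":
    rows = [
        (561, "BD2CPRACXX-1-12", 563),
        (563, "BD2CPRACXX-1-21", 561),
        (905, "AD1NFNACXX-8-27", 3126),
    ]
    print(classify_actions(rows))  # the first two runs are a swap, the third a replacement
    print(relink(rows))
```

Chains such as 905 → 3126 → 3142 → 3144 in the table are classified as replacements, not swaps, because none of the links in the chain is reciprocal.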
                  
==== Possibly contaminated samples ====

Outliers with a high heterozygosity rate in the genotypes called from RNA-seq data:[[BR]][[BR]]
Also detected in the gender-specific analysis (see below):[[BR]]
BC1KBKACXX-5-6[[BR]]
BD1NW4ACXX-8-5[[BR]]
BD1NYRACXX-2-16[[BR]]
BC1KBKACXX-5-3[[BR]]
BC1KBKACXX-5-1[[BR]]
BD1NYRACXX-2-27[[BR]]
BD1NYRACXX-4-19[[BR]]
BC1KBKACXX-5-4[[BR]][[BR]]
Possible gender-neutral contaminations (not detected in the gender-specific analysis):[[BR]]
BC1KBKACXX-3-12[[BR]]
BC1KBKACXX-5-7[[BR]]
BD24PGACXX-7-10[[BR]]
BC1KBKACXX-5-5[[BR]]
BD1NYRACXX-3-1[[BR]]
BC1KBKACXX-5-2[[BR]]
AD1NFNACXX-4-8[[BR]]
BD1NYRACXX-2-18[[BR]]
                  
=== LifeLines ===
http://www.molgenis.org/wiki/DeepNoteworthyObservations

LLDeep_0063

The corresponding RNA-seq sample, AC1C40ACXX-4-4 (old id: 103001429206), has only 76% of reads aligned. It was flagged by MixupMapper as a sample mix-up and also shows many discordant genotypes when using SNVMix.

LLDeep_0350

The corresponding RNA-seq sample, AD1GWFACXX-4-15 (old id: 103001383279), was not flagged by MixupMapper, but it shows many discordant genotypes when using SNVMix.

This sample has both high XIST and high chromosome Y expression levels. The average heterozygosity rate over all samples is 49% (standard deviation 1.9%), whereas sample LLDeep_0350 (103001383279) has a heterozygosity rate of 72%. This indicates a contaminated sample in which a male and a female sample were likely mixed in very similar proportions, which explains the high expression levels of both XIST and chromosome Y genes.[[BR]]
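
The 72% rate for LLDeep_0350 is far outside the normal range. With a mean of 49% and a standard deviation of 1.9%, it lies (72 - 49) / 1.9 ≈ 12.1 standard deviations above the average. The following Python sketch computes this z-score. The cut-off of 3 standard deviations is a hypothetical choice for illustration; this page does not state a threshold.

```python
def heterozygosity_z(rate: float, mean: float, sd: float) -> float:
    """Number of standard deviations a sample's heterozygosity rate lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (rate - mean) / sd


# Hypothetical cut-off; the analysis described on this page does not state one.
Z_CUTOFF = 3.0

if __name__ == "__main__":
    # Numbers for LLDeep_0350 (103001383279) as reported above, in percent.
    z = heterozygosity_z(72.0, mean=49.0, sd=1.9)
    print(f"z = {z:.1f}, outlier: {z > Z_CUTOFF}")  # z = 12.1, outlier: True
```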
                  
The file with genotype concordance and heterozygosity rates on imputed genotypes can be found [raw-attachment:genotype_concordance_heterozygosity_rate_imputed_RS_CODAM_LLS.xlsx here].[[BR]]

=== CODAM ===
'''eQTL mapping (gene level) results:'''

6804 unique cis-regulated genes.[[BR]][[BR]]

'''Samples that failed the QC:'''

2345 (RNA-seq ids: AD10W1ACXX-8-11, CODAM-102-130804): mix-up mapper + genotype concordance;

2495 (RNA-seq ids: AD10W1ACXX-5-18, CODAM-156-130804): mix-up mapper + genotype concordance.

The RNA-seq sample ids of these two samples appear to have been swapped (see http://www.bbmriwiki.nl/wiki/BIOS_QualityControl/BIOS_QualityControlRun1 of 12-December-2013).[[BR]]

The file with genotype concordance and heterozygosity rates on imputed genotypes can be found [raw-attachment:genotype_concordance_heterozygosity_rate_imputed_RS_CODAM_LLS.xlsx here].[[BR]]
                  
=== RS ===
'''eQTL mapping (gene level) results:'''

7708 unique cis-regulated genes.[[BR]][[BR]]

'''Samples that failed the QC:'''

8190002 (RNA-seq ids: AD1NNNACXX-4-18, RS-287-130804): mix-up mapper + genotype concordance;

9353 (RNA-seq ids: AC1JV9ACXX-1-13, RS-761-130804): mix-up mapper + genotype concordance;

3520 (RNA-seq ids: BC1JTJACXX-6-7, RS-442-130804): genotype concordance;

562 (RNA-seq ids: BC1KAVACXX-8-13, RS-55-130804): genotype concordance + heterozygosity rate;

~~6734 (RNA-seq id: RS-502-130804): genotype concordance + heterozygosity rate;~~ (passed QC in the first-run data)[[BR]]

The file with genotype concordance and heterozygosity rates on imputed genotypes can be found [raw-attachment:genotype_concordance_heterozygosity_rate_imputed_RS_CODAM_LLS.xlsx here].[[BR]]
[[BR]]
                  
= Data QC based on median correlations of gene counts from each sample to all other samples =
Samples with much lower median correlations to all other samples:[[BR]]
For methods see: http://www.bbmriwiki.nl/wiki/gene_exon_transcript_count [[BR]]

||= Sample =||= Median correlation =||
|| AC1JV9ACXX.1.10 || 0.0471 ||
|| AD1NE2ACXX.5.22 || 0.1174 ||
|| AD2D8RACXX.3.3 || 0.8028 ||
|| AD2D8RACXX.6.3 || 0.8093 ||
|| AD2D8RACXX.1.3 || 0.8257 ||
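
The metric in this table can be sketched as follows. For every sample, correlate its gene counts with those of each other sample and take the median. This is only an illustration. It assumes Pearson correlation on log2(count + 1) values, which may differ from the transformation and correlation measure described on the linked methods page. The function names and sample data are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return cov / (sx * sy)


def median_correlations(counts: Dict[str, List[int]]) -> Dict[str, float]:
    """Median correlation of each sample's log2(count + 1) profile to all other samples.

    Assumption: Pearson correlation on log2(count + 1); the method actually used may differ.
    """
    logged = {s: [math.log2(c + 1) for c in v] for s, v in counts.items()}
    return {
        s: median(pearson(logged[s], logged[t]) for t in logged if t != s)
        for s in logged
    }


if __name__ == "__main__":
    # Hypothetical gene counts for four samples over five genes.
    counts = {
        "A": [100, 200, 50, 10, 0],
        "B": [110, 190, 60, 12, 1],
        "C": [95, 210, 45, 9, 0],
        "D": [5, 0, 300, 2, 150],  # profile unlike the other three
    }
    for sample, r in sorted(median_correlations(counts).items(), key=lambda kv: kv[1]):
        print(f"{sample}\t{r:.4f}")  # lowest median correlation printed first
```

Using the median instead of the mean keeps a single bad sample from dragging down the scores of all other samples.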
                  
= Outliers to be removed based on QC stats and PC analysis =
Updated: 12-December-2013[[BR]]
Analysis by: Peter-Bram 't Hoen[[BR]]

Too few reads:[[BR]]
AC1JV9ACXX-1-10[[BR]]
AD1NE2ACXX-5-22[[BR]]
BD1NW4ACXX-3-27[[BR]]

Other reasons, see: http://www.bbmriwiki.nl/wiki/BIOS_QualityControl/BIOS_QualityControlRun [[BR]]

||= Sample =||= Reason =||
|| BD1NYRACXX-6-10 || too low percentage of mapped reads; outlier on principal components 1, 4, 5 and 6 ||
|| AD2CJPACXX-8-9 || low exon correlation; outlier on principal components 1, 11 and 14 ||
|| BD1NR9ACXX-7-27 || low percentage of mapped reads; outlier on principal component 4; likely degraded ||
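
The "outlier on principal component ..." reasons above can be illustrated with a simple per-component rule. Take each sample's scores on the principal components, computed beforehand (e.g. from normalized expression). Flag the sample on every component where its score is more than a chosen number of standard deviations from the mean. This page does not say how outliers were defined, so the rule, the 3-SD default and the function name `pc_outliers` are all assumptions.

```python
from statistics import mean, stdev
from typing import Dict, List


def pc_outliers(scores: Dict[str, List[float]], n_sd: float = 3.0) -> Dict[str, List[int]]:
    """Return, per sample, the 1-based principal components on which it is an outlier.

    scores maps sample id -> its scores on PC1, PC2, ... (same length for every sample).
    Assumed rule: outlier when |score - mean| > n_sd * SD of that PC over all samples.
    """
    samples = list(scores)
    n_pcs = len(scores[samples[0]])
    flagged: Dict[str, List[int]] = {s: [] for s in samples}
    for pc in range(n_pcs):
        column = [scores[s][pc] for s in samples]
        mu, sd = mean(column), stdev(column)
        for s in samples:
            if sd > 0 and abs(scores[s][pc] - mu) > n_sd * sd:
                flagged[s].append(pc + 1)
    return {s: pcs for s, pcs in flagged.items() if pcs}


if __name__ == "__main__":
    # 20 hypothetical samples; S19 is extreme on PC1 only.
    demo = {}
    for i in range(19):
        v = 1.0 if i % 2 == 0 else -1.0
        demo[f"S{i}"] = [v, v]
    demo["S19"] = [50.0, 0.0]
    print(pc_outliers(demo))  # {'S19': [1]}
```

With few samples this rule cannot fire. Using the sample standard deviation, a single value can lie at most (n - 1)/sqrt(n) standard deviations from the mean, so the demo uses 20 samples.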
                  
= Outliers to be removed based on gender-specific expression analysis =
Updated: 12-December-2013[[BR]]
Analysis by: Peter-Bram 't Hoen[[BR]]
The normalized gene expression values (edgeR TMM method, expressed as counts per million, CPM) for XIST and for the sum of all protein-coding Y-chromosomal genes were used to check for contamination between samples of different gender. The script can be found [raw-attachment:gender_analysis.r here]. In addition to LifeLines sample AD1GWFACXX-4-15, the following samples (all from LLS) were flagged and appear to be contaminated:[[BR]]
BC1KBKACXX-5-1[[BR]]
BC1KBKACXX-5-3[[BR]]
BC1KBKACXX-5-4[[BR]]
BC1KBKACXX-5-6[[BR]]
BC1KBKACXX-5-8[[BR]]
BD1NW4ACXX-8-5[[BR]]
BD1NYRACXX-2-16[[BR]]
BD1NYRACXX-2-27[[BR]]
BD1NYRACXX-4-19[[BR]]
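
The logic of this check can be sketched as follows. An RNA-seq sample from a single individual should show high XIST expression (female) or high Y-chromosomal expression (male), but not both. The sketch assumes the TMM-normalized CPM values have already been computed with edgeR, as described above. The cut-offs, function name and sample ids are hypothetical. The actual analysis is in the attached gender_analysis.r script and may use different criteria.

```python
from typing import Dict, List, Tuple

# Hypothetical cut-offs in CPM; not taken from gender_analysis.r.
XIST_HIGH_CPM = 10.0
CHRY_HIGH_CPM = 10.0


def flag_sex_mix(expr: Dict[str, Tuple[float, float]],
                 xist_cut: float = XIST_HIGH_CPM,
                 chry_cut: float = CHRY_HIGH_CPM) -> List[str]:
    """Return samples with high XIST *and* high summed chrY expression.

    expr maps sample id -> (XIST CPM, sum of CPM over protein-coding chrY genes),
    both assumed to be TMM-normalized already.
    """
    return sorted(s for s, (xist, chry) in expr.items() if xist > xist_cut and chry > chry_cut)


if __name__ == "__main__":
    demo = {
        "female_like": (55.0, 0.4),
        "male_like": (0.2, 80.0),
        "mixed": (30.0, 42.0),
    }
    print(flag_sex_mix(demo))  # ['mixed']
```

This check cannot detect contamination between two samples of the same sex. That is why the heterozygosity outliers listed earlier also include possible gender-neutral contaminations.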