| 1 |   | = Gene, exon and transcript counts = | 
                      
                        | 2 |   |  | 
                      
                        | 3 |   |  | 
                      
                        | 4 |   | == Counts and Spearman correlations for run 1 == | 
                      
                        | 5 |   |  | 
                      
                        | 6 |   | Date: 06-November-2013[[BR]] | 
                      
                        | 7 |   | Analysis by: Peter-Bram 't Hoen[[BR]] | 
                      
                        | 8 |   |  | 
                      
                        | 9 |   | The combined gene counts for the 2330 samples from run 1 are available on the VM at /virdir/Backup/run_1_gene_counts/combined_gene_count_run_1.txt. They were generated using this script: [raw-attachment:merge_count_script.r R script for merging gene count tables][[BR]] | 
                      
                        | 10 |   | Pairwise Spearman correlations between all samples were then calculated from these counts: /virdir/Backup/run_1_gene_counts/Spearman_correlations_complete_gene_data_run_1.txt[[BR]] | 
                      
                        | 11 |   | From these correlations, the median Spearman correlation of each sample with all other samples was calculated. This median is also called the D-statistic. The D-statistics (ranked from low to high) can be found in this file: [raw-attachment:Median_pairwise_spearman_correlations_complete_gene_data_run_1.txt Median Spearman correlations][[BR]] | 
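A minimal sketch of the D-statistic calculation described above, written in Python rather than the R used for the original analysis. The sample names and counts are made-up toy data, and all function names are hypothetical. Excluding each sample's correlation with itself is an assumption based on the wording "each other sample".

```python
from statistics import mean, median


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; fails with ZeroDivisionError on a constant vector."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_matrix(samples):
    """samples: dict of sample name -> gene counts, all in the same gene order."""
    ranked = {name: average_ranks(counts) for name, counts in samples.items()}
    return {a: {b: pearson(ranked[a], ranked[b]) for b in ranked} for a in ranked}


def d_statistics(corr):
    """Median correlation of each sample with every other sample (self excluded)."""
    return {a: median(r for b, r in row.items() if b != a) for a, row in corr.items()}


# Toy data (hypothetical): s4 is deliberately reversed so it stands out.
counts = {
    "s1": [10, 20, 30, 40, 50],
    "s2": [12, 18, 33, 41, 60],
    "s3": [11, 25, 29, 45, 52],
    "s4": [50, 40, 30, 20, 10],
}
d_stats = d_statistics(spearman_matrix(counts))
ranked_low_to_high = sorted(d_stats.items(), key=lambda item: item[1])
```

Sorting from low to high puts candidate outliers, such as the deliberately reversed sample here, at the top of the list.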
                      
                        | 12 |   |  | 
                      
                        | 13 |   | [raw-attachment:Median_pairwise_spearman_correlations_by_flowcell_complete_gene_data_run_1.pdf Boxplot of median Spearman correlations grouped by flowcell] (Martijn Vermaat)[[BR]] | 
                      
                        | 14 |   |  | 
                      
                        | 15 |   | [raw-attachment:Dstat_biobank_boxplot.pdf Boxplot of median Spearman correlations grouped by biobank] [[BR]] | 
                      
                        | 16 |   |  | 
                      
                        | 17 |   | After removing the two samples with very low Spearman correlations to all other samples, the distance matrix was calculated as 1 minus the correlation matrix, and a two-dimensional MDS plot was created using the R function cmdscale. [raw-attachment:mdsplot_filt_colored_biobank.pdf This is the resulting MDS plot]. The samples in the plot were colored by biobank according to the following color scheme: [[BR]] | 
                      
                        | 18 |   | "LL"  - gold[[BR]] | 
                      
                        | 19 |   | "RS"  - blue[[BR]] | 
                      
                        | 20 |   | "CODAM" - orange[[BR]] | 
                      
                        | 21 |   | "LLS" - pink[[BR]] | 
                      
                        | 22 |   | "Amsterdam" - darkred[[BR]] | 
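The distance and MDS step above can be sketched as follows. This is a rough Python illustration of classical (metric) scaling, the method that R's cmdscale performs, and not the code used for the analysis. It replaces a full eigendecomposition with power iteration and deflation, which assumes the two largest-magnitude eigenvalues are positive. Distances of the form 1 - correlation are not guaranteed to be Euclidean, so negative eigenvalues are clamped to zero here. All function names are hypothetical, and the pure-Python loops would be slow for thousands of samples.

```python
import math
import random


def correlation_to_distance(corr):
    """Distance matrix defined as 1 - correlation, element by element."""
    return [[1.0 - r for r in row] for row in corr]


def double_center(dist):
    """B = -1/2 * J * D^2 * J for a symmetric distance matrix D."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals column means by symmetry
    total_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(mat, rng, iterations=500):
    """Largest-magnitude eigenpair of a symmetric matrix by power iteration."""
    n = len(mat)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue estimate.
    value = sum(vec[i] * sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n))
    return value, vec


def classical_mds(dist, dims=2, seed=0):
    """Coordinates (one row per sample) from classical scaling of dist."""
    b = double_center(dist)
    n = len(b)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    axes = []
    for _ in range(dims):
        value, vec = top_eigenpair(b, rng)
        scale = math.sqrt(max(value, 0.0))  # clamp negative eigenvalues
        axes.append([scale * v for v in vec])
        # Deflate: remove the component just found before the next axis.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return [[axis[i] for axis in axes] for i in range(n)]


# Toy check (hypothetical data): four corners of a 3 x 1 rectangle.
points = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (0.0, 1.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist, dims=2)
```

The recovered coordinates are only defined up to rotation and reflection, so a sensible check is that the pairwise distances between rows of `coords` match the input distances, not that the coordinates equal the original points.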
                      
                        | 23 |   |  | 
                      
                        | 24 |   | The same MDS plot, now colored according to mean GC percentage: [raw-attachment:mdsplot_filt_colored_gc.pdf MDS plot GC] | 
                      
                        | 25 |   |  | 
                      
                        | 26 |   |   | 
                      
                      
                        |   | 1 | This page has been moved to [wiki:FgGeneExonTranscriptCounts Gene, exon and transcript counts]. |