wiki:BIOS_RnaSeq

Version 1 (modified by jamverlouw, 8 years ago)


RNASeq protocols

In a nutshell, all raw RNA-seq datasets on the Grid are laid out as follows:

  • sample AD1NNNACXX-1-2
    • srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/bbmri.nl/RP3/RNASeq/AD1NNNACXX-1-2/AD1NNNACXX-1-2_R1.fq.gz
    • srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/bbmri.nl/RP3/RNASeq/AD1NNNACXX-1-2/AD1NNNACXX-1-2_R2.fq.gz
  • sample AD1NNNACXX-1-4
    • srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/bbmri.nl/RP3/RNASeq/AD1NNNACXX-1-4/AD1NNNACXX-1-4_R1.fq.gz
    • srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/bbmri.nl/RP3/RNASeq/AD1NNNACXX-1-4/AD1NNNACXX-1-4_R2.fq.gz

Sample information, file locations, md5sums, etc. are stored in the Metadatabase.
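Because an md5sum is recorded for each file, a downloaded FASTQ file can be checked locally before use. The following is a minimal sketch; it assumes you have already obtained the expected md5sum from the Metadatabase by some other means (how to query the Metadatabase is not covered here), and the function names `md5sum` and `verify_md5` are hypothetical.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_md5):
    """Compare a local file with an md5sum taken from the Metadatabase (assumed to be a hex string)."""
    return md5sum(path) == expected_md5.strip().lower()
```

Reading in chunks matters here because gzipped FASTQ files can be many gigabytes.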

SRM location

All RNA sample datasets are stored at srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/bbmri.nl/RP3/RNASeq/.
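Combining this base location with the directory and file naming rules on this page, the full SRM URLs of a sample's two FASTQ files can be derived from the sample name alone. This is a small sketch, not an official tool. `fastq_urls` is a hypothetical helper, and it assumes the `_R1.fq.gz` / `_R2.fq.gz` suffixes given under File naming.

```python
BASE_URL = "srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/bbmri.nl/RP3/RNASeq/"


def fastq_urls(sample, base_url=BASE_URL):
    """Return the (R1, R2) SRM URLs for one sample, e.g. 'AD1NNNACXX-1-2'."""
    base = base_url.rstrip("/")
    sample = sample.rstrip("/")  # accept 'AD1NNNACXX-1-2/' as well
    # Each sample has its own directory; files reuse the directory name plus a read suffix.
    return tuple(f"{base}/{sample}/{sample}_{read}.fq.gz" for read in ("R1", "R2"))
```

For example, `fastq_urls("AD1NNNACXX-1-4")` yields the two `AD1NNNACXX-1-4_R1.fq.gz` and `AD1NNNACXX-1-4_R2.fq.gz` URLs inside the `AD1NNNACXX-1-4/` directory.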

Note: all data are also copied to the Grid archive, where they are backed up.

Directory naming

Each sample is to be uploaded into its own, separate directory. The directory name is formatted as follows:

<flowcell ID>-<lane ID>-<index ID>/

For instance:

AD1NNNACXX-1-2/
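Before uploading, a directory name can be checked against this scheme. The sketch below is a hypothetical validator. The character classes are assumptions that this page does not specify: the flowcell ID is taken to be alphanumeric, the lane ID numeric, the index ID alphanumeric, and none of the three may contain a hyphen.

```python
import re

# Assumed pattern for <flowcell ID>-<lane ID>-<index ID>, with an optional trailing slash.
SAMPLE_DIR_RE = re.compile(
    r"(?P<flowcell>[A-Za-z0-9]+)-(?P<lane>[0-9]+)-(?P<index>[A-Za-z0-9]+)/?"
)


def parse_sample_dir(name):
    """Split a sample directory name into (flowcell ID, lane ID, index ID)."""
    match = SAMPLE_DIR_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a <flowcell ID>-<lane ID>-<index ID> name: {name!r}")
    return match.group("flowcell"), int(match.group("lane")), match.group("index")
```

With these assumptions, `parse_sample_dir("AD1NNNACXX-1-2/")` returns `("AD1NNNACXX", 1, "2")`. If real index IDs can contain other characters, the pattern would need to be widened.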

File naming

Files should follow the same naming scheme as the directories:

Gzipped FASTQ files

FASTQ files, which are required for alignment, are to be named as follows:

<flowcell ID>-<lane ID>-<index ID>_R1.fq.gz
<flowcell ID>-<lane ID>-<index ID>_R2.fq.gz
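A directory can then be checked for a complete R1/R2 pair whose names match the directory name. The sketch below is illustrative only. `check_fastq_name` and `missing_reads` are hypothetical helpers, and the same character-class assumptions apply as for directory names (alphanumeric flowcell and index IDs, numeric lane ID).

```python
import re

# Assumed pattern: <flowcell ID>-<lane ID>-<index ID>_R1.fq.gz or ..._R2.fq.gz
FASTQ_RE = re.compile(
    r"(?P<sample>[A-Za-z0-9]+-[0-9]+-[A-Za-z0-9]+)_(?P<read>R[12])\.fq\.gz"
)


def check_fastq_name(directory, filename):
    """Return 'R1' or 'R2' if filename follows the scheme for this directory, else raise ValueError."""
    sample = directory.rstrip("/")
    match = FASTQ_RE.fullmatch(filename)
    if match is None or match.group("sample") != sample:
        raise ValueError(f"{filename!r} does not match directory {sample!r}")
    return match.group("read")


def missing_reads(directory, filenames):
    """Return the set of read files ('R1', 'R2') not found among filenames."""
    found = set()
    for name in filenames:
        try:
            found.add(check_fastq_name(directory, name))
        except ValueError:
            pass  # ignore files that are not correctly named FASTQs for this sample
    return {"R1", "R2"} - found
```

An empty result from `missing_reads` means both read files for the sample are present and correctly named.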