wiki:SnpAnnotationPipeline

Introduction

This is a SNP annotation pipeline.

GraphViz image

PrepareGFFFilesFromBGIForSeattleSeqAnnotation

Preprocesses GFF files coming from the BGI institute for SeattleAnnotationTool?. Replace alleles with allele and adds the line: # autoFile testAuto.txt in the top of the file.

Parameters

  • GFFFilename : Input filename
  • outputGFFFilename: Output filename

Example

Code highlighting:

PrepareGFFFilesFromBGIForSeattleSeqAnnotation("/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.gff", "/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.gff")

Source Code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/PrepareGFFFilesFromBGIForSeattleSeqAnnotation.py

AnnotateVarianListFileViaSeattleSeqAnnotation

Annotate Files with Variants through Seattle Seq Annotation:  http://gvs.gs.washington.edu/SeattleSeqAnnotation/ . The java code that wraps the forms is provided from SeattleSeq? Annotation:  http://gvs.gs.washington.edu/SeattleSeqAnnotation/SubmitSeattleSeqAnnotationAutoJob.java . This method wraps the wrapper(..) and provides a python implementation. In order to run there should be a directory under the current path, named "jars" with the following jar files:

  • httpunit.jar
  • js-1.6R5.jar
  • junit-3.8.1.jar
  • nekohtml-0.9.5.jar
  • xercesImpl-2.6.1.jar

Parameters

For a complete list of parameters please check the  Annotation website and the example below

Example

Code highlighting:

AnnotateVarianListFileViaSeattleSeqAnnotation(
        inputFile=/Users/alexandroskanterakis/Data/SNP/chr1.snp.Q20.gff,
        outputFile=/Users/alexandroskanterakis/Tools/annotation/seattleseqannotation/output.txt,
        eMail=alexandros.kanterakis@gmail.com,
        fileFormat=GFF,
        geneData=CCDS2008,
        allelesMaq=true,
        allelesDBSNP=true,
        scorePhastCons=true,
        scorePhastCons=true,
        consScoreGERP=true,
        chimpAllele=true,
        CNV=true,
        geneList=true,
        HapMapFreqType=HapMapFreqMinor,
        geneList=true,
        hasGenotypes=true,
        dbSNPValidation=true,
        repeats=true,
        geneList=true,
        proteinSequence=true,
        polyPhen=true,
        clinicalAssociation=true
        )

Source Code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/AnnotateVarianListFileViaSeattleSeqAnnotation/AnnotateVarianListFileViaSeattleSeqAnnotation.py

AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs

This method takes a list of files that have been generated from  SeattleSeq Annotation tool and a list of tabular files that contain Chromosome and position columns. It adds the  polyphen annotation that is contained in the former list of files to the later.

Parameters

  • listOfSeattleSeqAnnotationOutputs: list of SeattleSeq? Annotation files that we want to take the polyphen annotation from
  • listOfFileToBeAnnotated: List of files with chromosome and position information.
  • chromosomeColumn: The Chromosome column of the files to be annotated
  • positionColumn: The position column of the files to be annotated
  • outputDir: The directory where the generated files will be stored
  • outputSuffix: The suffix of the output files.

Example

Code highlighting:

listOfSeattleSeqAnnotationOutputs = [
"/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/000074.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/000159.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/000363.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/030042.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/030101.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/960313.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/960318.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0316-04.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0316-05.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0322-07.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0322-08.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0326-03.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0326-07.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0360-02.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0360-05.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0360-06.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0376-02.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0376-05.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0398-011.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0398-012.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD2018-03.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD2018-06.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5000-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5059-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5063-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5065-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5066-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5067-005.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5084-007.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5096-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5116-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5166-005.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5174-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5176-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5217-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5252-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5257-005.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5258-002.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt"
]

filesToBeAnnotated = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22.txt"
]


AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs(
        listOfSeattleSeqAnnotationOutputs=listOfSeattleSeqAnnotationOutputs,
        listOfFileToBeAnnotated=filesToBeAnnotated,
        chromosomeColumn=2,
        positionColumn=3,
        outputDir="/Users/alexandroskanterakis/Data/CD_china/Intersection",
        outputSuffix="_poluphenExample.txt",
        numberOfFirstLinesToIgnore=1
        )

Source code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs.py

AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl

Create Gene Ontology ( http://www.geneontology.org/) annotation for a list of files that contain at least a position column and a chromosome column.

Parameters

  • listOfFilesToAnnotate: Python list of filenames to be annotated
  • numberOfFirstLinesToIgnoreInFileToAnnotate:
  • chromosomeColumnOfFilesToAnnotate: The # of the chromosome column in the file to be annotated (starting from 0)
  • positionColumnOfFilesToAnnotate: The # of the position column in the file to be annotated (starting from 0)
  • resolveDuplicateValuesFunctionInFileToBeAnnotated: What should we do if we found 2 lines in the file to be annotated that has the same position and chromosome? If not set to None it will call the function assigned to this parameter
  • fileWithGOAnnotation: The file that has been downloaded from BioMart? and contains the GO annotation.
  • fileWithGOAnnotationChromosomeColumn: The column that contain the chromosome in the fileWithGOAnnotation
  • fileWithGOAnnotationStartColumn: The column that contain the start of the transcript in the fileWithGOAnnotation
  • fileWithGOAnnotationEndColumn: The columns that contain the end of the transcript in the fileWithGOAnnotation
  • columnsWithGOAnnotationComaSeparated: The columns that contain the annotations that we want to add in the fileWithGOAnnotation. Example: "2,3,4"
  • numberOfFirstLinesToIgnoreInGOAnnotationFile
  • outputDirectory
  • outputSuffix: The output file will be: outputDirectory/(basename of inputFile)+outputSuffix

Example

Code highlighting:

fileList= [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22.txt"
]

AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl(
        listOfFilesToAnnotate=fileList,
        numberOfFirstLinesToIgnoreInFileToAnnotate=1,
        chromosomeColumnOfFilesToAnnotate=2,
        positionColumnOfFilesToAnnotate=3,
        fileWithGOAnnotation="/Users/alexandroskanterakis/Data/Ensembl/GENE_START_END_GO_FROM_ENSEMBL_36.txt",
        fileWithGOAnnotationChromosomeColumn=1,
        fileWithGOAnnotationStartColumn=2,
        fileWithGOAnnotationEndColumn=3,
        columnsWithGOAnnotationComaSeparated="4,5,6,7,8,9",
        numberOfFirstLinesToIgnoreInGOAnnotationFile=1,
        outputDirectory="/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02",
        outputSuffix="_GO.txt"
        )

Source Code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl.py

CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames

Creates Allele Frequency annotation from a list of VCFFilenames for tabular files that contain at least a chromosome and a position column. It requires the xapian ( http://xapian.org) python package and vcftools ( http://vcftools.sourceforge.net/)

Parameters

  • pathToVCFTools: Path where vcftools is installed
  • listOfVCFFiles: python list of VCF files where the annotation will come from
  • listOfFilenamesToBeAnnotated
  • outputPreffix
  • xapianIndexDirectory. If None it will create a system temporary directory.

Example

Code highlighting:

import wikipl
from wikipl import CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames

VCFFilenames_Example = [
"/Users/alexandroskanterakis/Data/1000GP/vol1.ftp.pilot_data.release.2010_07.exon.snps/CEU.exon.2010_03.genotypes.vcf",
"/Users/alexandroskanterakis/Data/1000GP/vol1.ftp.pilot_data.release.2010_07.exon.snps/YRI.exon.2010_03.genotypes.vcf"
]

filesToBeAnnotated = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22.txt"
]

CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames(
        pathToVCFTools = "/Users/alexandroskanterakis/Tools/vcftools/cpp/vcftools",
        listOfVCFFiles=VCFFilenames_Example,
        listOfFilenamesToBeAnnotated=filesToBeAnnotated,
        outputPreffix="_AlleleFrequencyExample.txt",
        xapianIndexDirectory="/Users/alexandroskanterakis/Data/CD_china/Intersection/xapianDB_Example"
        )

Source code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames.py

MergeHorizontallyFilesAccordingToCommonColumns?

Merge horizontally files according to common columns

Parameters

  • listOfFilenamesToBeAnnotated: Python list of filenames to be annotated.
  • listOfColumnsFromFileToBeAnnotated: Python list of columns that we want to keep from the files to be annotated
  • listOfListsOfInputFilenames: Python list of python list of input filenames
  • listOfAnnotationFileColumns
  • listOfFirstLinesToIgnore: Python list of first lines to ignore from each annotation file
  • listOfOutputFilenames

Example

Code highlighting:

filesToBeAnnotated = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22.txt"
]

filesAnnotation1 = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_polyphen.txt"
]

filesAnnotation2 = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_GO.txt"
]

filesAnnotation3 = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_AlleleFrequency.txt"
]

filesOutput123Annotated = [

"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_Annotated.txt"

]

MergeHorizontallyFilesAccordingToCommonColumns(
        listOfFilenamesToBeAnnotated=filesToBeAnnotated,
#       listOfColumnsFromFileToBeAnnotated=range(39),
        listOfColumnsFromFileToBeAnnotated = [2,3],
        listOfListsOfInputFilenames=[filesAnnotation1,filesAnnotation2,filesAnnotation3],
        listOfAnnotationFileColumns=[[2],[2,3,4,5,6,7],[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]],
        listOfOutputFilenames=filesOutput123Annotated
        )

Source Code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/MergeHorizontallyFilesAccordingToCommonColumns.py

ANNOVAR Annotation pipeline

  • About ANNOVAR:  http://www.openbioinformatics.org/annovar/
  • Download ANNOVAR  http://www.openbioinformatics.org/annovar/download/annovar.latest.tar.gz more information:  http://www.openbioinformatics.org/annovar/annovar_download.html
  • ANNOVAR is already installed and configured in gbicdev: /data/home/data/alex/ANNOVAR/
  • Download (for both hg18 and hg18 release) annotation files (most of them from UCSC):
  • To Annotate a VCF (version 4.0) file, use: http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/ANNOVAR.py This is a wrapper for the ANNOVAR tool.
  • Usage:
  • Example: python ANNOVAR.py --pathToANNOVAR /data/home/akanterakis/tools/ANNOVAR/annovar --VCFFilename /data/home/data/pdeelen/Celiac40ExomsProject/SequenceData/sequence0605_41.index_hg18.snps.filtered.vcf --outputFilename /data/home/data/alex/ANNOVAR/annotated/output.txt --outputDirectory /data/home/data/alex/ANNOVAR/annotated/ --buildver hg18 --annotationDirectory /data/home/data/alex/ANNOVAR/humandb_hg18/ --geneBasedAnnotations refgene,knowngene,ensgene --regionBasedAnnotations band,segdup,dgv,gwascatalog --filterBasedAnnotations snp130 --customAnnotations kantale
  • Options:
    • --pathToANNOVAR: The path to the installed ANNOVAR tool
    • --VCFFilename: The path to the VCF file to be annotated
    • --outputFilename: The output annotated file
    • --outputDirectory: output directory
    • --buildver: Could be either hg18 or hg18
    • --annotationDirectory: The directory where the annotation files are
    • --geneBasedAnnotations: Gene Based Annotations according to ANNOVAR (coma separated no spaces)
    • --regionBasedAnnotations: Region Based Annotations according to ANNOVAR (coma separated no spaces)
    • --filterBasedAnnotations: Filter Based Annotations according to ANNOVAR (coma separated no spaces)
    • --customAnnotations: Custom annotations. These should be GFF3 files  http://www.sequenceontology.org/gff3.shtml. The list of files should be comma separated no spaces
    • --dummy: If set True, the script will not do anything just print the script commands. Default value: False
    • --verbose True/False? (Default: True)
    • There are additional options for GeneOntology? Annoatation and AlleleFrequency? from VCF files under development.
  • Extra annotation files have been generated in GFF3 format to be used with the customAnnotation parameter:
    • /data/home/data/alex/ANNOVAR/annotated/sequence0605_41.index_hg18.snps.filtered.vcf.sequence0605_41.index_hg18.snps.filtered.vcf.AlleleFrequency?_AlleleFrequecy.hg18_gff3
    • /data/home/data/alex/ANNOVAR/annotated/sequence0605_41.index_hg18.snps.filtered.vcf.sequence0605_41.index_hg18.snps.filtered.vcf.GO_GeneOntology.hg18_gff3

Related work

Links and resources

Attachments