wiki:SnpAnnotationPipeline

Version 13 (modified by a.kanterakis, 13 years ago)


PrepareGFFFilesFromBGIForSeattleSeqAnnotation

Preprocesses GFF files from BGI for the SeattleSeq Annotation tool. It replaces "alleles" with "allele" and adds the line "# autoFile testAuto.txt" at the top of the file.

Parameters

  • GFFFilename: Input filename
  • outputGFFFilename: Output filename

Example


PrepareGFFFilesFromBGIForSeattleSeqAnnotation("/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.gff", "/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.gff")
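The preprocessing described above can be sketched in a few lines of Python. This is only an illustration of the two steps (the rename and the added header line), not the code of the actual script: the helper names prepare_gff_text and prepare_gff_file are hypothetical, and the plain global string replacement is an assumption, since the real script may restrict the replacement to a specific GFF field.

```python
AUTO_FILE_LINE = "# autoFile testAuto.txt"


def prepare_gff_text(gff_text):
    """Return the GFF text with 'alleles' renamed to 'allele' and the autoFile line on top.

    Assumption: a plain global string replacement is sufficient.
    """
    body = gff_text.replace("alleles", "allele")
    return AUTO_FILE_LINE + "\n" + body


def prepare_gff_file(input_path, output_path):
    """Read input_path fully, then write the preprocessed text to output_path.

    The input is read completely before the output is opened, so the same
    path can be given for both arguments.
    """
    with open(input_path) as handle:
        text = handle.read()
    with open(output_path, "w") as handle:
        handle.write(prepare_gff_text(text))
```

Reading the whole input before opening the output keeps the sketch safe when the input and output filenames are identical, as in the example call above.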

Source Code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/PrepareGFFFilesFromBGIForSeattleSeqAnnotation.py

AnnotateVarianListFileViaSeattleSeqAnnotation

Annotates files of variants through SeattleSeq Annotation: http://gvs.gs.washington.edu/SeattleSeqAnnotation/ . The Java code that wraps the submission forms is provided by SeattleSeq Annotation: http://gvs.gs.washington.edu/SeattleSeqAnnotation/SubmitSeattleSeqAnnotationAutoJob.java . This method wraps that Java wrapper and provides a Python interface to it. To run it, there must be a directory named "jars" under the current path that contains the following jar files:

  • httpunit.jar
  • js-1.6R5.jar
  • junit-3.8.1.jar
  • nekohtml-0.9.5.jar
  • xercesImpl-2.6.1.jar

Parameters

For a complete list of parameters, please check the SeattleSeq Annotation website and the example below.

Example


AnnotateVarianListFileViaSeattleSeqAnnotation(
        # Assumption: all values are passed as strings; check the script for the expected types.
        inputFile="/Users/alexandroskanterakis/Data/SNP/chr1.snp.Q20.gff",
        outputFile="/Users/alexandroskanterakis/Tools/annotation/seattleseqannotation/output.txt",
        eMail="alexandros.kanterakis@gmail.com",
        fileFormat="GFF",
        geneData="CCDS2008",
        allelesMaq="true",
        allelesDBSNP="true",
        scorePhastCons="true",
        consScoreGERP="true",
        chimpAllele="true",
        CNV="true",
        geneList="true",
        HapMapFreqType="HapMapFreqMinor",
        hasGenotypes="true",
        dbSNPValidation="true",
        repeats="true",
        proteinSequence="true",
        polyPhen="true",
        clinicalAssociation="true"
        )

Source Code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/AnnotateVarianListFileViaSeattleSeqAnnotation/AnnotateVarianListFileViaSeattleSeqAnnotation.py

AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs

This method takes a list of files generated by the SeattleSeq Annotation tool and a list of tabular files that contain chromosome and position columns. It adds the PolyPhen annotation contained in the former list of files to the latter.

Parameters

  • listOfSeattleSeqAnnotationOutputs: List of SeattleSeq Annotation files to take the PolyPhen annotation from
  • listOfFileToBeAnnotated: List of files with chromosome and position information.
  • chromosomeColumn: The Chromosome column of the files to be annotated
  • positionColumn: The position column of the files to be annotated
  • outputDir: The directory where the generated files will be stored
  • outputSuffix: The suffix of the output files.
  • numberOfFirstLinesToIgnore: The number of leading lines to ignore when reading the files (set to 1 in the example below).

Example


listOfSeattleSeqAnnotationOutputs = [
"/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/000074.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/000159.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/000363.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/030042.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/030101.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/960313.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/960318.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0316-04.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0316-05.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0322-07.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0322-08.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0326-03.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0326-07.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0360-02.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0360-05.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0360-06.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0376-02.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0376-05.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0398-011.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0398-012.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD2018-03.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD2018-06.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5000-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5059-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5063-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5065-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5066-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5067-005.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5084-007.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5096-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5116-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5166-005.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5174-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5176-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5217-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5252-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5257-005.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5258-002.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt"
]

filesToBeAnnotated = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22.txt"
]


AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs(
        listOfSeattleSeqAnnotationOutputs=listOfSeattleSeqAnnotationOutputs,
        listOfFileToBeAnnotated=filesToBeAnnotated,
        chromosomeColumn=2,
        positionColumn=3,
        outputDir="/Users/alexandroskanterakis/Data/CD_china/Intersection",
        outputSuffix="_poluphenExample.txt",
        numberOfFirstLinesToIgnore=1
        )
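The core of this step is a lookup keyed on chromosome and position. The sketch below illustrates that idea on rows that are already split into columns; it is not the code of the actual script. The function names, the zero-based column indices, the polyphen_col parameter and the "NA" placeholder are all assumptions made for illustration, and the real column layout of the SeattleSeq output should be taken from the files themselves.

```python
def build_polyphen_index(annotation_rows, chrom_col, pos_col, polyphen_col):
    """Map (chromosome, position) to the PolyPhen value from annotation rows.

    Column indices are zero-based here (an assumption of this sketch).
    If a position occurs more than once, the last row wins.
    """
    index = {}
    for row in annotation_rows:
        index[(row[chrom_col], row[pos_col])] = row[polyphen_col]
    return index


def annotate_rows(rows, index, chrom_col, pos_col, missing="NA"):
    """Return copies of rows with the PolyPhen value appended as a last column."""
    annotated = []
    for row in rows:
        key = (row[chrom_col], row[pos_col])
        annotated.append(list(row) + [index.get(key, missing)])
    return annotated


# Usage with made-up rows: annotation rows are (chromosome, position, polyPhen),
# rows to annotate are (id, chromosome, position).
seattle_rows = [["1", "1000", "benign"], ["2", "5000", "possibly-damaging"]]
table_rows = [["snpA", "1", "1000"], ["snpB", "3", "42"]]
lookup = build_polyphen_index(seattle_rows, chrom_col=0, pos_col=1, polyphen_col=2)
result = annotate_rows(table_rows, lookup, chrom_col=1, pos_col=2)
```

Building the dictionary once over all annotation files and then streaming through the files to be annotated keeps the lookup cost per row constant, which matters when many per-sample outputs are combined.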

Source Code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs.py

Pipeline Elements

This section describes the parts of our annotation pipeline: the scripts it contains and the features each script returns.

script              feature      description          source
1KGannotation.py    alleleFreq   allele freq in 1KG   1KG
TODO

End of Pipeline

Related work
