Variant Classifier

Running the VariantClassifier

Running the VariantClassifier is straightforward. The harder part is generating the annotation file that the classifier reads. Fortunately, you only need to do this once per organism or region of interest. If you plan to classify variants on an Ensembl-annotated genome, make sure you have downloaded and properly installed the Ensembl Perl API; see step 3 of the Download and Installation page.
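
As a quick sanity check of the Ensembl Perl API installation, one option is to ask Perl whether it can load a module from the API. The sketch below is only an illustration: have_perl_module is a hypothetical helper name, and Bio::EnsEMBL::Registry is assumed here as a representative Ensembl module; the Download and Installation page remains the authoritative reference.

```shell
#!/bin/sh
# Sketch: report whether the perl on PATH can load a given module.
# have_perl_module is a hypothetical helper, not part of VariantClassifier.

have_perl_module() {
    # -M loads the module; -e 1 runs a trivial program. Exit status is
    # non-zero if the module (or perl itself) cannot be found.
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Assumption: Bio::EnsEMBL::Registry stands in for "the Ensembl Perl API".
if have_perl_module Bio::EnsEMBL::Registry; then
    echo "Ensembl Perl API found"
else
    echo "Ensembl Perl API not found; check PERL5LIB and the installation steps"
fi
```

If the check fails even though the API was downloaded, the usual suspect is that its modules directory is not listed in PERL5LIB.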

If you plan to create your own annotation file, look at our human and coronavirus examples (Examples/human/annotation_extraction/bcl2.coding_info and Examples/corona_virus/SARS-WT-annotated.coding_info, respectively), and then read the Coding Info File Format manual.

Example:

Let us assume that you have already generated an annotation file. To test-run the VariantClassifier on its own, change (cd) into the relevant Examples directory, which was created when you unpacked the VariantClassifier source from SourceForge. If you are in the SNPClassifier directory and want to work with the human example, enter:

cd Examples/human/classify_variants

You would then run the command:

../../../Classify_SNPs.pl \
        -s input_snps \
        -c ../annotation_extraction/bcl2.coding_info \
        -n ../annotation_extraction/bcl2.fasta \
        -o output
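
Because all three inputs are given as relative paths, a run started from the wrong directory will not find them. A minimal pre-flight sketch is shown below; check_inputs is a hypothetical helper and is not part of the VariantClassifier distribution, and the paths are those of the human example.

```shell
#!/bin/sh
# Sketch: confirm the classifier inputs are readable before a run.
# check_inputs is a hypothetical helper, not part of VariantClassifier.

check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            missing=1
        fi
    done
    # Returns 0 only if every listed file is readable.
    return "$missing"
}

# Paths are relative to Examples/human/classify_variants.
if check_inputs input_snps \
        ../annotation_extraction/bcl2.coding_info \
        ../annotation_extraction/bcl2.fasta; then
    echo "inputs look readable"
else
    echo "fix the paths above before running Classify_SNPs.pl" >&2
fi
```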

A description of the input parameters is printed when you invoke the Classify_SNPs.pl script without any parameters.

Usage:

../../../Classify_SNPs.pl
        -s <Query SNPs Filename>
        -c <coding info file>
        -n <reference nucleotide sequence FASTA>
        -o <output file root>

The input that should vary from run to run is the input_snps file. The output is saved using the file root given with -o, here output. The coding_info and FASTA files need to be created manually or with the Extract Coding Info script.
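
Since only the query SNP file changes between runs, several query files can share one annotation. The following is a rough sketch under stated assumptions: classify_all, the DRY_RUN switch, and the ".classified" output suffix are hypothetical names introduced for illustration; only the -s, -c, -n, and -o options are taken from the usage message.

```shell
#!/bin/sh
# Sketch: classify several query SNP files against one shared annotation.
# classify_all, DRY_RUN, and the ".classified" suffix are hypothetical;
# only the -s, -c, -n, and -o flags come from the usage message.

CLASSIFIER="${CLASSIFIER:-../../../Classify_SNPs.pl}"
CODING_INFO="${CODING_INFO:-../annotation_extraction/bcl2.coding_info}"
REFERENCE="${REFERENCE:-../annotation_extraction/bcl2.fasta}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

classify_all() {
    for snps in "$@"; do
        root="${snps}.classified"   # one output root per query file
        if [ "$DRY_RUN" = "1" ]; then
            echo "$CLASSIFIER -s $snps -c $CODING_INFO -n $REFERENCE -o $root"
        else
            "$CLASSIFIER" -s "$snps" -c "$CODING_INFO" -n "$REFERENCE" -o "$root"
        fi
    done
}

# Query SNP files are taken from the command line.
classify_all "$@"
```

Saved under a name of your choosing and called with a list of query files, the script prints the commands it would run; setting DRY_RUN=0 in the environment executes them instead.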

Resources

Download VariantClassifier from SourceForge

Publication: VariantClassifier: A hierarchical variant classifier for annotated genomes