File Examples

Dr. Jim Watson's 220K indels across the genome [NCBI 36] [NCBI 37] (Space-based coordinates) [Ref]
Filters down to ~400 coding

chr7 SNPs from the Asian genome [NCBI 36] [NCBI 37] (Residue-based coordinates) [Ref]
Filters down to ~1K coding


Format Example 1: RESIDUE BASED COORDINATE SYSTEM (comma separated, NCBI 36 shown below)
3,81780820,-1,T/C
2,43881517,1,A/T,#User Comment
2,43857514,1,T/C
6,88375602,1,G/A,#User Comment
22,29307353,-1,T/A
10,115912482,-1,C/T
10,115900918,-1,G/T
16,69875502,-1,G/T
16,69876078,-1,T/C
16,69877147,-1,G/A
22,49000825,-1,T/A
22,49000551,-1,T/C
22,49006739,-1,A/G
11,17476318,-1,C/G
4,124033758,-1,C/T
3,185041096,-1,C/G
17,8101874,1,C/T


Format Example 2: SPACE BASED COORDINATE SYSTEM (comma separated)
3,81780819,81780820,-1,T/C
2,43881516,43881517,1,A/T,#User Comment
2,43857513,43857514,1,T/C
6,88375601,88375602,1,G/A,#User Comment
22,29307352,29307353,-1,T/A
10,115912481,115912482,-1,C/T
10,115900917,115900918,-1,G/T
16,69875501,69875502,-1,G/T
16,69876077,69876078,-1,T/C
16,69877146,69877147,-1,G/A
22,49000824,49000825,-1,T/A
22,49000550,49000551,-1,T/C
22,49006738,49006739,-1,A/G
11,17476317,17476318,-1,C/G
4,124033757,124033758,-1,C/T
3,185041095,185041096,-1,C/G
17,8101873,8101874,1,C/T


Format Description
SIFT Genome input format: [comma separated: chromosome,coordinate,orientation,alleles,user comment(optional)]
Please do not use spaces except in the user comment field.

Coordinate System:
SIFT accepts both residue-based and space-based coordinates for single nucleotide variants.
If there is only one column of coordinates, as shown in Example 1 above, SIFT assumes the coordinate
system is residue-based. If there are two columns, as shown in Example 2 above, SIFT assumes the
coordinate system is space-based.
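
The column rule above can be sketched in Python. This is only an illustration, not SIFT's own parser: the names SiftRecord and parse_sift_line are hypothetical, and the sketch assumes the orientation field is always exactly 1 or -1, which is how it tells one coordinate column from two even when the optional comment is present.

```python
from typing import NamedTuple, Optional, Tuple


class SiftRecord(NamedTuple):
    # Hypothetical container for one input line (not part of SIFT).
    chrom: str
    coords: Tuple[int, ...]   # one value (residue-based) or two (space-based)
    orientation: int          # 1 or -1
    alleles: str              # e.g. "T/C"
    comment: Optional[str]    # optional trailing user comment
    system: str               # "residue" or "space"


def parse_sift_line(line: str) -> SiftRecord:
    # Split at most 4 times so commas inside a trailing comment stay together.
    parts = line.strip().split(",", 4)
    if len(parts) < 4:
        raise ValueError(f"too few fields: {line!r}")

    if parts[3] in ("1", "-1"):
        # Fourth field is an orientation, so there are two coordinate columns.
        if len(parts) < 5:
            raise ValueError(f"missing alleles field: {line!r}")
        alleles, _, comment = parts[4].partition(",")
        coords = (int(parts[1]), int(parts[2]))
        orientation, system = int(parts[3]), "space"
    else:
        # Fourth field is the alleles, so there is one coordinate column.
        if parts[2] not in ("1", "-1"):
            raise ValueError(f"bad orientation field: {line!r}")
        alleles = parts[3]
        comment = parts[4] if len(parts) == 5 else ""
        coords = (int(parts[1]),)
        orientation, system = int(parts[2]), "residue"

    return SiftRecord(parts[0], coords, orientation, alleles, comment or None, system)
```

For instance, parse_sift_line("3,81780820,-1,T/C") is classified as residue-based, while parse_sift_line("3,81780819,81780820,-1,T/C") is classified as space-based.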

The space-based coordinate system counts the spaces before and after bases rather than the bases themselves.
Zero always refers to the space before the first base.

The sequence 'ACGT' has coordinates (0,4) and its subsequence 'CG' has coordinates (1,3) as shown in Example 3 below.
The difference between the start and end coordinates gives the sequence length. Misinterpretation of these
coordinates can easily lead to 'off-by-one' errors. Space-based coordinates become necessary when describing
insertions/deletions and genomic rearrangements.

Example 3:

0 A 1 C 2 G 3 T 4
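
Space-based coordinates behave like half-open slice indices in many programming languages. The short Python sketch below illustrates this with the 'ACGT' example; the helper names subsequence and interval_length are hypothetical and chosen only for this illustration.

```python
def subsequence(seq: str, start: int, end: int) -> str:
    # Space-based (start, end) maps directly onto a Python slice:
    # 'start' is the space before the first base, 'end' the space after the last.
    return seq[start:end]


def interval_length(start: int, end: int) -> int:
    # The difference between the end and start coordinates is the sequence length.
    return end - start


seq = "ACGT"
print(subsequence(seq, 0, 4))   # ACGT
print(subsequence(seq, 1, 3))   # CG
print(interval_length(1, 3))    # 2
```

A zero-length interval such as (2,2) selects no bases and refers to the space between 'C' and 'G', which is one way to see why this system suits insertions.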

In a residue-based system, as described in Example 4 below, each base is assigned a coordinate based on its
absolute position, starting from 1. The sequence 'ACGT' has coordinates (1,4) and its subsequence 'CG' has
coordinates (2,3).

Example 4:
A C G T
1 2 3 4
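
Converting between the two systems only changes the start coordinate by one. The following Python sketch shows the arithmetic; the function names residue_to_space and space_to_residue are hypothetical, and the code is an illustration rather than part of SIFT.

```python
from typing import Optional, Tuple


def residue_to_space(start: int, end: Optional[int] = None) -> Tuple[int, int]:
    # A single residue position p covers the interval (p - 1, p) in space-based terms.
    if end is None:
        end = start
    return (start - 1, end)


def space_to_residue(start: int, end: int) -> Tuple[int, int]:
    # A zero-length space-based interval contains no bases,
    # so it has no residue-based equivalent.
    if end <= start:
        raise ValueError("interval contains no bases")
    return (start + 1, end)


print(residue_to_space(2, 3))        # (1, 3)  the 'CG' subsequence of 'ACGT'
print(space_to_residue(0, 4))        # (1, 4)  the whole of 'ACGT'
print(residue_to_space(81780820))    # (81780819, 81780820)
```

The last call reproduces the relationship between the first rows of Format Example 1 and Format Example 2.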

Orientation:
Use 1 for the positive strand and -1 for the negative strand. If the orientation is not known, use 1 as the default.

Alleles:
New format Aug 2011
Use 'base1/base2' where 'base1' is the reference allele and 'base2' is the variant allele.
To obtain a prediction for the reference allele, use 'base1/base1' where 'base1' is the reference allele.
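
As an illustration of the alleles field, the Python sketch below splits 'base1/base2' into reference and variant alleles and builds the 'base1/base1' form for a reference prediction. The names parse_alleles and reference_query are hypothetical, and restricting bases to uppercase A, C, G and T is an assumption of this sketch, not a documented SIFT rule.

```python
from typing import Tuple

# Assumption: single uppercase nucleotides only.
VALID_BASES = {"A", "C", "G", "T"}


def parse_alleles(field: str) -> Tuple[str, str]:
    # 'base1/base2': base1 is the reference allele, base2 the variant allele.
    ref, sep, var = field.partition("/")
    if not sep or ref not in VALID_BASES or var not in VALID_BASES:
        raise ValueError(f"expected 'base1/base2', got {field!r}")
    return ref, var


def reference_query(field: str) -> str:
    # Build the 'base1/base1' form used to request a prediction
    # for the reference allele itself.
    ref, _ = parse_alleles(field)
    return f"{ref}/{ref}"


print(parse_alleles("T/C"))     # ('T', 'C')
print(reference_query("T/C"))   # T/T
```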