Whole-genome genotyping and SNP detection of Francisella tularensis isolates by resequencing arrays
Project Abstract
The NIAID-sponsored (NIAID contract N01-AI-15447) Pathogen Functional Genomics Resource Center (PFGRC) has evaluated Affymetrix GeneChip® Resequencing Oligonucleotide Array technology for detecting genotypic variation in Francisella tularensis isolates. The primary goal is to establish expertise in the technology and methodologies needed to develop a novel genotyping platform that detects single nucleotide polymorphisms (SNPs) across the whole genomes of microorganisms. The whole-genome (re)sequence and SNP information from multiple strains of an infectious agent will be shared with the scientific community, enabling advances in both basic research and translational applications for this Category A biodefense agent.
The F. tularensis GeneChip® set was designed on the basis of the DNA sequences of strains LVS (GenBank Accession: AM233362) and SCHU S4 (GenBank Accession: AJ749949) available at http://www.weizhongli-lab.org. Sequences of the plasmids pOM1 (GenBank Accession: NC_002109) and pFNL10 (GenBank Accession: NC_004952) were obtained from the NCBI database (http://www.ncbi.nlm.nih.gov/). A merged sequence was constructed from these genomic and plasmid sequences for the purposes of GeneChip® design. The F. tularensis LVS and SCHU S4 genomes are 1,895,998 and 1,892,819 bp, respectively. An in silico analysis identified sequences unique to SCHU S4 (ranging from 1 bp to 11,086 bp), which were appended to the LVS sequence along with the plasmid pOM1 sequence and the unique regions of pFNL10. A total of 179,193 bp (9.22%) of repetitive sequence was excluded from the design, leaving 1,764,558 queryable bases (91% of the F. tularensis genome) for resequencing by hybridization. This merged sequence was tiled onto a set of 6 CustomSeq 300K GeneChip® arrays.
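The percentages quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the merged design sequence is simply the sum of the excluded repetitive bases and the queryable bases; that total is not stated in the text, so treat it as an inference rather than a published figure.

```python
# Figures quoted in the design paragraph above.
excluded_repeats_bp = 179_193
queryable_bp = 1_764_558

# Assumption: merged design sequence = excluded repeats + queryable bases.
merged_bp = excluded_repeats_bp + queryable_bp

# Share of the merged sequence that was excluded or kept, in percent.
excluded_pct = 100 * excluded_repeats_bp / merged_bp
queryable_pct = 100 * queryable_bp / merged_bp

print(f"merged sequence:  {merged_bp:,} bp")
print(f"excluded repeats: {excluded_pct:.2f}%")
print(f"queryable bases:  {queryable_pct:.2f}%")
```

Under that assumption the excluded fraction comes out at the quoted 9.22%, and the queryable fraction rounds to the quoted 91%.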
The use of GenomiPhi-amplified whole-genome samples, rather than PCR-amplified fragments, simplifies the experimental protocol and avoids the PCR failures that would inevitably occur with some clinical samples of unknown sequence composition. However, the higher complexity of the whole-genome sample also increases the frequency of certain artifacts. The bioinformatic filters that we have developed have proven successful in identifying and eliminating the majority of these artifacts.
Details of the custom whole-genome resequencing array set design, and of the development and validation of the bioinformatic filters for improved base-call accuracy and polymorphism detection, have been published (Nucleic Acids Research 2007; doi: 10.1093/nar/gkm918).
The data presented in the following pages show the SNPs that were detected in each of 40 distinct whole-genome samples after hybridization with the resequencing arrays. All samples were run in duplicate. A set of bioinformatic filters was applied to the results from each experiment (see here for more information), and the results from the two replicate experiments were then combined, discarding any SNP that was not present in both filtered results.
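The duplicate-combination step described above amounts to a set intersection of the two filtered call sets. The following is a minimal sketch of that idea, not the PFGRC filter scripts themselves: the `SNP` record, the `combine_replicates` function, and the example positions are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class SNP(NamedTuple):
    """Hypothetical record for one filtered SNP call."""
    position: int   # position in the reference sequence
    ref_base: str   # base in the reference strain
    alt_base: str   # base called in the sample


def combine_replicates(rep_a, rep_b):
    """Keep only SNP calls present in both filtered replicate results."""
    return sorted(set(rep_a) & set(rep_b))


# Made-up calls for two replicate hybridizations of one sample.
replicate_1 = [SNP(1042, "A", "G"), SNP(20877, "C", "T"), SNP(530112, "G", "A")]
replicate_2 = [SNP(1042, "A", "G"), SNP(530112, "G", "A"), SNP(990001, "T", "C")]

consensus = combine_replicates(replicate_1, replicate_2)
for snp in consensus:
    print(snp.position, snp.ref_base, ">", snp.alt_base)
```

In this sketch a call must match on position and on both bases to survive, so a position called with different variant bases in the two replicates is dropped. Whether the actual pipeline is that strict is not stated in the text.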
The "SNP Report" provides:
- the nucleotide and its position in the reference strain and in the target fragment
- the annotation, context of the ORF, and amino acid sequence when in a coding region
This information may be sorted and organized by nucleotide position or ORF. A separate report provides, for each selected SNP position, an alignment between the reference sequence and the chosen target sequences.
Users of this comparative sequence information may begin to compile a meaningful set of known SNPs that may be applied to their own research projects.
The data acquired from these experiments are also available as a set of multi-FASTA and related files, which may be downloaded as a compressed archive from our FTP site, here. Please see the README file included with the download for more information about the contents.
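For readers unfamiliar with the multi-FASTA format, the sketch below shows one way such a file could be read with the Python standard library alone. The record names and sequences are invented, and the actual layout of the downloadable files is described only in the accompanying README, so this is an illustration of the general format rather than of these specific files.

```python
import io


def read_multifasta(handle):
    """Yield (header, sequence) pairs from a multi-FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Invented two-record example; real files would be opened with open(path).
example = io.StringIO(">sample_1\nACGT\nNNAC\n>sample_2\nACGA\n")
records = list(read_multifasta(example))
for name, seq in records:
    print(name, len(seq))
```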
Data and Resources
Files
File Name | Description
F. tularensis data download |
Whole Genome Resequencing and SNP Genotyping of Category A Biodefense Agent Francisella tularensis Poster |
Tools
Link | Description
Tool to show SNPs detected by resequencing array |