Targeted Genome Sequencing and the Development of the Species Microarray for Haemophilus influenzae

Project Abstract

The PFGRC has developed a cost-effective alternative to complete genome sequencing for studying the genetic differences between closely related species and/or strains. The comparative genomics approach combines Gene Discovery (GD) and Comparative Genomic Hybridization (CGH) techniques, resulting in the design and production of species microarrays that represent the diversity of a species beyond just the sequenced reference strain(s) used in the initial microarray design. These species arrays may then be used to interrogate hundreds of closely related strains in order to further unravel their evolutionary relationships.

Haemophilus influenzae (Hi) is a species of Gram-negative, rod-shaped bacteria that are exclusively adapted to a symbiotic life in the upper respiratory tract of humans. Despite its commensal lifestyle, H. influenzae remains one of the five most frequent causes of death among pathogenic microorganisms (Peltola, 2000). Prior to the introduction of widespread vaccination in many countries, H. influenzae strains expressing a serotype b capsule were among the three leading causes of bacterial meningitis worldwide and remain so in many Asian and developing countries. It is estimated that at least 3 million cases of serious disease and 400,000-700,000 deaths occur in young children worldwide per year (Peltola, 2000). Invasive infections are primarily associated with strains representing one (Division I) of two distinct lineages of serotype b; this lineage shows a characteristic partial duplication of the capsular biosynthesis (cap) locus (Kroll et al., 1990). Division II of H. influenzae serotype b and strains expressing capsular genes representing any of the other five serotypes are only occasionally isolated from patients with infections. The non-encapsulated (“non-typeable”) H. influenzae are regular inhabitants of the human pharynx, particularly during childhood, and are frequent causes of mucosal infections at sites contiguous with the upper respiratory tract, e.g., otitis media, sinusitis, lower respiratory tract infections, and conjunctivitis (Kilian, 2003).

Occasionally, non-encapsulated H. influenzae strains cause invasive infections in the absence of apparent predisposing conditions in the patient. The most striking example is the disease described as Brazilian purpuric fever (BPF), a fulminant, life-threatening pediatric infection caused by particular clones of H. influenzae (“H. influenzae biogroup aegyptius”). BPF was first recognized during 1984 in Brazil, when 10 children in a town of 20,000 persons died of an acute febrile illness marked by hemorrhagic skin lesions, septicemia, hypotensive shock, and vascular collapse, with death usually occurring within 48 hours of onset. The disease is characteristically preceded by purulent conjunctivitis that is resolved before the onset of fever. Several subsequent outbreaks occurred in Brazil in different communities, and cases with clinical signs indistinguishable from those of BPF have been described in Australia and the U.S.A. These infections were caused by strains that are distinct from the Brazilian isolates and are the result of independent evolutionary events (Brenner et al., 1988; Kilian et al., 2002; Musser & Selander, 1990; Smoot et al., 2002).

We conducted comparative genomics on fourteen H. influenzae strains, four H. aegyptius, two H. influenzae biogroup aegyptius and one H. haemolyticus using DNA microarray technology in two different approaches. In the first, we employed a 70-mer based microarray representing all predicted ORFs of H. influenzae KW20/Rd to conduct CGH. In the second, we developed a novel strategy that screened 18,000 plasmids derived from an HK1212 genomic library to identify novel sequences that distinguish the BPF isolates from KW20/Rd. The analysis of genomic sequences uniquely encoded in hypervirulent strains represents an important first step in gaining insight into the role that horizontal gene acquisition plays in the evolution of novel virulence properties. By examining the novel genomic segments (Riley, 1993), we conclude that gene acquisition events played a significant role in the emergence of the BPF strain and that these events were perhaps numerous and sequential. Finally, based on sequence information obtained from this library and from other publicly available resources, a comprehensive species microarray was developed for H. influenzae.

Additional Citations

  • Kilian, M. (2003). Haemophilus. In Manual of Clinical Microbiology, pp. 623-635. Edited by P. R. Murray, Baron, E.J., Jorgensen, J.H., Pfaller, M.A., Yolken, H.R. Washington, D.C.: American Society for Microbiology.

Sub Projects

Genomic Characterization of an Emerging Haemophilus influenzae Isolate Causing a Brazilian Purpuric Fever-like Infection

Project Methods

Haemophilus influenzae lives symbiotically in the upper respiratory tract of humans. In addition to the capsulated type b strains, non-typeable (NTHi) strains are known to cause a number of distinct and significant infections (Kilian, 2007). Members of the biogroup aegyptius cause Brazilian purpuric fever (BPF), an important pediatric disease with a high mortality rate. Here we report a procedure we refer to as Gene Discovery to characterize genomic sequences unique to strain HK1212 as a means to gain insight into the emergence of this highly virulent subtype. The vast majority of novel DNA content in the HK1212 genome shared sequence identity with members of the Pasteurellaceae family, to which the genus Haemophilus belongs. Analysis of the HK1212-specific genomic content revealed a myriad of putative virulence factors, dramatically enriched in invasins and cytadhesins, comprising 20% of the novel features identified. Overall, these findings suggest that the emergence of invasive, non-encapsulated clones of H. influenzae was a complex process characterized by multiple sequential gene acquisition and gene loss events.

We initially used a single-genome-based array to perform comparative genomic hybridization (CGH) studies with a diverse set of 20 clinical strains of Hi, Haemophilus aegyptius (Hae), H. influenzae biogroup aegyptius (Hibae), and Haemophilus haemolyticus, as well as the sequenced reference strain Hi KW20/Rd. The majority of these strains were selected to represent major lineages within this taxon on the basis of a previously reported population genetic study that used both Multi Locus Enzyme Electrophoresis (MLEE) and Multi Locus Sequence Typing (MLST) (Kilian et al., 2002; Figure 1). CGH allowed us to discover meaningful phylogenomic associations and to assess the extent of species diversity by identifying variable and common genomic features.

Strain HK1212 was isolated in central Australia in 1986 from a child with the characteristic symptomatology of BPF (McIntyre et al., 1987). In our phylogenomic analysis (Figure 2), HK1212 appeared to represent the lineage closest to the root of the clade comprising isolates associated with BPF or conjunctivitis. Using this conservative approach, we hypothesized that the HK1212 genome contains additional or unique genes that may have contributed to its virulence, including its ability to evade innate immune factors, despite lacking a capsule.

We have developed a strategy we refer to as Gene Discovery (GD) that, like subtractive hybridization, enables the identification and rapid characterization of strain-specific sequences in microbial genomes. The method (Figure 3) uses DNA microarrays to discriminate between genomic fragments unique to HK1212 and those common to the HK1212 and KW20/Rd genomes. All sequence reads were assembled using the TIGR Assembler. The resulting contigs and singletons were annotated to find potential open reading frames (ORFs), followed by manual curation. Table 1 summarizes important characteristics of all novel HK1212 features discovered.
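The discrimination step described above can be illustrated with a minimal sketch. It assumes that each library clone yields one signal from HK1212 DNA and one from KW20/Rd DNA, and that a clone is treated as HK1212-specific when the log2 ratio of the two exceeds a cutoff. The function name, the clone identifiers, the intensity values, and the cutoff of 2.0 are all hypothetical and are not taken from the project's actual protocol.

```python
import math


def classify_clones(intensities, log2_cutoff=2.0, floor=1.0):
    """Label each clone 'unique' (query-specific) or 'shared'.

    intensities: dict mapping clone id -> (query_signal, reference_signal).
    log2_cutoff and floor are illustrative assumptions, not published values.
    """
    calls = {}
    for clone_id, (query, reference) in intensities.items():
        # Clamp signals to a small floor so a zero reading cannot break log2.
        ratio = math.log2(max(query, floor) / max(reference, floor))
        calls[clone_id] = "unique" if ratio >= log2_cutoff else "shared"
    return calls


# Hypothetical intensities: (HK1212 signal, KW20/Rd signal)
example = {
    "clone_A": (8000.0, 7500.0),  # similar signal in both genomes
    "clone_B": (6400.0, 400.0),   # 16-fold stronger in the query
    "clone_C": (5000.0, 0.0),     # no detectable reference signal
}
calls = classify_clones(example)
print(calls)
```

Only clones called unique in a scheme like this would need to be carried forward to sequencing, which is what makes the approach cheaper than sequencing a whole genome.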

A third of the annotated features from HK1212 were identified as unique relative to the H. influenzae KW20/Rd genome. Almost 75% of these novel features shared no significant homology to sequences derived from any known Haemophilus species. The vast majority of the novel DNA content in the HK1212 genome shared sequence homology to members of the Pasteurellaceae family, to which the genus Haemophilus belongs, underscoring the role of horizontal gene transfer as a driving force for the species’ genome evolution (Figure 4). GD and CGH results based on a single (reference) genome-based microarray indicated that the genes absent in HK1212, and consequently the cellular role categories their products were responsible for, were not compensated for by other genes. Excluding unknowns, ORFs predicted to be involved in transport and energy metabolism constituted the majority of lost functions, whereas ORFs encoding mobile and extrachromosomal elements and proteins for cell envelope, DNA metabolism and cellular processes made up most of the HK1212 novel genomic content (Figure 5). Putative virulence factors, dramatically enriched in invasins and cytadhesins, constituted 20% of the novel features identified in HK1212.

Extensive research on H. influenzae pathogenesis has resulted in the identification and characterization of many virulence factors. However, the screening of clinical isolates has indicated that their distribution is not universal (High, 2001). With the exception of the cap genes, we identified nearly the entire panoply of previously described H. influenzae virulence factors in the genome of HK1212 (Table 2). This is quite different from the patterns of virulence gene content previously found within the genomes of various members of H. influenzae, as reviewed by High (2001). Although the genome content revealed by our study is a snapshot of HK1212 evolution, it is clear that a gene complement supporting high invasiveness in a capsule-independent manner has emerged. Overall, these findings suggest that the emergence of BPF clones such as HK1212 is a complex process characterized by multiple sequential gene acquisition and gene loss events.

Taken together, the results of this study allowed us to recognize the sources of diversity, understand the relationships between H. influenzae group members, and further elucidate aspects of their genome evolution. The combined GD and CGH approach enabled us to recognize novel clades within the species, guide a genome sequencing project by identifying the appropriate target (HK1212 as a representative of an emerging disease), and discover genomic markers that can be used for diagnostic purposes. Sequence information obtained from the partial sequencing project of HK1212 was used to expand the content of the H. influenzae species microarray.

Additional Citations

  • High, N. J. (2001). Haemophilus influenzae. In Medical Molecular Microbiology, pp. 1967-1988: Academic Press.
  • Kilian, M. (2007). Haemophilus. In Manual of Clinical Microbiology, pp. 636-648. Edited by P. R. Murray, Baron, E.J., Jorgensen, J.H., Landry, M.L., Pfaller, M.A. Washington, D.C.: American Society for Microbiology.

Data and Resources

Files

File Name | Description
Table 1 | Summary of the HK1212 features identified through Gene Discovery (GD)

Sequence Files

Strain Name | Nucleotide Sequence | Protein Sequence
HK1212 | View / Download | View / Download

Development and Use of a Comprehensive Species Microarray for Phylogenomic Analysis of H. influenzae Strains

Project Methods

Haemophilus influenzae (Hi) is a Gram-negative, rod-shaped bacterium that lives symbiotically in the upper respiratory tract of humans. Capsulated Hi as well as non-capsulated strains are known to cause a number of significant infections. Recognizing and understanding the links between phenotypic traits and genetic background is of paramount epidemiological and clinical importance. To date, a panoply of microbiological and molecular biology tools has been developed and used by researchers to identify the evolutionary links among strains and isolates. Comparative genomic hybridization (CGH) has been shown to be a useful tool for screening strains for their genetic content. However, CGH has a major limitation when it is conducted on a microarray based on a single reference genome: the results report only which genes are present or absent relative to that reference genome, so any novel genetic content that the query strain possesses remains obscure. We report here the construction of the first Hi pan-genome microarray, representing ca. 4600 features by 70-mers. In addition to those from the Rd strain, new features originate from the unfinished genome sequences present in the NCBI database and from our novel gene discovery project using strain HK1212. Genomes of 20 strains belonging to different phylogenetic lineages were screened for gene loss and gain using the species microarray. The results obtained with the multistrain species microarray provide comprehensive information about the genomic content of uncharacterized strains. The trees generated by CGH do not, in general, reproduce the phylogeny of a species in terms of vertical evolution; instead, they represent the overall relatedness of genomes to one another and provide an assessment of the species' genome evolution.
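To illustrate how gene content can express the overall relatedness of genomes, the following minimal sketch computes pairwise Jaccard distances between presence/absence profiles. The choice of Jaccard distance, the strain names, and the gene names are assumptions made for illustration; the text does not state which distance measure or clustering method was used to generate the CGH trees.

```python
from itertools import combinations


def jaccard_distance(genes_a, genes_b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two sets of genes called present."""
    union = genes_a | genes_b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(genes_a & genes_b) / len(union)


def distance_matrix(profiles):
    """Map each unordered strain pair to its gene-content distance."""
    return {
        (s1, s2): jaccard_distance(profiles[s1], profiles[s2])
        for s1, s2 in combinations(sorted(profiles), 2)
    }


# Hypothetical presence calls; real profiles would span thousands of features.
profiles = {
    "strain_1": {"gene_a", "gene_b", "gene_c", "gene_d"},
    "strain_2": {"gene_a", "gene_b", "gene_c"},
    "strain_3": {"gene_a", "gene_e"},
}
distances = distance_matrix(profiles)
for pair, dist in distances.items():
    print(pair, round(dist, 2))
```

A matrix of this kind could be passed to any standard hierarchical clustering routine. Because the distances reflect shared gene content, including horizontally acquired genes, the resulting tree groups genomes by overall similarity rather than by vertical descent alone.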

Twenty query strains (denoted HKxxxx) were investigated in this study, with each query strain hybridized against the reference strain, KW20. Two dye-swap experiments were performed, for a total of four hybridizations per query strain. Each 70-mer oligo spotted on the H. influenzae species microarray is replicated six times. Positive controls on the array consist of oligos designed from the sequenced reference genome, KW20, and negative controls consist of oligos designed from the thale cress plant, Arabidopsis thaliana.
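One way to combine the dye-swap hybridizations and replicate spots into a single call per feature is sketched below. It assumes that each spot provides a Cy3 and a Cy5 intensity, that the ratio is always oriented as query over reference regardless of which dye labeled the reference, and that a feature is called absent when the mean log2 ratio falls at or below a cutoff. The function names, the cutoff of -1.0, and the intensities are hypothetical; the project's actual normalization and calling rules are described in its README and are not reproduced here.

```python
import math
from statistics import mean


def query_log2_ratio(cy3, cy5, reference_dye):
    """Return log2(query / reference) for one spot, given the reference dye."""
    if reference_dye == "Cy3":
        return math.log2(cy5 / cy3)  # query was labeled with Cy5
    if reference_dye == "Cy5":
        return math.log2(cy3 / cy5)  # query was labeled with Cy3
    raise ValueError(f"unknown dye: {reference_dye!r}")


def call_feature(hybridizations, absent_cutoff=-1.0):
    """Average all replicate spots over all hybridizations and make a call.

    hybridizations: list of (reference_dye, [(cy3, cy5), ...]) tuples.
    absent_cutoff is an illustrative assumption, not a published threshold.
    """
    ratios = [
        query_log2_ratio(cy3, cy5, reference_dye)
        for reference_dye, spots in hybridizations
        for cy3, cy5 in spots
    ]
    mean_ratio = mean(ratios)
    call = "absent" if mean_ratio <= absent_cutoff else "present"
    return mean_ratio, call


# Four hybridizations (two dye-swap pairs); two spots each for brevity,
# whereas the array described above carries six replicates per oligo.
feature_hybs = [
    ("Cy3", [(4000.0, 500.0), (4000.0, 500.0)]),
    ("Cy5", [(500.0, 4000.0), (500.0, 4000.0)]),
    ("Cy3", [(2000.0, 250.0), (2000.0, 250.0)]),
    ("Cy5", [(250.0, 2000.0), (250.0, 2000.0)]),
]
mean_ratio, call = call_feature(feature_hybs)
print(mean_ratio, call)
```

Orienting every ratio as query over reference before averaging is what lets the dye swap cancel dye-specific bias: a bias that inflates one channel raises the ratio in one orientation and lowers it in the other.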

Data and Resources

Microarrays

H. influenzae vt1

Reference Strain | GEO Submission | MeV Annotation (.ann)
Rd KW20 | GPL5328 | Download

CGH Microarray Data

MultiExperiment Viewer (MeV) Data

File Name | Description
H_influenzae_Readme.txt | README file for H. influenzae data set

Data Description | Data Link
Raw Data - Reference in Cy3 | Download
Raw Data - Reference in Cy5 | Download
Raw Data - Self-Self hybridization | Download
Normalized Data - Reference in Cy3 | Download
Normalized Data - Reference in Cy5 | Download
Normalized Data - Self-Self hybridization | Download


GEO Experiment Series Submissions

GEO Series

GEO Experiment Links

GSE8300