First Individual Diploid Human Genome Published By Researchers at J. Craig Venter Institute
Sequence Reveals that Human to Human Variation is Substantially Greater than Earlier Estimates
Independent sequence and assembly of the six billion base pairs from the genome of one person ushers in the era of individualized genome-based medicine
ROCKVILLE, MD — September 3, 2007 — Researchers at the J. Craig Venter Institute (JCVI), along with collaborators at The Hospital for Sick Children (SickKids) in Toronto and the University of California, San Diego (UCSD), have published a genome sequence of an individual, J. Craig Venter, Ph.D., that covers both sets of his chromosomes (his diploid genome), one set inherited from each of his parents.
Two other versions of the human genome currently exist — one published in 2001 by Dr. Venter and colleagues at Celera Genomics, and another at the same time by a consortium of government and foundation-funded researchers. These genomes were not of any single individual, but rather were a mosaic of DNA sequences from various donors. In the case of Celera it was a consensus assembly from five individuals, while the publicly-funded version was based on patching together sequences from over 100 anonymous human sources. Both versions greatly underestimated human genetic diversity.
This new genome (called "HuRef") represents the first time a true diploid genome from one individual, Dr. Venter, has been published. The research is available in the open access journal PLoS Biology.
Researchers at the JCVI have been sequencing and analyzing this version of Dr. Venter's genome since 2003. Building on reanalyzed data from Dr. Venter's genome that constituted 60% of the previously published Celera genome, the team set out to construct a definitive reference human genome based on one individual. Using whole genome shotgun sequencing with highly accurate Sanger-based chemistry, the team produced additional data to arrive at the final set of 32 million sequence reads.
From the combined data of more than 20 billion base pairs of DNA, the team was able to assemble the majority of Dr. Venter's genome. Since this genome assembly uniquely catalogues the contributions of each of the parental chromosomes, for the first time the amount of variation existing between the two could be determined. Surprisingly, a higher than expected amount of genetic variation was found to exist between the two human chromosomes.
"Each time we peer deeper into the human genome we uncover more valuable insight into our intricate biology," said Dr. Venter. "With this publication we have shown that human to human variation is five to seven-fold greater than earlier estimates proving that we are in fact more unique at the individual genetic level than we thought." He added, "It is clear however that we are still at the earliest stages of discovery about ourselves and only with additional sequencing of more individual genomes will we garner a full understanding of how our genes influence our lives."
Within the human genome there are several different kinds of DNA variants. The most studied type is the single nucleotide polymorphism, or SNP; SNPs are thought to be the essential variants implicated in human traits and disease susceptibility. A total of 4.1 million variants covering 12.3 million base pairs of DNA were uncovered in this analysis of Dr. Venter's genome. Of the 4.1 million variations between chromosome sets, 3.2 million were SNPs. This is the number typically expected in any human genome, but at least 1.2 million variants had not been described before. Surprisingly, nearly one million were other kinds of variants, including insertions/deletions ("indels"), copy number variants, block substitutions and segmental duplications.
While the SNP events outnumbered the non-SNP variants, the latter class involved a larger portion (74%) of the variable component of Dr. Venter's genome. These data suggest that human-to-human variation is much greater than the 0.1% difference found in earlier genome sequencing projects. The new estimate based on these data is that the genomes of different individuals have at least 0.5% total genetic variation (or are 99.5% similar). The researchers suggest that much more research needs to be done on these non-SNP variants to better understand their role in individual genomics.
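The figures quoted above can be cross-checked with simple arithmetic. The short sketch below is illustrative only: it uses the rounded counts given in this release, and the variable names are our own rather than anything from the published analysis.

```python
# Rounded figures quoted in this release, in millions.
total_variants_m = 4.1     # all variants found between the two chromosome sets
snp_variants_m = 3.2       # single nucleotide polymorphisms (one base each)
variant_bases_m = 12.3     # base pairs covered by all variants combined
non_snp_base_share = 0.74  # share of variant bases that fall in non-SNP variants

# Variants that are not SNPs (indels, copy number variants, and so on).
non_snp_variants_m = total_variants_m - snp_variants_m

# Bases attributable to non-SNP variants and, by subtraction, to SNPs.
non_snp_bases_m = variant_bases_m * non_snp_base_share
snp_bases_m = variant_bases_m - non_snp_bases_m

print(f"non-SNP variants: about {non_snp_variants_m:.1f} million")
print(f"bases in non-SNP variants: about {non_snp_bases_m:.1f} million")
print(f"bases in SNPs: about {snp_bases_m:.1f} million")
```

With these rounded inputs, roughly 0.9 million non-SNP variants account for about 9.1 million of the 12.3 million variable bases, while the remaining 3.2 million or so bases match the SNP count, as expected when each SNP changes a single base.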
According to Samuel Levy, Ph.D., lead author and senior scientist at JCVI, "The ability to use unbiased, high throughput sequencing methods coupled with advanced computational analytic methods enables us to characterize more comprehensively the wide variety of individual genetic variation. This offers us an unprecedented opportunity to study the prevalence and impact of these DNA variants on traits and diseases in human populations."
Another important feature made possible by having an individual, diploid genome is the ability to generate more informed haplotype assemblies. Haplotypes are groups of linked variations along the chromosomes. Other studies have generated many common haplotypes; however, these are based on averages of large populations rather than individuals. Individual haplotypes enable scientists to study rare or 'private' variants that might explain and help predict traits and diseases in that particular person, allowing an individualized approach in genomic applications.
In the HuRef analysis, the team used the heterozygous portion of the 4.1 million variant set and new algorithms to build haplotype assemblies. These haplotype assemblies were typically an order of magnitude larger than what can be achieved by genotyping a single individual, with over half the genome contained in segments greater than 200,000 base pairs in length. The JCVI researchers expect this number to improve significantly as additional sequence coverage is added to HuRef using a variety of new sequencing technologies.
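A statement such as "over half the genome is contained in segments greater than 200,000 base pairs" is commonly summarized with an N50-style statistic. The sketch below shows one way such a figure could be computed. It is an assumption-laden illustration, not the JCVI team's software: the function name and the block lengths are hypothetical and are not HuRef data.

```python
def n50(lengths):
    """Return the length L such that blocks of length >= L
    together cover at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical haplotype block lengths in base pairs (invented for illustration).
example_blocks = [50_000, 120_000, 350_000, 400_000, 80_000]
print(n50(example_blocks))
```

For these invented lengths the result is 350,000: the two longest blocks (400,000 and 350,000 base pairs) already cover more than half of the 1,000,000 base pair total.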
"In the future it will be possible to know the parental origin of DNA that is contributing, either alone or in combination, to various traits or disease," said co-author Stephen Scherer, Ph.D., senior scientist in Genetics and Genomic Biology at SickKids and professor of Molecular and Medical Genetics at the University of Toronto. "This study discovered that in an individual genome upwards of 44% of genes were variable in sequence, a number that geneticists have wondered about for 50 years. With this type of knowledge now in hand, the stage is set for an era of personalized medicine where genome sequence information becomes a critical reference to assist with health-related decisions", concluded Scherer.
Background
Dr. Venter's genome is the first individual genome to be published and the first human genome publication since the initial sequence and analysis of the human genome, published in Science in 2001 by Dr. Venter and colleagues at Celera Genomics. The publicly funded genome project published its version of the human genome at the same time in the journal Nature. The Celera consensus assembly drew on the genomes of five individuals. One of those individuals was Dr. Venter, whose DNA constituted the majority of the DNA for that genome. The publicly funded genome project used DNA from a variety of individuals and is a composite version.
The new HuRef version of the human genome is the sequence and assembly of one individual in which the person's two sets of chromosomes (one inherited from the mother and one set from the father) are represented. It is this kind of genome sequencing and analysis that will usher in the true era of individualized medicine.
Dr. Venter and the team at JCVI have long been proponents of finding new and improved methods for sequencing genomes since it is only through cost-effective and accurate sequencing methods that millions of human genomes can be sequenced. In September 2003, the JCVI announced a $500,000 prize for advances leading to the sequencing of one genome for $1,000 or less. The JCVI prize was eventually joined with the $10 million Archon X Prize for Genomics.
For the HuRef project, the team at JCVI used a more traditional method of sequencing: whole genome shotgun assembly, which is built upon Sanger dideoxy sequencing. Applied Biosystems 3730xl high-throughput DNA sequencing machines were employed, since these methods still produce the longest and most accurate sequence reads. This project was designed to produce an accurate and more complete version of a single individual's genome rather than a fast and potentially less expensive one. Building on the HuRef genome, however, the researchers believe that newer sequencing methods can be used to enable more people to have their genomes sequenced and analyzed. The HuRef version is likely to be the last time that these more traditional methods of sequencing will be employed.
Funding for the research on the new HuRef diploid genome was provided by the J. Craig Venter Institute.
About the J. Craig Venter Institute
The J. Craig Venter Institute is a not-for-profit research institute dedicated to the advancement of the science of genomics; the understanding of its implications for society; and communication of those results to the scientific community, the public, and policymakers. Founded by J. Craig Venter, Ph.D., the JCVI is home to approximately 500 scientists and staff with expertise in human and evolutionary biology, genetics, bioinformatics/informatics, information technology, high-throughput DNA sequencing, genomic and environmental policy research, and public education in science and science policy. The legacy organizations of the JCVI are The Institute for Genomic Research (TIGR), The Center for the Advancement of Genomics (TCAG), the Institute for Biological Energy Alternatives (IBEA), the Joint Technology Center (JTC), and the J. Craig Venter Science Foundation. The JCVI is a 501(c)(3) organization. For additional information, please visit http://www.weizhongli-lab.org.
# # #
Contact
Heather Kowalski
202-294-9206
[email protected]
# # #