Understanding Complex Data through Better Visualization

Recently, researchers at JCVI reported on the Rhizoctonia solani mitochondrial genome, the largest fungal mitochondrial genome sequenced to date. We showed that its unusually large size was probably due to the expansion of multiple genetic elements that populated the genome in a somewhat ‘parasitic’ relationship. The visualization was meant to convey the number and variety of these repetitive genetic elements, and it was highlighted in a commentary in FEMS Microbiology Letters as an example of how to summarize molecular data in order to obtain an overall view of the results.

The outermost circle represents the chromosome and repetitive elements. Other important features, such as genes, endonucleases, exons, and RNAseq coverage, are represented in the successive concentric circles. Grey links represent short repeats (< 35 bp) found up to 100 times in the genome; colored links show the location of repeats and follow the coloration in Track 1.

Virtual Comparative Metagenomics

We have created an Open Virtualization Format (OVF) package of JCVI’s Metagenomics Reports (METAREP), a high-performance comparative metagenomics analysis tool. The software runs on a web server, retrieves data from two different database systems, and uses R for statistical analysis. The new OVF package bundles all these 3rd party tools and is configured to run out of the box in a virtual machine.

Screenshot of the VirtualBox appliance import wizard. The wizard allows you to specify the CPU and memory usage of the virtual machine on which METAREP will run.

To run a virtual version of METAREP on your machine, follow these steps:

  1. Download the METAREP OVF package from our FTP site [download].
  2. Unzip the OVF package.
  3. Download and install Oracle’s VirtualBox, an OVF-compatible virtualization software [download].
  4. Start VirtualBox.
  5. Click File/Import Appliance and select the OVF file.
  6. Adjust RAM/CPU usage using the Appliance Import Wizard (see image).
  7. Start the VM.
  8. Double-click the METAREP Firefox link on the VM desktop.
  9. Log into METAREP with username=admin and password=admin.
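
For scripted setups, the import and start steps can also be driven from the command line with VirtualBox's VBoxManage tool. The Python sketch below only assembles the commands. The file name metarep.ovf, the VM name METAREP, and the memory/CPU values are placeholders, not values taken from the package.

```python
import shlex


def build_import_command(ovf_path, memory_mb, cpus):
    """Return the VBoxManage command that imports an OVF appliance.

    memory_mb and cpus mirror the RAM/CPU settings of the import wizard.
    """
    return [
        "VBoxManage", "import", ovf_path,
        "--vsys", "0",               # first virtual system described in the OVF
        "--memory", str(memory_mb),  # RAM in megabytes
        "--cpus", str(cpus),
    ]


def build_start_command(vm_name):
    """Return the VBoxManage command that boots the imported VM."""
    return ["VBoxManage", "startvm", vm_name]


if __name__ == "__main__":
    # Placeholder names: substitute the actual OVF file and VM name.
    for cmd in (build_import_command("metarep.ovf", 2048, 2),
                build_start_command("METAREP")):
        print(shlex.join(cmd))
```

Passing each list to subprocess.run(cmd, check=True) would execute it on a machine where VirtualBox is installed.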

This virtual machine appliance is the first step in developing a fully cloud-enabled analysis platform where users can easily launch the application wherever is most convenient: on their personal desktop or in the cloud, where they can scale out the appliance to suit their needs.

Future virtual machine images will be certified to run on other virtualization software platforms. Stay tuned.

If you would like to learn more about METAREP and talk to the developers, join us at the Lucene Revolution Conference in Boston (October 7-8, 2010). We will present a lightning talk about METAREP on the first day of the conference at 5pm (see agenda).

Links:

JCVI’s METAREP Instance

METAREP Flyer

METAREP Manual

METAREP Source Code

Advance Access JCVI Metagenomics Reports Application Note

A significant JCVI informatics development is JCVI Metagenomics Reports, an open source Web 2.0 application designed to help scientists analyze and compare annotated metagenomics data sets. Users can download the application to upload and analyze their own metagenomics datasets.

METAREP has just been published in Bioinformatics (08/26/2010) as an open access article. The publication is currently accessible under the Bioinformatics Advance Access model. The PDF version can be downloaded at

http://bioinformatics.oxfordjournals.org/cgi/reprint/btq455v1.pdf

Supplementary information includes the METAREP data model and an overview of its search performance, accessible at

http://bioinformatics.oxfordjournals.org/cgi/content/full/btq455/DC1

A key feature that distinguishes METAREP from other metagenomics tools is its high-performance, scalable search engine, which allows users to analyze and compare extremely large metagenomics datasets, e.g. datasets produced by the Human Microbiome Project.

If you would like to learn more about METAREP and talk to the developers, join us at the Human Microbiome Research Conference in St. Louis, Missouri (August 31 - September 2, 2010). We will present METAREP on the first day of the conference at 10:35am (see agenda).

Contact Us:

We would like to hear from you. If you have questions or feedback or if you wish to contribute to the METAREP open source project please send an email to [email protected]

High-performance comparative metagenomics

Are you carrying out large-scale metagenomics analyses to identify differences among multiple sample sites? Are you looking for suitable analysis tools?

If you have not yet found the right analysis tool, you may be interested in the latest beta version of JCVI Metagenomics Reports (METAREP) [Test It].

METAREP is a new open source tool developed for high-performance comparative metagenomics.

It provides a suite of web based tools to help scientists view, query, browse, and compare metagenomics annotation data derived from ORFs called on metagenomics reads or assemblies.

Users can either specify fields, or logical combinations of fields, to filter and refine datasets. Users can compare multiple datasets at various functional and taxonomic levels, applying statistical tests as well as hierarchical clustering, multidimensional scaling, and heatmaps (see image gallery).

For each of these features, tab delimited files can be exported for downstream analysis. The web site is optimized to be user friendly and fast.
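
As one example of downstream analysis, an exported tab-delimited file can be read with a few lines of Python. The sketch assumes a hypothetical two-column layout (category, count) with a header row. The actual columns of a METAREP export may differ.

```python
import csv
import io


def relative_abundance(tsv_text):
    """Convert (category, count) rows into fractions of the total count."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = {row["category"]: int(row["count"]) for row in reader}
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero for an empty or all-zero export.
        return {name: 0.0 for name in counts}
    return {name: count / total for name, count in counts.items()}


# Made-up example data, not real METAREP output.
example = "category\tcount\nBacteria\t75\nArchaea\t20\nEukaryota\t5\n"
fractions = relative_abundance(example)
```

Normalizing to fractions makes datasets of different sizes comparable before plotting or testing.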

Feature Summary [download Flyer]:

  • Handle extremely large datasets. Uses scalable high-performance Solr/Lucene search engine (we have indexed 300 million annotation entries, but much larger volumes can be handled as shown by Hathi Trust).
  • Compare 20+ datasets at the same time. Use various compare
    options including statistical tests and plot options to visualize
    differences between datasets at various taxonomic and functional levels.
  • Apply statistical tests such as METASTATS (White et al.), a modified
    non-parametric t-test to compare two sample populations (e.g.
    metagenomics samples from healthy and diseased individuals).
  • Export publication-ready graphics. Export heatmaps, hierarchical clustering, and multi-dimensional scaling plots in PDF format.
  • Analyze KEGG metabolic pathways. Summaries include enzyme
    highlights on KEGG maps, pathway enzyme distributions, and
    statistics about pathway coverage at various pathway levels.
  • Search using a SQL-like query syntax. Build your query using 14
    different fields that can be combined logically.
  • Drill down into data using METAREP’s NCBI Taxonomy, Gene
    Ontology, Enzyme Classification or KEGG Pathway browser.
  • Install your own METAREP version. Flexible central configuration. METAREP and the 3rd party code base are completely open source.
  • Cross-link function with phylogeny. Slice your data at various
    taxonomic and/or functional levels. For example, search for all
    bacteria or exclude eukaryotes or search for a certain (GO/EC
    ID)/taxonomic combination.
  • Generic data format. Data types that can be populated include a
    free text functional description, best BLAST hit information, as well
    as GO ID, EC ID, and HMMs.
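
Because the search is backed by Solr/Lucene, combining fields logically amounts to building a Lucene-style query string. A minimal sketch follows. The field names (ec_id, taxon) and the helper functions are illustrative assumptions, not METAREP's documented fields or code.

```python
def term(field, value):
    """Format one field:value clause, quoting the value as a phrase."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def build_query(include=(), exclude=()):
    """AND together required clauses and negate excluded ones."""
    # *:* matches all documents, so a query with only exclusions stays valid.
    clauses = [term(f, v) for f, v in include] or ["*:*"]
    clauses += [f"NOT {term(f, v)}" for f, v in exclude]
    return " AND ".join(clauses)


# Hypothetical field names and values: one EC number, excluding eukaryotes.
query = build_query(
    include=[("ec_id", "2.7.7.7")],
    exclude=[("taxon", "Eukaryota")],
)
```

Here query holds the string ec_id:"2.7.7.7" AND NOT taxon:"Eukaryota", which mirrors the "search for a certain EC ID but exclude eukaryotes" use case in the list above.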

How to analyze your own data: You can install your own METAREP version to analyze your metagenomics annotation data [download source]. We have written a comprehensive manual that describes the installation process step by step [download manual]. Since METAREP only operates on annotated data, raw sequences need to be annotated first. Supported data types that can be loaded for each sequence include functional descriptions, best BLAST hits fields (E-Value, Percent Identity, NCBI Taxon, Percent Sequence Coverage), GO, EC, and HMM assignments. The installation also contains a set of example annotations that can be imported.
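
To make the supported data types concrete, one annotated sequence could be modelled as a record like the one below. This is only a sketch: the class, the attribute names, the column order, and the '||' multi-value separator are assumptions for illustration. The manual defines the actual import format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """One annotated sequence; attribute names are hypothetical."""
    sequence_id: str
    description: str = ""                        # free-text functional description
    blast_evalue: Optional[float] = None         # best BLAST hit E-value
    blast_percent_identity: Optional[float] = None
    blast_taxon_id: Optional[int] = None         # NCBI taxon of the best hit
    blast_percent_coverage: Optional[float] = None
    go_ids: List[str] = field(default_factory=list)
    ec_ids: List[str] = field(default_factory=list)
    hmm_ids: List[str] = field(default_factory=list)

    def to_tsv(self) -> str:
        """Serialize to one tab-delimited line (assumed column order)."""
        def fmt(value):
            return "" if value is None else str(value)

        columns = [
            self.sequence_id,
            self.description,
            fmt(self.blast_evalue),
            fmt(self.blast_percent_identity),
            fmt(self.blast_taxon_id),
            fmt(self.blast_percent_coverage),
            "||".join(self.go_ids),   # '||' separator is an assumption
            "||".join(self.ec_ids),
            "||".join(self.hmm_ids),
        ]
        return "\t".join(columns)


# Made-up example record.
record = Annotation("read_1", "DNA polymerase", 1e-50, 98.5, 562, 95.0,
                    go_ids=["GO:0003677"], ec_ids=["2.7.7.7"])
line = record.to_tsv()
```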

Contact Us:

We would like to hear from you. If you have questions or feedback or if you wish to contribute to the METAREP open source project please send an email to [email protected]

New ways to analyze metagenomics data

Are you looking for new tools to analyze your metagenomics data? Are you using MG-RAST, IMG/M or MEGAN for your daily metagenomics work?

JCVI is working on a user friendly alternative that you might be looking for - a new tool kit for metagenomics data visualization and analysis built using the latest web 2.0 technologies.

JCVI’s Metagenomics Reports (METAREP) is a user friendly web interface designed to help scientists browse, compare, view, and query annotation data derived from ORFs called on metagenomics reads. It supports browsing of both functional (Gene Ontology, Enzyme Commission Classification) and taxonomic assignments. When performing a search, users can either specify fields or logical combinations of fields to flexibly filter datasets on the fly. METAREP provides lists and pie charts of top functional and taxonomic categories for browse and search results. Tools are being developed that focus on the comparative analysis of multiple datasets. The system is optimized to be user friendly and fast.
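
The "top categories" summaries boil down to counting how often each category occurs in a result set. A minimal sketch with made-up taxonomic assignments (the function name and data are illustrative, not METAREP code):

```python
from collections import Counter


def top_categories(assignments, n=3):
    """Return the n most frequent categories with their share of all hits."""
    counts = Counter(assignments)
    total = sum(counts.values())
    # most_common() is empty for empty input, so no division by zero occurs.
    return [(name, count, count / total)
            for name, count in counts.most_common(n)]


# Made-up assignments for six hits.
hits = ["Proteobacteria", "Firmicutes", "Proteobacteria",
        "Bacteroidetes", "Proteobacteria", "Firmicutes"]
summary = top_categories(hits, n=2)
```

Each tuple carries the category, its raw count, and its fraction of all hits, which is the information a ranked list or pie chart needs.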

Currently, an alpha version of METAREP is used and tested internally at JCVI. In April 2010, we will release the beta version to a limited set of interested external users.

If you would like to see the tool in action, join us at the DOE Genomic Science Workshop (February 9-10, 2010) for our web and poster presentation (5:30 - 8:00 pm on each day) or sign up to become part of the beta testing process at www.jcvi.org/metarep.