Cloud BioLinux

Overview

A steep drop in the cost of next-generation sequencing in recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis remains a bottleneck for smaller laboratories and institutes that lack access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, a sequencer yields scientific value only when its purchase is accompanied by an equal investment in informatics infrastructure.

Motivation

An alternative model is now available: computational capacity can be purchased as a service from a cloud computing provider, and specialized computational systems can be run on such platforms. Cloud infrastructures let researchers perform computations on a practically unlimited pool of Virtual Machines (VMs), without the burden of owning or maintaining hardware. Cloud computing services use a charge model similar to that of utilities such as electricity: customers are billed for the computing resources they consume. Along these lines, the Cloud BioLinux project offers an on-demand cloud computing solution for the bioinformatics community. It is available on the commercially hosted Amazon EC2 cloud computing infrastructure, and it was developed on, and fully supports, the Eucalyptus cloud platform. For small laboratories without access to large computational resources, running Cloud BioLinux through a commercial cloud platform provides a cost-effective route from data to knowledge. Those with access to private clouds still benefit from the abundance of pre-configured software and the user-friendly desktop interface.

Vision

Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. An automated and configurable process builds Virtual Machines, allowing highly customized versions to be developed from a shared code base. This shared community toolkit enables application-specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them.