Who is using CD-HIT

CD-HIT has a large user base of tens of thousands of users, and it keeps growing. As a general, flexible and powerful clustering method, CD-HIT has been used in many areas, such as:
- Creating non-redundant datasets for proteins
- Clustering analysis of various types of DNA and RNA sequences
- Clustering 16S rRNA tags into OTUs (454, Ion Torrent and Illumina reads)
- Filtering artificially duplicated reads from 454 datasets
- Filtering artificially duplicated reads or paired-end (PE) reads from Illumina datasets
- Generating protein families
- Merging assembled contigs from different assemblers
Li et al. (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.
Li et al. (2001) Clustering of highly homologous sequences to reduce the size of large protein database.
Li et al. (2002) Tolerating some redundancy significantly speeds up clustering of large protein databases.
Huang et al. (2010) CD-HIT Suite: a web server for clustering and comparing biological sequences.
Niu et al. (2009) Artificial and natural duplicates in pyrosequencing reads of metagenomic data.

Indirect impact

CD-HIT also indirectly serves broader communities by supporting many other resources that use it, for example:
- UniProt is the world's most comprehensive catalog of information on proteins. UniRef, one of the three components of UniProt, provides clustered sets of sequences to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences (http://www.uniprot.org/help/uniref). The UniRef datasets have been created with CD-HIT.
- SWISS-MODEL (http://swissmodel.expasy.org/) is the most popular web server for protein structure homology modeling, where CD-HIT is used to prepare the reference database.
- FFAS (http://ffas.burnham.org/) is the most recognized sequence profile alignment server. CD-HIT is part of this server.
- CAMERA (http://camera.calit2.net) is the largest metagenomic sequence data and metadata repository with many annotation tools. CD-HIT is a key tool for CAMERA annotation pipelines.