CD-HIT Suite: Biological Sequence Clustering and Comparison
This server is not available now; please use the UCSD server.


Sequence file and databases.

Sequence Identity Parameters
Number of CD-HIT runs.
Sequence identity cut-off for 1st run.
Sequence identity cut-off for 2nd run.
Sequence identity cut-off for 3rd run.
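
For readers reproducing a multi-run job locally, the settings above correspond to chaining the command-line program so that each run clusters the representative sequences written by the previous run. The following is a minimal sketch, not the server's own implementation. It assumes protein input and the standard `cd-hit` options `-i`, `-o`, `-c` and `-n`; the file names and the helpers `word_size` and `build_runs` are hypothetical, and the word-size ranges are the ones suggested in the CD-HIT user guide.

```python
def word_size(identity):
    """Suggested protein word size (-n) for an identity cut-off.

    Assumption: ranges taken from the CD-HIT user guide for protein input.
    """
    if identity >= 0.7:
        return 5
    if identity >= 0.6:
        return 4
    if identity >= 0.5:
        return 3
    if identity >= 0.4:
        return 2
    raise ValueError("cut-off below 0.4 is outside the assumed word-size table")


def build_runs(fasta, cutoffs):
    """Return one argv list per run; each run reads the previous run's output."""
    commands = []
    current = fasta
    for cutoff in cutoffs:
        # Hypothetical naming scheme: proteins.fasta.nr90, proteins.fasta.nr70, ...
        output = f"{fasta}.nr{int(round(cutoff * 100))}"
        commands.append([
            "cd-hit", "-i", current, "-o", output,
            "-c", str(cutoff), "-n", str(word_size(cutoff)),
        ])
        current = output
    return commands


if __name__ == "__main__":
    # Three runs with decreasing identity cut-offs (illustrative values).
    for argv in build_runs("proteins.fasta", [0.9, 0.7, 0.5]):
        print(" ".join(argv))
```

The commands are only built and printed here; passing each list to `subprocess.run` would execute them if `cd-hit` is installed.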

Algorithm Parameters
-r: compare both strands (No/Yes).
-G: use global sequence identity (No/Yes).
-g: cluster each sequence into the best cluster that meets the threshold (No/Yes).
-b: bandwidth of alignment.
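
The effect of the -G switch can be illustrated with a short calculation. This sketch assumes the definitions given in the CD-HIT documentation: global identity divides the number of identical residues by the full length of the shorter sequence, while local identity divides it by the alignment length. The function name and arguments are hypothetical.

```python
def sequence_identity(identical, alignment_length, shorter_length, use_global=True):
    """Identity under the assumed CD-HIT definitions.

    use_global=True  -> identical residues / full length of the shorter sequence
    use_global=False -> identical residues / length of the alignment
    """
    if use_global:
        return identical / shorter_length
    return identical / alignment_length


if __name__ == "__main__":
    # 80 identical positions in a 90-column alignment; shorter sequence has 100 residues.
    print(sequence_identity(80, 90, 100, use_global=True))
    print(sequence_identity(80, 90, 100, use_global=False))
```

With a fixed cut-off, local identity can accept a pair that global identity rejects, because a short, well-matched alignment is no longer penalised by the unaligned part of the shorter sequence.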

Alignment Coverage Parameters.
-aL: minimal alignment coverage (fraction) for the longer sequence
-AL: maximum unaligned part (amino acids/bases) for the longer sequence
-aS: minimal alignment coverage (fraction) for the shorter sequence
-AS: maximum unaligned part (amino acids/bases) for the shorter sequence
-s: minimal length similarity (fraction)
-S: maximum length difference (amino acids/bases)
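
The six coverage and length options above act as independent filters on a pair of sequences. The sketch below shows one way to read them; it is an interpretation of the option descriptions, not code from CD-HIT itself. The `Pair` class and `passes_coverage` function are hypothetical, and `None` stands for "no limit".

```python
from dataclasses import dataclass


@dataclass
class Pair:
    longer_len: int       # length of the longer sequence
    shorter_len: int      # length of the shorter sequence
    aligned_longer: int   # residues of the longer sequence inside the alignment
    aligned_shorter: int  # residues of the shorter sequence inside the alignment


def passes_coverage(p, aL=0.0, AL=None, aS=0.0, AS=None, s=0.0, S=None):
    """Return True if the pair satisfies every supplied threshold."""
    if p.aligned_longer / p.longer_len < aL:              # -aL: coverage of longer
        return False
    if AL is not None and p.longer_len - p.aligned_longer > AL:    # -AL
        return False
    if p.aligned_shorter / p.shorter_len < aS:            # -aS: coverage of shorter
        return False
    if AS is not None and p.shorter_len - p.aligned_shorter > AS:  # -AS
        return False
    if p.shorter_len / p.longer_len < s:                  # -s: length similarity
        return False
    if S is not None and p.longer_len - p.shorter_len > S:         # -S
        return False
    return True


if __name__ == "__main__":
    pair = Pair(longer_len=100, shorter_len=80, aligned_longer=70, aligned_shorter=72)
    print(passes_coverage(pair, aL=0.6, aS=0.85))  # both coverage fractions satisfied
    print(passes_coverage(pair, S=10))             # length difference of 20 exceeds 10
```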

E-mail address for job checking.
Enter your e-mail address:
    


Reference:
  1. Ying Huang, Beifang Niu, Ying Gao, Limin Fu and Weizhong Li. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics, 2010(26): 680-682.
  2. Weizhong Li and Adam Godzik. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 2006(22): 1658-1659.
  3. Weizhong Li, Lukasz Jaroszewski and Adam Godzik. Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics, 2002(18): 77-82.
  4. Weizhong Li, Lukasz Jaroszewski and Adam Godzik. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics, 2001(17): 282-283.
Contact @Weizhong Li