CD-HIT has a very large user base, and we frequently receive requests, comments, suggestions and feedback from users, which have been a driving force for the CD-HIT project. We would like to build a more interactive and engaged CD-HIT user community, and we invite all users, developers and researchers to play a role in CD-HIT development.
What you can do:
- Contact us about bugs and questions
- Request new features and functions
- Co-develop use cases with our team
- Co-develop proposals that use CD-HIT as key software
Use cases
Here, a use case is a study that uses or customizes our clustering tools in a non-standard, non-trivial or smart way to solve a specific biological sequence analysis problem, e.g. clustering HIV sequences from clinical samples to classify viral types. A use case often requires our team to modify CD-HIT code, develop scripts and work with users. Contact us if you are interested in co-developing use cases with us. We are currently working on documentation for existing use cases, which will be released soon.
List of functions and features to be developed
Here is a list of major new functions and features suggested by CD-HIT users and friends:
- Clustering very long sequences - we are developing new software that can cluster genome-sized DNA sequences.
- Option to create a multiple sequence alignment for each cluster.
- Restart file - the program writes a restart file at a set time interval. In case of a crash, the program can read the restart file and continue instead of starting over.
- Option to use sequence similarity % in addition to sequence identity %
- Option to specify gap penalties, match/mismatch scores, etc.
- Update psi-cd-hit for DNA clustering
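To illustrate the restart-file idea from the list above, here is a minimal Python sketch. It is not CD-HIT code and does not reflect how the feature would actually be implemented: the function names (`save_checkpoint`, `load_checkpoint`, `cluster_with_restart`), the JSON file layout, the toy similarity test and the `stop_after` crash simulation are all assumptions made for illustration. For simplicity it writes the restart file every few sequences rather than at a time interval.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it over the restart file.

    The rename step means a crash during writing cannot leave a half-written
    restart file behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no restart file exists."""
    if not os.path.exists(path):
        return {"next_index": 0, "clusters": []}
    with open(path) as handle:
        return json.load(handle)


def toy_is_similar(representative, sequence):
    """Hypothetical stand-in for a real similarity test: same 3-letter prefix."""
    return representative[:3] == sequence[:3]


def cluster_with_restart(sequences, path, every=2, stop_after=None):
    """Toy greedy clustering that resumes from the restart file if one exists.

    every:      write the restart file after this many processed sequences.
    stop_after: hypothetical test hook that simulates a crash at this index.
    """
    state = load_checkpoint(path)
    clusters = state["clusters"]  # each cluster is a list; item 0 is its representative
    for i in range(state["next_index"], len(sequences)):
        if stop_after is not None and i >= stop_after:
            raise RuntimeError("simulated crash")
        sequence = sequences[i]
        for cluster in clusters:
            if toy_is_similar(cluster[0], sequence):
                cluster.append(sequence)
                break
        else:
            clusters.append([sequence])  # no match: start a new cluster
        if (i + 1) % every == 0:
            save_checkpoint(path, {"next_index": i + 1, "clusters": clusters})
    return clusters
```

In this sketch, work done after the last checkpoint is repeated on restart, so the result of an interrupted and resumed run matches that of an uninterrupted one as long as the input order is unchanged.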