Viral Ortholog Clustering
Overview
Ortholog Clustering
A modification to the OrthoMCL algorithm improves the performance of ortholog clustering for viral proteins.
Polyproteins and mature peptides may be clustered in the same group
Because BLAST is a local alignment method, it generates incorrect ortholog groups in some cases, particularly when a long polyprotein overlaps many mature peptides, as shown in the illustration at the top.
Unequal Lengths
The additional length of the polyprotein does not decrease the BLAST score, so each mature peptide forms an ortholog group with the polyprotein and therefore with the other mature peptides.
We have modified OrthoMCL to take the ratio of the gene lengths into account when determining the similarity between the query and subject genes.
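The exact weighting used in the modified OrthoMCL is not given here, so the following is only a minimal sketch of the idea, assuming the BLAST-derived score is simply multiplied by the ratio of the shorter to the longer gene length. The function names and the example numbers are hypothetical.

```python
def length_ratio(query_len: int, subject_len: int) -> float:
    """Return the shorter length divided by the longer length, in (0, 1]."""
    if query_len <= 0 or subject_len <= 0:
        raise ValueError("sequence lengths must be positive")
    return min(query_len, subject_len) / max(query_len, subject_len)


def adjusted_similarity(blast_score: float, query_len: int, subject_len: int) -> float:
    """Scale a BLAST-derived score by the length ratio.

    Assumption: plain multiplication is used here for illustration only;
    the actual modification may combine the two quantities differently.
    """
    return blast_score * length_ratio(query_len, subject_len)


# Illustrative numbers only (not measured data): the same raw score for
# a 300-residue mature peptide against a 3000-residue polyprotein and
# against a 310-residue peptide of similar length.
peptide_vs_polyprotein = adjusted_similarity(600.0, 300, 3000)
peptide_vs_peptide = adjusted_similarity(600.0, 300, 310)
```

Under this assumption, a hit between sequences of very different lengths is penalized heavily, while a hit between sequences of similar length keeps most of its score, so a polyprotein no longer links all of its mature peptides into one group.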
Evaluating Results
The modification fixes the problem illustrated above. To demonstrate this, we have developed two new metrics that use external information, specifically the virus strain and the gene name, to quantify the improvement in clustering performance.
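The two metrics are not defined in this overview. As an illustration only, the sketch below shows one way strain and gene name labels could be used to score a clustering. Both measures, their names, and the example labels are assumptions, not the project's actual metrics.

```python
from collections import Counter
from typing import List, Tuple

# Each protein is a (strain, gene_name) pair; a cluster is a list of proteins.
Protein = Tuple[str, str]


def gene_name_consistency(clusters: List[List[Protein]]) -> float:
    """Fraction of proteins whose gene name matches their cluster's most common name."""
    total = sum(len(cluster) for cluster in clusters)
    if total == 0:
        return 0.0
    agreeing = sum(
        Counter(gene for _, gene in cluster).most_common(1)[0][1]
        for cluster in clusters
        if cluster
    )
    return agreeing / total


def strain_duplication_rate(clusters: List[List[Protein]]) -> float:
    """Fraction of non-empty clusters holding more than one protein from the same strain."""
    nonempty = [cluster for cluster in clusters if cluster]
    if not nonempty:
        return 0.0
    duplicated = sum(
        1 for cluster in nonempty
        if len({strain for strain, _ in cluster}) < len(cluster)
    )
    return duplicated / len(nonempty)


# Made-up labels: one merged group versus three separate groups.
merged = [[("strainA", "polyprotein"), ("strainA", "NS3"), ("strainA", "NS5"),
           ("strainB", "NS3"), ("strainB", "NS5")]]
split = [[("strainA", "polyprotein")],
         [("strainA", "NS3"), ("strainB", "NS3")],
         [("strainA", "NS5"), ("strainB", "NS5")]]
```

With these made-up labels, the merged group scores lower on gene name consistency and higher on strain duplication than the split groups, which is the kind of difference such metrics are meant to expose.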
Links
Funding
National Institute of Allergy and Infectious Diseases (NIAID)