Abstract:
Dozens of impactful methods have been developed to predict intrinsically disordered regions (IDRs) in protein sequences that interact with proteins and/or nucleic acids. Their training and assessment rely on IDR-level binding annotations, whereas the equivalent structure-trained methods predict more granular annotations of binding amino acids (AAs). We compiled a new benchmark dataset that annotates binding AAs in IDRs and used it to complete a first-of-its-kind assessment of predictions of disordered binding residues. We evaluated a representative collection of 14 methods on several hundred low-similarity test proteins, focusing on the challenging task of differentiating these binding residues from other disordered AAs and on ligand-type-specific predictions (protein–protein vs. protein–nucleic acid interactions). We found that current methods struggle to accurately predict binding IDRs among disordered residues; however, better-than-random tools predict disordered binding residues significantly better than binding IDRs. We identified at least one relatively accurate tool for predicting disordered protein-binding and disordered nucleic acid-binding AAs. Analysis of cross-predictions between interactions with proteins and nucleic acids revealed that most methods are ligand-type-agnostic. Only two predictors of nucleic acid-binding IDRs and two predictors of protein-binding IDRs can be considered ligand-type-specific. We also discuss several potential future directions that would move this field forward by producing more accurate methods that target the prediction of binding residues, reduce cross-predictions, and cover a broader range of ligand types.