Department of Biological Sciences
Permanent URI for this collectionhttp://localhost:4000/handle/123456789/1922
Browse
134 results
Search Results
Item The multifaceted role of lncRNA MEG3 in kidney disease: a focus on mechanisms, therapeutic and diagnostic potential(Elsevier, 2025-09) Majumder, Syamantak; Gaikwad, Anil BhanudasKidney disease represents a global health challenge, affecting millions of people and contributing to significant morbidity and mortality. Long noncoding RNAs are potentially emerging as regulators in cellular processes and are involved in pathophysiological alterations in kidney disease. Among these, MEG3 has gained attention for its diverse regulatory roles in fibrosis, apoptosis and inflammation. MEG3 dysregulation has been implicated in conditions like chronic kidney disease, acute kidney injury, diabetic kidney disease and renal cell carcinoma. However, its involvement in endoplasmic reticulum (ER) stress and autophagy, crosstalk in kidney disease, is poorly understood. Hence, this review aims to highlight the role of MEG3 as a therapeutic and diagnostic viewpoint in kidney disease and its regulatory mechanism in ER stress and autophagy.Item Computational prediction of disordered binding regions(Elsevier, 2023) Basu, SushmitaOne of the key features of intrinsically disordered regions (IDRs) is their ability to interact with a broad range of partner molecules. Multiple types of interacting IDRs were identified including molecular recognition fragments (MoRFs), short linear sequence motifs (SLiMs), and protein-, nucleic acids- and lipid-binding regions. Prediction of binding IDRs in protein sequences is gaining momentum in recent years. We survey 38 predictors of binding IDRs that target interactions with a diverse set of partners, such as peptides, proteins, RNA, DNA and lipids. We offer a historical perspective and highlight key events that fueled efforts to develop these methods. These tools rely on a diverse range of predictive architectures that include scoring functions, regular expressions, traditional and deep machine learning and meta-models. Recent efforts focus on the development of deep neural network-based architectures and extending coverage to RNA, DNA and lipid-binding IDRs. We analyze availability of these methods and show that providing implementations and webservers results in much higher rates of citations/use. We also make several recommendations to take advantage of modern deep network architectures, develop tools that bundle predictions of multiple and different types of binding IDRs, and work on algorithms that model structures of the resulting complexes.Item DEPICTER2: a comprehensive webserver for intrinsic disorder and disorder function prediction(OUP, 2023-05) Basu, SushmitaIntrinsic disorder in proteins is relatively abundant in nature and essential for a broad spectrum of cellular functions. While disorder can be accurately predicted from protein sequences, as it was empirically demonstrated in recent community-organized assessments, it is rather challenging to collect and compile a comprehensive prediction that covers multiple disorder functions. To this end, we introduce the DEPICTER2 (DisorderEd PredictIon CenTER) webserver that offers convenient access to a curated collection of fast and accurate disorder and disorder function predictors. This server includes a state-of-the-art disorder predictor, flDPnn, and five modern methods that cover all currently predictable disorder functions: disordered linkers and protein, peptide, DNA, RNA and lipid binding. DEPICTER2 allows selection of any combination of the six methods, batch predictions of up to 25 proteins per request and provides interactive visualization of the resulting predictionsItem CoMemMoRFPred: sequence-based prediction of MemMoRFs by combining predictors of intrinsic disorder, MoRFs and disordered lipid-binding regions(Elsevier, 2023-11) Basu, SushmitaMolecular recognition features (MoRFs) are a commonly occurring type of intrinsically disordered regions (IDRs) that undergo disorder-to-order transition upon binding to partner molecules. We focus on recently characterized and functionally important membrane-binding MoRFs (MemMoRFs). Motivated by the lack of computational tools that predict MemMoRFs, we use a dataset of experimentally annotated MemMoRFs to conceptualize, design, evaluate and release an accurate sequence-based predictor. We rely on state-of-the-art tools that predict residues that possess key characteristics of MemMoRFs, such as intrinsic disorder, disorder-to-order transition and lipid-binding. We identify and combine results from three tools that include flDPnn for the disorder prediction, DisoLipPred for the prediction of disordered lipid-binding regions, and MoRFCHiBiLight for the prediction of disorder-to-order transitioning protein binding regions. Our empirical analysis demonstrates that combining results produced by these three methods generates accurate predictions of MemMoRFs. We also show that use of a smoothing operator produces predictions that closely mimic the number and sizes of the native MemMoRF regions.Item DescribePROT in 2023: more, higher-quality and experimental annotations and improved data download options(OUP, 2023-11) Basu, SushmitaThe DescribePROT database of amino acid-level descriptors of protein structures and functions was substantially expanded since its release in 2020. This expansion includes substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and whole database. The annotations in DescribePROT are useful for a broad spectrum of studies that include investigations of protein structure and function, development and validation of predictive tools, and to support efforts in understanding molecular underpinnings of diseases and development of therapeuticsItem HybridDBRpred: improved sequence-based prediction of DNA-binding amino acids using annotations from structured complexes and disordered proteins(OUP, 2023-12) Basu, SushmitaCurrent predictors of DNA-binding residues (DBRs) from protein sequences belong to two distinct groups, those trained on binding annotations extracted from structured protein-DNA complexes (structure-trained) vs. intrinsically disordered proteins (disorder-trained). We complete the first empirical analysis of predictive performance across the structure- and disorder-annotated proteins for a representative collection of ten predictors. Majority of the structure-trained tools perform well on the structure-annotated proteins while doing relatively poorly on the disorder-annotated proteins, and vice versa. Several methods make accurate predictions for the structure-annotated proteins or the disorder-annotated proteins, but none performs highly accurately for both annotation types. Moreover, most predictors make excessive cross-predictions for the disorder-annotated proteins, where residues that interact with non-DNA ligand types are predicted as DBRs. Motivated by these results, we design, validate and deploy an innovative meta-model, hybridDBRpred, that uses deep transformer network to combine predictions generated by three best current predictors. HybridDBRpred provides accurate predictions and low levels of cross-predictions across the two annotation types, and is statistically more accurate than each of the ten tools and baseline meta-predictors that rely on averaging and logistic regression.Item flDPnn2: accurate and fast predictor of intrinsic disorder in proteins(Elsevier, 2024-09) Basu, SushmitaPrediction of the intrinsic disorder in protein sequences is an active research area, with well over 100 predictors that were released to date. These efforts are motivated by the functional importance and high levels of abundance of intrinsic disorder, combined with relatively low amounts of experimental annotations. The disorder predictors are periodically evaluated by independent assessors in the Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiments. The recently completed CAID2 experiment assessed close to 40 state-of-the-art methods demonstrating that some of them produce accurate results. In particular, flDPnn2 method, which is the successor of flDPnn that performed well in the CAID1 experiment, secured the overall most accurate results on the Disorder-NOX dataset in CAID2. flDPnn2 implements a number of improvements when compared to its predecessor including changes to the inputs, increased size of the deep network model that we retrained on a larger training set, and addition of an alignment module. Using results from CAID2, we show that flDPnn2 produces accurate predictions very quickly, modestly improving over the accuracy of flDPnn and reducing the runtime by half, to about 27 s per proteinItem DescribePROT database of residue-level protein structure and function annotations(Springer, 2024-11) Basu, SushmitaDescribePROT is a freely available online database of structural and functional descriptors of proteins at the amino acid level. It provides access to 13 diverse descriptors that include sequence conservation, putative secondary structure, solvent accessibility, intrinsic disorder, and signal peptides, and putative annotations of residues that interact with proteins, peptides and nucleic acids. These data can be used to elucidate protein functions, to support efforts to develop therapeutics, and to develop and evaluate future predictors of protein structure and function. DescribePROT includes 7.8 billion predictions for 1.4 million proteins from 83 complete proteomes of popular model organisms. This information can be downloaded at multiple levels of scope (entire database, specific organisms, and individual proteins) and can be interacted with using a graphical interface that simultaneously displays data on multiple descriptors. We describe the contents of this resource, provide directions on how to use its interface, and offer instructions on how to obtain and interact with the underlying data.Item Taxonomy-specific assessment of intrinsic disorder predictions at residue and region levels in higher eukaryotes, protists, archaea, bacteria and viruses(Elsevier, 2024-12) Basu, SushmitaIntrinsic disorder predictors were evaluated in several studies including the two large CAID experiments. However, these studies are biased towards eukaryotic proteins and focus primarily on the residue-level predictions. We provide first-of-its-kind assessment that comprehensively covers the taxonomy and evaluates predictions at the residue and disordered region levels. We curate a benchmark dataset that uniformly covers eukaryotic, archaeal, bacterial, and viral proteins. We find that predictive performance differs substantially across taxonomy, where viruses are predicted most accurately, followed by protists and higher eukaryotes, while bacterial and archaeal proteins suffer lower levels of accuracy. These trends are consistent across predictors. We also find that current tools, except for flDPnn, struggle with reproducing native distributions of the numbers and sizes of the disordered regions. Moreover, analysis of two variants of disorder predictions derived from the AlphaFold2 predicted structures reveals that they produce accurate residue-level propensities for archaea, bacteria and protists. However, they underperform for higher eukaryotes and generally struggle to accurately identify disordered regions. Our results motivate development of new predictors that target bacteria and archaea and which produce accurate results at both residue and region levels. We also stress the need to include the region-level assessments in future assessments.Item Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences(OUP, 2025-01) Basu, SushmitaComputational prediction of nucleic acid-binding residues in protein sequences is an active field of research, with over 80 methods that were released in the past 2 decades. We identify and discuss 87 sequence-based predictors that include dozens of recently published methods that are surveyed for the first time. We overview historical progress and examine multiple practical issues that include availability and impact of predictors, key features of their predictive models, and important aspects related to their training and assessment. We observe that the past decade has brought increased use of deep neural networks and protein language models, which contributed to substantial gains in the predictive performance. We also highlight advancements in vital and challenging issues that include cross-predictions between deoxyribonucleic acid (DNA)-binding and ribonucleic acid (RNA)-binding residues and targeting the two distinct sources of binding annotations, structure-based versus intrinsic disorder-based. The methods trained on the structure-annotated interactions tend to perform poorly on the disorder-annotated binding and vice versa, with only a few methods that target and perform well across both annotation types. The cross-predictions are a significant problem, with some predictors of DNA-binding or RNA-binding residues indiscriminately predicting interactions with both nucleic acid types. Moreover, we show that methods with web servers are cited substantially more than tools without implementation or with no longer working implementations, motivating the development and long-term maintenance of the web servers. We close by discussing future research directions that aim to drive further progress in this area.