Deep learning framework for RNA 5hmC prediction using RNA language model embeddings

Written on 3 February 2026
by Md Muhaiminul Islam Nafi

PLoS One. 2026 Feb 3;21(2):e0341649. doi: 10.1371/journal.pone.0341649. eCollection 2026.

ABSTRACT

Ribonucleic acid (RNA) 5-hydroxymethylcytosine (5hmC) modification influences gene expression, contributes to epigenetic regulation, and plays an important role in complex regulatory networks, thereby affecting cellular pathways. 5hmC modifications are also linked to a variety of human diseases, including diabetes, cancer, and cardiovascular conditions. However, experimental methods for identifying RNA 5hmC modifications, such as chromatography and polymerase chain reaction (PCR) amplification, are costly and time-consuming, so computational methods are needed to predict these modifications. In this study, several feature descriptors were analyzed and compared to select the best-performing ones, and different deep learning models were explored to design the proposed architecture. Neighbourhood analysis was conducted on the dataset to gain deeper insight into RNA 5hmC modifications. The proposed model, InTrans-RNA5hmC, is a dual-branch deep learning model consisting of an Inception branch and a Transformer branch. Word embeddings that carry contextual information and language model embeddings from the RiboNucleic Acid Language Model (RiNALMo) were used as the final feature descriptors. InTrans-RNA5hmC outperformed existing state-of-the-art (SOTA) methods, achieving 0.97 sensitivity, 0.985 balanced accuracy, and 0.985 F1 score on the independent test set.
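
The abstract reports sensitivity, balanced accuracy, and F1 score but does not define them. The sketch below uses the standard binary-classification definitions and assumes labels are encoded as 1 (5hmC site) and 0 (non-site). The names `BinaryMetrics` and `binary_metrics` are hypothetical, and this is not the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float        # TP / (TP + FN), also called recall
    specificity: float        # TN / (TN + FP)
    balanced_accuracy: float  # mean of sensitivity and specificity
    f1: float                 # 2*TP / (2*TP + FP + FN)


def _safe_div(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a class is absent (assumed convention).
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred) -> BinaryMetrics:
    """Compute the reported metrics from 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = _safe_div(tp, tp + fn)
    specificity = _safe_div(tn, tn + fp)
    return BinaryMetrics(
        sensitivity=sensitivity,
        specificity=specificity,
        balanced_accuracy=(sensitivity + specificity) / 2,
        f1=_safe_div(2 * tp, 2 * tp + fp + fn),
    )
```

Balanced accuracy averages the per-class recall, so it remains informative when modified and unmodified sites are unevenly represented in a test set.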

PMID:41632767 | DOI:10.1371/journal.pone.0341649