PLoS One. 2026 Mar 6;21(3):e0344084. doi: 10.1371/journal.pone.0344084. eCollection 2026.
ABSTRACT
BACKGROUND AND AIMS: Pulmonary arterial hypertension (PAH) is a severe disease with limited effective therapies, making the discovery of new therapeutic targets crucial. Single-cell RNA sequencing (scRNA-seq) is a powerful tool for this purpose, but its application is hampered by the scarcity of patient samples. This study addresses how to efficiently identify novel, functionally relevant disease-associated genes from limited publicly available data.
METHODS: We employed transfer learning, fine-tuning the deep learning model Geneformer on public scRNA-seq data from patients with PAH to create a specialized model called PAH-former. This model was used to perform in silico perturbation analysis to identify and rank candidate genes predicted to influence the disease state. For validation, we performed RNA interference-mediated knockdown of top novel candidate genes in human pulmonary artery endothelial cells and measured the expression of SRY-Box Transcription Factor 18 (SOX18), a signature gene of PAH.
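The in silico perturbation step can be illustrated with a toy sketch. It is not the PAH-former or Geneformer code: the embedding here is a hypothetical rank-weighted average of made-up gene vectors standing in for a fine-tuned transformer, and the gene names, vectors, and disease centroid are all assumptions chosen for illustration. The sketch only shows the general idea of deleting one gene from a cell's rank-ordered gene list and scoring how far the cell embedding moves towards a disease-state reference.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def embed(tokens, gene_vectors):
    """Toy stand-in for a model embedding (assumption, not Geneformer):
    a rank-weighted mean of per-gene vectors, highest-ranked gene first."""
    weights = 1.0 / np.arange(1, len(tokens) + 1)
    vecs = np.stack([gene_vectors[g] for g in tokens])
    return (weights[:, None] * vecs).sum(axis=0) / weights.sum()

def deletion_shift(tokens, gene, gene_vectors, disease_centroid):
    """Change in similarity to the disease centroid after deleting `gene`.
    A positive value means the deletion moves the cell towards disease."""
    before = cosine(embed(tokens, gene_vectors), disease_centroid)
    perturbed = [g for g in tokens if g != gene]
    after = cosine(embed(perturbed, gene_vectors), disease_centroid)
    return after - before

def rank_candidates(cells, gene_vectors, disease_centroid):
    """Average the deletion shift of each gene over all cells and sort
    genes from the largest shift towards disease to the smallest."""
    shifts = {}
    for tokens in cells:
        if len(tokens) < 2:
            continue  # deleting the only gene would leave nothing to embed
        for gene in tokens:
            shift = deletion_shift(tokens, gene, gene_vectors, disease_centroid)
            shifts.setdefault(gene, []).append(shift)
    mean_shift = {g: float(np.mean(v)) for g, v in shifts.items()}
    return sorted(mean_shift.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: three placeholder genes in a 2-D embedding space.
gene_vectors = {
    "GENE_A": np.array([1.0, 0.0]),
    "GENE_B": np.array([0.0, 1.0]),
    "GENE_C": np.array([-1.0, 0.0]),
}
disease_centroid = np.array([1.0, 0.0])  # assumed disease-state direction
cells = [["GENE_C", "GENE_A", "GENE_B"]]  # genes ordered by expression rank

ranked = rank_candidates(cells, gene_vectors, disease_centroid)
for gene, shift in ranked:
    print(f"{gene}: {shift:+.3f}")
```

In this toy setup, the placeholder gene whose vector points away from the disease direction ranks first, because removing it moves the cell embedding towards the disease centroid. In a real analysis the embeddings would come from the fine-tuned model, and shifts would be averaged over many cells.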
RESULTS: In silico perturbation analysis identified 134 candidate genes whose deletion was predicted to shift cells towards a disease phenotype. These included known disease-related genes as well as many novel ones. Subsequent in vitro validation demonstrated that knockdown of the candidate genes resulted in a significant increase in the expression of SOX18.
CONCLUSIONS: Our novel platform, PAH-former, provides a powerful and broadly applicable strategy for disease-related gene discovery. This approach enables the identification and validation of new candidate genes from limited data, promising to advance cell-specific mechanistic insights and accelerate therapeutic development for rare diseases like PAH.
PMID:41790620 | DOI:10.1371/journal.pone.0344084

