Anal Biochem. 2026 Feb 26:116092. doi: 10.1016/j.ab.2026.116092. Online ahead of print.
ABSTRACT
Classifying FAD binding sites is an important problem because these sites are directly related to many diseases, including flavoprotein-associated diseases, developmental disorders, digestive and lipid metabolism abnormalities, anemia, cancer, cardiovascular diseases, and neurological diseases. Numerous studies have addressed FAD binding site classification with varying levels of effectiveness, yet substantial room for improvement remains. Our study proposes msCNN-PLM-FAD, a computational model that predicts FAD binding sites in electron transport proteins using a sliding window feature extraction technique and pretrained protein language models (PLMs). The approach combines embeddings from the ESM PLM with a convolutional neural network-based window scanning technique that captures various sequence features relevant to FAD binding sites. Built on a dataset of 12,850 samples, the model outperformed previous studies, particularly on class-balance metrics, achieving an area under the curve (AUC) of 0.9614, a Matthews correlation coefficient (MCC) of 0.7491, a specificity of 0.9836, and an accuracy of 0.9795, while maintaining a sensitivity of 0.8704. These results demonstrate the potential to improve the accuracy of FAD binding site prediction, thereby promoting the development of drugs for diseases associated with FAD deficiency as well as therapeutic approaches that target bacterial FAD synthesis without affecting human health.
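As a rough illustration of the sliding window idea mentioned in the abstract, the following minimal Python sketch extracts one fixed-length window per residue, centered on that residue. The abstract does not state the window size, the padding scheme, or any implementation detail, so the function name `sliding_windows`, the default `window_size`, the padding symbol `X`, and the example sequence are all assumptions made for illustration, not the authors' method.

```python
from typing import List

# Assumed padding symbol for positions that fall outside the sequence.
PAD = "X"


def sliding_windows(sequence: str, window_size: int = 15) -> List[str]:
    """Return one window per residue, each centered on that residue.

    Hypothetical helper: the window size and padding are illustrative
    choices, not values reported in the abstract.
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd integer")
    half = window_size // 2
    # Pad both ends so that terminal residues also get full-length windows.
    padded = PAD * half + sequence + PAD * half
    return [padded[i:i + window_size] for i in range(len(sequence))]


# Example with a made-up eight-residue sequence and a window of five.
windows = sliding_windows("MKTAYIAK", window_size=5)
```

In a pipeline of the kind the abstract describes, each window (or the per-residue PLM embeddings it spans) would presumably be passed to the convolutional network, which would label the central residue as binding or non-binding; that step is not shown here.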
PMID:41763284 | DOI:10.1016/j.ab.2026.116092

