Embeddings of clinical codes enable knowledge-grounded AI in medicine

Written on 11 June 2026
by Ruth Johnson

NPJ Digit Med. 2026 Jun 11. doi: 10.1038/s41746-026-02664-9. Online ahead of print.

ABSTRACT

Standardization of electronic health records (EHRs) has enabled the use of clinical codes in AI. We introduce ClinVec, an embedding store that provides embeddings for 153,166 clinical codes and concepts across eight vocabularies. ClinVec embeds ClinGraph, a knowledge graph with over 2 million edges tailored to clinical vocabularies used in EHRs. We validate the embeddings using an inter-institutional clinician panel and N = 3767 clinical term pairs spanning 11 disease areas, and we find that embedding similarity reflects clinical relatedness. We use ClinVec for knowledge injection in large language model medical question answering and for unsupervised patient stratification and risk prediction. By providing a shared representation of clinical concepts, ClinVec supports knowledge-grounded AI systems for modeling patients and populations.
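The abstract says that embedding similarity reflects clinical relatedness, judged against clinician-rated term pairs. It does not say which similarity measure or agreement statistic the study uses. The sketch below assumes cosine similarity between embedding vectors and Spearman rank correlation against clinician ratings. The concept names, three-dimensional vectors and ratings are invented for illustration. They are not ClinVec data, and the functions are hypothetical helpers, not part of any ClinVec API.

```python
import math

# Toy embeddings (hypothetical, NOT from ClinVec); real embeddings are much higher-dimensional.
embeddings = {
    "type_2_diabetes": [0.9, 0.1, 0.0],
    "metformin": [0.8, 0.2, 0.1],
    "insulin": [0.7, 0.3, 0.0],
    "hip_fracture": [0.0, 0.2, 0.9],
}

# Hypothetical clinician relatedness ratings (1 = unrelated, 5 = closely related).
pairs = [
    ("type_2_diabetes", "metformin"),
    ("type_2_diabetes", "insulin"),
    ("type_2_diabetes", "hip_fracture"),
    ("metformin", "hip_fracture"),
]
ratings = [5, 4, 1, 2]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Similarity of each rated pair, then rank agreement with the clinician ratings.
sims = [cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs]
rho = spearman(sims, ratings)

for (a, b), s in zip(pairs, sims):
    print(f"{a} ~ {b}: cosine = {s:.3f}")
print(f"Spearman rho vs. clinician ratings: {rho:.3f}")
```

With these toy numbers, the pairs that were invented to be clinically close (diabetes and its drugs) get higher cosine similarity than the unrelated pairs. As a result, the rank correlation with the made-up ratings is at its maximum. A rank-based statistic fits this setting because clinician ratings are ordinal, so only their ordering matters.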

PMID:42277300 | DOI:10.1038/s41746-026-02664-9