IEEE Trans Med Imaging. 2026 May 27;PP. doi: 10.1109/TMI.2026.3697520. Online ahead of print.
ABSTRACT
Automatic echocardiography video segmentation is crucial for accurate diagnosis of cardiovascular diseases, as high-quality segmentation significantly improves automated lesion detection. However, deep learning methods still face several challenges: speckle noise, dynamic ventricular changes, limited annotations, and the requirement for real-time inference in clinical practice. In this paper, we propose MSSNet, a novel semi-supervised method built on Mamba, an efficient sequence modeling architecture, to address these challenges. To enhance noise robustness and foreground tracking, we design a flexible and efficient spatiotemporal synergistic guidance (SSG) module that leverages attention weights from historical frames to guide the segmentation of subsequent frames. By incorporating stable structural context and modeling inter-frame dependencies through weight propagation, SSG effectively mitigates segmentation errors caused by strong local noise and ventricular dynamics while maintaining low computational complexity. To mitigate the scarcity of annotations, we further introduce two semi-supervised modules: region-wise adaptive cross-mix (RAC) and dynamic offset correction (DOC). RAC simulates clinically plausible samples via regional mixing to enrich semantic details and strengthen feature learning, while DOC continuously integrates features from high-quality pseudo-labels during training. Experiments on the CAMUS and EchoNet-Dynamic datasets demonstrate that MSSNet outperforms existing state-of-the-art methods in segmentation accuracy and achieves notable improvements in inference speed. The code is available at https://github.com/SSS666-klk/MSSNet.
PMID:42202178 | DOI:10.1109/TMI.2026.3697520
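The abstract states that RAC builds training samples "via regional mixing" but gives no further detail. As a rough illustration of what regional mixing between two samples can look like, the sketch below pastes a rectangular region of one frame (and its label mask) into another, in the style of generic CutMix augmentation. It is not the authors' RAC module: the function name `region_mix`, the `box` parameter, and the fixed rectangular region are all assumptions made for illustration, and the adaptive region selection described in the abstract is not modeled here.

```python
import numpy as np


def region_mix(img_a, img_b, mask_a, mask_b, box):
    """Paste a rectangular region of sample B into sample A (CutMix-style).

    Hypothetical helper for illustration only; not the RAC module from MSSNet.
    `box` is (top, left, height, width) in pixel coordinates.
    """
    top, left, height, width = box
    rows = slice(top, top + height)
    cols = slice(left, left + width)

    # Copy so the original frame and mask are left untouched.
    mixed_img = img_a.copy()
    mixed_mask = mask_a.copy()

    # Replace the chosen region in both the image and its label mask,
    # so the mixed label stays consistent with the mixed image.
    mixed_img[rows, cols] = img_b[rows, cols]
    mixed_mask[rows, cols] = mask_b[rows, cols]
    return mixed_img, mixed_mask


# Toy example: two 4x4 "frames" with constant intensities and labels.
frame_a = np.zeros((4, 4), dtype=np.float32)
frame_b = np.ones((4, 4), dtype=np.float32)
label_a = np.zeros((4, 4), dtype=np.int64)
label_b = np.ones((4, 4), dtype=np.int64)

mixed_frame, mixed_label = region_mix(frame_a, frame_b, label_a, label_b, box=(1, 1, 2, 2))
```

In a semi-supervised setting, one of the two masks would typically be a pseudo-label predicted for an unlabeled frame; how MSSNet chooses regions and weights such samples is described in the paper and the linked repository, not here.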

