Abstract
Segment Anything Model (SAM)-based approaches have demonstrated remarkable potential for biomedical image segmentation. However, these methods often struggle to maintain spatial consistency in 3D electron microscopy (3D-EM) data and require extensive manual annotations. To address these limitations, we propose Spatial-SAM, a spatially consistent and annotation-efficient framework for high-precision segmentation of 3D-EM data. Our method introduces two key innovations. First, we incorporate a 3D Signed Distance Field (SDF) memory mechanism that replaces the original memory in SAM2 with SDF representations precomputed by a 3D U-Net, providing richer geometric information and improving spatial consistency. Second, by combining the few-shot capability of SAM2 with a dual-track pseudo-label iterative optimization strategy, Spatial-SAM efficiently learns to segment large-scale 3D-EM datasets from minimal annotations. Experiments on multiple 3D-EM benchmarks show that Spatial-SAM significantly outperforms existing semi-supervised methods and achieves performance comparable to state-of-the-art fully supervised approaches, reducing annotation costs while preserving spatial consistency. The code will be publicly released upon acceptance.
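To make the notion of a signed distance field concrete, the following is a minimal sketch of how an SDF can be derived from a binary 3D mask. It is not the Spatial-SAM implementation: in the paper the SDF is predicted by a 3D U-Net, whereas this sketch computes it by brute force with NumPy. The function name `signed_distance_field` and the sign convention (positive inside the object, negative outside) are assumptions made for illustration only.

```python
import numpy as np


def signed_distance_field(mask: np.ndarray) -> np.ndarray:
    """Brute-force SDF of a small binary volume.

    Assumed convention (not taken from the paper): positive inside the
    object, negative outside, distances measured in voxels between
    voxel centers.
    """
    mask = np.asarray(mask).astype(bool)
    if mask.all() or not mask.any():
        raise ValueError("mask must contain both foreground and background")

    # Coordinates of every voxel, in the same C order as mask.ravel().
    coords = np.indices(mask.shape).reshape(mask.ndim, -1).T.astype(float)
    flat = mask.ravel()
    fg, bg = coords[flat], coords[~flat]

    # Pairwise Euclidean distances, shape (n_foreground, n_background).
    # This is quadratic in the number of voxels, so only use it on tiny volumes.
    dist = np.linalg.norm(fg[:, None, :] - bg[None, :, :], axis=-1)

    sdf = np.empty(flat.shape, dtype=float)
    sdf[flat] = dist.min(axis=1)    # inside: distance to nearest background voxel
    sdf[~flat] = -dist.min(axis=0)  # outside: negated distance to nearest foreground voxel
    return sdf.reshape(mask.shape)


# Toy volume: a 3x3x3 cube of foreground centered in a 5x5x5 grid.
toy_mask = np.zeros((5, 5, 5), dtype=bool)
toy_mask[1:4, 1:4, 1:4] = True
toy_sdf = signed_distance_field(toy_mask)
```

Unlike a binary mask, the SDF varies smoothly with distance to the object boundary, which is one way to read the "richer geometric information" mentioned above. For realistic volume sizes, a dedicated routine such as `scipy.ndimage.distance_transform_edt`, applied once to the mask and once to its complement, would be the usual choice instead of the quadratic loop-free computation shown here.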
Overview of the proposed Spatial-SAM framework. The upper panel presents the Spatial-SAM model, which extends SAM2 by integrating the SDF Memory mechanism for enhanced spatial representation. The lower panel depicts the proposed dual-track semi-supervised training scheme, which alternates between SDF training and mask training.
The dual-track training process of Spatial-SAM. The upper part illustrates the initialization process for generating pseudo labels, while the lower part shows the iterative training process, with SDF training on the left and mask training on the right.
3D visual comparison of different methods on mitochondria of varying sizes and morphologies.
Visualization of segmentation results on the OpenOrganelle and MitoEM datasets. Cyan indicates true positives (TP), magenta indicates false negatives (FN), and yellow indicates false positives (FP).
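The color coding in this caption can be reproduced with a short sketch like the one below, which builds an RGB error map from a predicted mask and a ground-truth mask. The function name `error_overlay` and the choice to leave true negatives black are assumptions for illustration; the paper does not describe how its figures were rendered.

```python
import numpy as np

# RGB colors matching the caption's legend.
CYAN = (0, 255, 255)     # true positive
MAGENTA = (255, 0, 255)  # false negative
YELLOW = (255, 255, 0)   # false positive


def error_overlay(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Return a uint8 RGB array coloring TP, FN, and FP pixels.

    True negatives are left black (an assumption; a real figure would
    typically blend the colors over the raw EM image instead).
    """
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")

    rgb = np.zeros(pred.shape + (3,), dtype=np.uint8)
    rgb[pred & gt] = CYAN       # predicted and present in ground truth
    rgb[~pred & gt] = MAGENTA   # missed by the prediction
    rgb[pred & ~gt] = YELLOW    # predicted but absent from ground truth
    return rgb


# Tiny 2x2 example covering all four outcomes.
pred_demo = np.array([[1, 1], [0, 0]])
gt_demo = np.array([[1, 0], [1, 0]])
overlay_demo = error_overlay(pred_demo, gt_demo)
```

The same function works unchanged on 3D volumes, since the boolean masks index all leading axes and the color is broadcast along the final channel axis.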