SOFTooth: Semantics-Enhanced Order-Aware Fusion for Tooth Instance Segmentation
- URL: http://arxiv.org/abs/2512.23411v1
- Date: Mon, 29 Dec 2025 12:14:41 GMT
- Title: SOFTooth: Semantics-Enhanced Order-Aware Fusion for Tooth Instance Segmentation
- Authors: Xiaolan Li, Wanquan Liu, Pengcheng Li, Pengyu Jie, Chenqiang Gao,
- Abstract summary: 3D tooth instance segmentation is challenging due to crowded arches, ambiguous tooth-gingiva boundaries, missing teeth, and rare yet clinically important third molars. We propose SOFTooth, a semantics-enhanced 2D-3D fusion framework that leverages frozen 2D semantics without explicit 2D mask supervision. On 3DTeethSeg'22, SOFTooth achieves state-of-the-art overall accuracy and mean IoU, with clear gains on cases involving third molars.
- Score: 18.381890045783376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Three-dimensional (3D) tooth instance segmentation remains challenging due to crowded arches, ambiguous tooth-gingiva boundaries, missing teeth, and rare yet clinically important third molars. Native 3D methods relying on geometric cues often suffer from boundary leakage, center drift, and inconsistent tooth identities, especially for minority classes and complex anatomies. Meanwhile, 2D foundation models such as the Segment Anything Model (SAM) provide strong boundary-aware semantics, but directly applying them in 3D is impractical in clinical workflows. To address these issues, we propose SOFTooth, a semantics-enhanced, order-aware 2D-3D fusion framework that leverages frozen 2D semantics without explicit 2D mask supervision. First, a point-wise residual gating module injects occlusal-view SAM embeddings into 3D point features to refine tooth-gingiva and inter-tooth boundaries. Second, a center-guided mask refinement regularizes consistency between instance masks and geometric centroids, reducing center drift. Furthermore, an order-aware Hungarian matching strategy integrates anatomical tooth order and center distance into similarity-based assignment, ensuring coherent labeling even under missing or crowded dentitions. On 3DTeethSeg'22, SOFTooth achieves state-of-the-art overall accuracy and mean IoU, with clear gains on cases involving third molars, demonstrating that rich 2D semantics can be effectively transferred to 3D tooth instance segmentation without 2D fine-tuning.
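The order-aware matching step described in the abstract can be pictured as a Hungarian assignment over a composite cost that mixes similarity, centroid distance, and anatomical tooth order. The sketch below is an illustrative assumption, not the paper's exact formulation: the weights, the use of the x-coordinate as an arch-order proxy, and the specific cost terms are all hypothetical.

```python
# Hypothetical sketch of order-aware Hungarian matching: build a cost
# matrix from (negated) mask similarity, normalized centroid distance,
# and a penalty for inverting the along-arch tooth order, then solve
# the assignment with the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

def order_aware_assign(similarity, pred_centers, ref_centers,
                       w_sim=1.0, w_dist=0.5, w_order=0.5):
    """similarity: (P, G) similarity between P predicted instances and
    G reference tooth slots; *_centers: (N, 3) centroid arrays."""
    # Euclidean distance between predicted and reference centroids
    dist = np.linalg.norm(
        pred_centers[:, None, :] - ref_centers[None, :, :], axis=-1)
    dist = dist / (dist.max() + 1e-8)  # normalize to [0, 1]

    # Order penalty: rank both sets along the arch (x-axis here, as a
    # stand-in for a true arch parameterization) and penalize pairings
    # whose ranks disagree.
    pred_rank = np.argsort(np.argsort(pred_centers[:, 0]))
    ref_rank = np.argsort(np.argsort(ref_centers[:, 0]))
    order = np.abs(pred_rank[:, None] - ref_rank[None, :]).astype(float)
    order = order / (order.max() + 1e-8)

    cost = -w_sim * similarity + w_dist * dist + w_order * order
    rows, cols = linear_sum_assignment(cost)  # minimum-cost matching
    return rows, cols
```

Because the solver operates on whatever slots are present in the reference, missing teeth simply leave their slots unmatched or matched at high cost, which is consistent with the coherent-labeling claim above.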
Related papers
- DM-CFO: A Diffusion Model for Compositional 3D Tooth Generation with Collision-Free Optimization [20.638904379060573]
We propose an approach named DM-CFO for compositional tooth generation. We show that our approach significantly improves the multiview consistency and realism of the generated teeth compared with existing methods.
arXiv Detail & Related papers (2026-03-04T00:25:09Z) - 3DTeethSAM: Taming SAM2 for 3D Teeth Segmentation [26.743010197720675]
3DTeethSAM is an adaptation of the Segment Anything Model 2 (SAM2) for 3D teeth segmentation. Our method has been validated on the 3DTeethSeg benchmark, achieving an IoU of 91.90% on high-resolution 3D teeth meshes.
arXiv Detail & Related papers (2025-12-12T13:42:06Z) - SGS-3D: High-Fidelity 3D Instance Segmentation via Reliable Semantic Mask Splitting and Growing [20.383892902000976]
We propose splitting and growing reliable semantic masks for high-fidelity 3D instance segmentation (SGS-3D). For semantic guidance, we introduce a mask filtering strategy that leverages the co-occurrence of 3D geometry primitives. For the geometric refinement, we construct fine-grained object instances by exploiting both spatial continuity and high-level features.
arXiv Detail & Related papers (2025-09-05T14:37:31Z) - Integrating SAM Supervision for 3D Weakly Supervised Point Cloud Segmentation [66.65719382619538]
Current methods for 3D semantic segmentation propose training models with limited annotations to address the difficulty of annotating large, irregular, and unordered 3D point cloud data. We present a novel approach that maximizes the utility of sparsely available 3D annotations by incorporating segmentation masks generated by 2D foundation models.
arXiv Detail & Related papers (2025-08-27T14:13:01Z) - 3D Dental Model Segmentation with Geometrical Boundary Preserving [19.232921210620447]
3D intraoral scan meshes are widely used in digital dentistry diagnosis, and segmenting them is a critical preliminary task. Deep learning-based methods are capable of high-accuracy segmentation of crowns. However, the segmentation accuracy at the junction between the crown and the gum is still below average.
arXiv Detail & Related papers (2025-03-31T04:00:11Z) - GeoT: Geometry-guided Instance-dependent Transition Matrix for Semi-supervised Tooth Point Cloud Segmentation [48.64133802117796]
GeoT is a framework that employs an instance-dependent transition matrix (IDTM) to explicitly model noise in pseudo labels for semi-supervised dental segmentation. Specifically, to handle the extensive solution space of IDTM arising from tens of thousands of dental points, we introduce tooth geometric priors. Our method can fully utilize unlabeled data to facilitate segmentation, achieving performance comparable to fully supervised methods with only 20% of the labeled data.
arXiv Detail & Related papers (2025-03-21T09:43:57Z) - Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding [59.51535163599723]
FreeGS is an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. FreeGS performs comparably to state-of-the-art methods while avoiding the complex data preprocessing workload.
arXiv Detail & Related papers (2024-11-29T08:52:32Z) - XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation [72.12250272218792]
We propose a more meticulous mask-level alignment between 3D features and the 2D-text embedding space through a cross-modal mask reasoning framework, XMask3D.
We integrate 3D global features as implicit conditions into the pre-trained 2D denoising UNet, enabling the generation of segmentation masks.
The generated 2D masks are employed to align mask-level 3D representations with the vision-language feature space, thereby augmenting the open vocabulary capability of 3D geometry embeddings.
arXiv Detail & Related papers (2024-11-20T12:02:12Z) - Differentiable Collision-Supervised Tooth Arrangement Network with a Decoupling Perspective [10.293207903989053]
Existing learning-based methods use hidden tooth features to directly regress tooth motions.
We propose DTAN, a differentiable collision-supervised tooth arrangement network.
We construct three different tooth arrangement datasets and achieve drastically improved performance on accuracy and speed.
arXiv Detail & Related papers (2024-09-18T12:52:54Z) - 3D Structure-guided Network for Tooth Alignment in 2D Photograph [47.51314162367702]
A 2D photograph depicting aligned teeth prior to orthodontic treatment is crucial for effective dentist-patient communication.
We propose a 3D structure-guided tooth alignment network that takes 2D photographs as input and aligns the teeth within the 2D image space.
We evaluate our network on various facial photographs, demonstrating its exceptional performance and strong applicability within the orthodontic industry.
arXiv Detail & Related papers (2023-10-17T09:44:30Z) - 3D Tooth Mesh Segmentation with Simplified Mesh Cell Representation [42.512602472176184]
Manual tooth segmentation of 3D tooth meshes is tedious, and there are variations among dentists.
We propose a novel segmentation method which utilizes only the barycenter and the normal at the barycenter information of the mesh cell.
We are the first to demonstrate that it is possible to relax the implicit structural constraint and yet achieve superior segmentation performance.
arXiv Detail & Related papers (2023-01-25T11:43:56Z) - Two-Stage Mesh Deep Learning for Automated Tooth Segmentation and Landmark Localization on 3D Intraoral Scans [56.55092443401416]
iMeshSegNet in the first stage of TS-MDL reached an average Dice similarity coefficient (DSC) of $0.953\pm0.076$, significantly outperforming the original MeshSegNet.
PointNet-Reg achieved a mean absolute error (MAE) of $0.623\pm0.718$ mm in distances between the prediction and ground truth for 44 landmarks, which is superior compared with other networks for landmark detection.
arXiv Detail & Related papers (2021-09-24T13:00:26Z) - TSGCNet: Discriminative Geometric Feature Learning with Two-Stream Graph Convolutional Network for 3D Dental Model Segmentation [141.2690520327948]
We propose a two-stream graph convolutional network (TSGCNet) to learn multi-view information from different geometric attributes.
We evaluate our proposed TSGCNet on a real-patient dataset of dental models acquired by 3D intraoral scanners.
arXiv Detail & Related papers (2020-12-26T08:02:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.