Towards Generalist Intelligence in Dentistry: Vision Foundation Models for Oral and Maxillofacial Radiology
- URL: http://arxiv.org/abs/2510.14532v1
- Date: Thu, 16 Oct 2025 10:24:23 GMT
- Title: Towards Generalist Intelligence in Dentistry: Vision Foundation Models for Oral and Maxillofacial Radiology
- Authors: Xinrui Huang, Fan Xiao, Dongming He, Anqi Gao, Dandan Li, Xiaofan Zhang, Shaoting Zhang, Xudong Wang,
- Abstract summary: DentVFM is the first family of vision foundation models (VFMs) designed for dentistry.<n>It generates task-agnostic visual representations for a wide range of dental applications.<n>It shows impressive generalist intelligence, demonstrating robust generalization to diverse dental tasks.
- Score: 22.124686092997717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Oral and maxillofacial radiology plays a vital role in dental healthcare, but radiographic image interpretation is limited by a shortage of trained professionals. While AI approaches have shown promise, existing dental AI systems are restricted by their single-modality focus, task-specific design, and reliance on costly labeled data, hindering their generalization across diverse clinical scenarios. To address these challenges, we introduce DentVFM, the first family of vision foundation models (VFMs) designed for dentistry. DentVFM generates task-agnostic visual representations for a wide range of dental applications and uses self-supervised learning on DentVista, a large curated dental imaging dataset with approximately 1.6 million multi-modal radiographic images from various medical centers. DentVFM includes 2D and 3D variants based on the Vision Transformer (ViT) architecture. To address gaps in dental intelligence assessment and benchmarks, we introduce DentBench, a comprehensive benchmark covering eight dental subspecialties, more diseases, imaging modalities, and a wide geographical distribution. DentVFM shows impressive generalist intelligence, demonstrating robust generalization to diverse dental tasks, such as disease diagnosis, treatment analysis, biomarker identification, and anatomical landmark detection and segmentation. Experimental results indicate DentVFM significantly outperforms supervised, self-supervised, and weakly supervised baselines, offering superior generalization, label efficiency, and scalability. Additionally, DentVFM enables cross-modality diagnostics, providing more reliable results than experienced dentists in situations where conventional imaging is unavailable. DentVFM sets a new paradigm for dental AI, offering a scalable, adaptable, and label-efficient model to improve intelligent dental healthcare and address critical gaps in global oral healthcare.
Related papers
- OMNI-Dent: Towards an Accessible and Explainable AI Framework for Automated Dental Diagnosis [2.194768059720059]
We present OMNI-Dent, a data-efficient and explainable diagnostic framework that incorporates clinical reasoning principles into a Vision-Language Model (VLM)-based pipeline.<n>The framework operates on multi-view smartphone photographs,embeds diagnostics from dental experts, and guides a general-purpose VLM to perform tooth-level evaluation without dental-specific fine-tuning of the VLM.
arXiv Detail & Related papers (2026-02-03T22:09:52Z) - DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry [28.389946455559713]
Current multimodal large language models (MLLMs) struggle to capture fine-grained dental visual details.<n>We present DentalGPT, a specialized dental MLLM developed through high-quality domain knowledge injection and reinforcement learning.
arXiv Detail & Related papers (2025-12-12T13:42:57Z) - OralGPT-Omni: A Versatile Dental Multimodal Large Language Model [44.919874082284686]
We present OralGPT- Omni, the first dental-specialized MLLM for comprehensive analysis across diverse dental imaging modalities and clinical tasks.<n>To explicitly capture dentists' diagnostic reasoning, we construct TRACE-CoT, a clinically grounded chain-of-thought dataset.<n>In parallel, we introduce MMOral-Uni, the first unified multimodal benchmark for dental image analysis.
arXiv Detail & Related papers (2025-11-27T03:21:20Z) - An Explainable Hybrid AI Framework for Enhanced Tuberculosis and Symptom Detection [55.35661671061754]
Tuberculosis remains a critical global health issue, particularly in resource-limited and remote areas.<n>We propose a framework which enhances disease and symptom detection on chest X-rays by integrating two supervised heads and a self-supervised head.<n>Our model achieves an accuracy of 98.85% for distinguishing between COVID-19, tuberculosis, and normal cases, and a macro-F1 score of 90.09% for multilabel symptom detection.
arXiv Detail & Related papers (2025-10-21T17:18:55Z) - DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice [71.62725911420627]
We introduce DentVLM, a vision-language model engineered for expert-level oral disease diagnosis.<n>The model is capable of interpreting seven 2D oral imaging modalities across 36 diagnostic tasks.<n>It surpassed the diagnostic performance of 13 junior dentists on 21 of 36 tasks and exceeded that of 12 senior dentists on 12 of 36 tasks.
arXiv Detail & Related papers (2025-09-27T14:47:37Z) - A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications [77.3888788549565]
We present EchoCare, a novel ultrasound foundation model for generalist clinical use.<n>We developed EchoCare via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData.<n>With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks.
arXiv Detail & Related papers (2025-09-15T10:05:31Z) - Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis [16.403842140593706]
We introduce MMOral, the first large-scale multimodal instruction dataset and benchmark tailored for panoramic X-ray interpretation.<n>We present MMOral-Bench, a comprehensive evaluation suite covering five key diagnostic dimensions in dentistry.<n>We also propose OralGPT, which conducts supervised fine-tuning upon Qwen2.5-VL-7B with our meticulously curated MMOral instruction dataset.
arXiv Detail & Related papers (2025-09-11T08:39:08Z) - Adapting Foundation Model for Dental Caries Detection with Dual-View Co-Training [53.77904429789069]
We present Attention-TNet, a novel Dual-View Co-Training network for accurate dental caries detection.<n>OurTNet starts with employing automated tooth detection to establish two complementary views: a global view from panoramic X-ray images and a local view from cropped tooth images.<n>To effectively integrate information from both views, we introduce a Gated Cross-View module.
arXiv Detail & Related papers (2025-08-28T14:13:26Z) - Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation.<n>MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z) - EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model [51.66031028717933]
Medical Large Vision-Language Models (Med-LVLMs) demonstrate significant potential in healthcare.<n>Currently, intelligent ophthalmic diagnosis faces three major challenges: (i) Data; (ii) Benchmark; and (iii) Model.<n>We propose the Eyecare Kit, which tackles the aforementioned three key challenges with the tailored dataset, benchmark and model.
arXiv Detail & Related papers (2025-04-18T12:09:15Z) - Improving Dental Diagnostics: Enhanced Convolution with Spatial Attention Mechanism [0.0]
This paper presents an enhanced ResNet50 architecture, integrated with the SimAM attention module, to address the challenge of limited contrast in dental images.
Our model demonstrates superior performance across various feature extraction techniques, achieving an F1 score of 0.676.
arXiv Detail & Related papers (2024-07-11T01:12:30Z) - Semi-supervised classification of dental conditions in panoramic radiographs using large language model and instance segmentation: A real-world dataset evaluation [6.041146512190833]
A semi-supervised learning framework is proposed to classify thirteen dental conditions on panoramic radiographs.
The solution demonstrated an accuracy level comparable to that of a junior specialist.
arXiv Detail & Related papers (2024-06-25T19:56:12Z) - Generative Adversarial Networks for Dental Patient Identity Protection
in Orthodontic Educational Imaging [0.0]
This research introduces a novel area-preserving Generative Adversarial Networks (GAN) inversion technique for effectively de-identifying dental patient images.
This innovative method addresses privacy concerns while preserving key dental features, thereby generating valuable resources for dental education and research.
arXiv Detail & Related papers (2023-07-05T04:14:57Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical
Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.