DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice
- URL: http://arxiv.org/abs/2509.23344v1
- Date: Sat, 27 Sep 2025 14:47:37 GMT
- Title: DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice
- Authors: Zijie Meng, Jin Hao, Xiwei Dai, Yang Feng, Jiaxiang Liu, Bin Feng, Huikai Wu, Xiaotang Gai, Hengchuan Zhu, Tianxiang Hu, Yangyang Wu, Hongxia Xu, Jin Li, Jun Xiao, Xiaoqiang Liu, Joey Tianyi Zhou, Fudong Zhu, Zhihe Zhao, Lunguo Xia, Bing Fang, Jimeng Sun, Jian Wu, Zuozhu Liu,
- Abstract summary: We introduce DentVLM, a vision-language model engineered for expert-level oral disease diagnosis. The model is capable of interpreting seven 2D oral imaging modalities across 36 diagnostic tasks. It surpassed the diagnostic performance of 13 junior dentists on 21 of 36 tasks and exceeded that of 12 senior dentists on 12 of 36 tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diagnosing and managing oral diseases necessitate advanced visual interpretation across diverse imaging modalities and integrated information synthesis. While current AI models excel at isolated tasks, they often fall short in addressing the complex, multimodal requirements of comprehensive clinical dental practice. Here we introduce DentVLM, a multimodal vision-language model engineered for expert-level oral disease diagnosis. DentVLM was developed using a comprehensive, large-scale, bilingual dataset of 110,447 images and 2.46 million visual question-answering (VQA) pairs. The model is capable of interpreting seven 2D oral imaging modalities across 36 diagnostic tasks, significantly outperforming leading proprietary and open-source models by 19.6% higher accuracy for oral diseases and 27.9% for malocclusions. In a clinical study involving 25 dentists, evaluating 1,946 patients and encompassing 3,105 QA pairs, DentVLM surpassed the diagnostic performance of 13 junior dentists on 21 of 36 tasks and exceeded that of 12 senior dentists on 12 of 36 tasks. When integrated into a collaborative workflow, DentVLM elevated junior dentists' performance to senior levels and reduced diagnostic time for all practitioners by 15-22%. Furthermore, DentVLM exhibited promising performance across three practical utility scenarios, including home-based dental health management, hospital-based intelligent diagnosis and multi-agent collaborative interaction. These findings establish DentVLM as a robust clinical decision support tool, poised to enhance primary dental care, mitigate provider-patient imbalances, and democratize access to specialized medical expertise within the field of dentistry.
Related papers
- DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry
Current multimodal large language models (MLLMs) struggle to capture fine-grained dental visual details. We present DentalGPT, a specialized dental MLLM developed through high-quality domain knowledge injection and reinforcement learning.
arXiv Detail & Related papers (2025-12-12T13:42:57Z)
- OralGPT-Omni: A Versatile Dental Multimodal Large Language Model
We present OralGPT-Omni, the first dental-specialized MLLM for comprehensive analysis across diverse dental imaging modalities and clinical tasks. To explicitly capture dentists' diagnostic reasoning, we construct TRACE-CoT, a clinically grounded chain-of-thought dataset. In parallel, we introduce MMOral-Uni, the first unified multimodal benchmark for dental image analysis.
arXiv Detail & Related papers (2025-11-27T03:21:20Z)
- Evolving Diagnostic Agents in a Virtual Clinical Environment
We present a framework for training large language models (LLMs) as diagnostic agents with reinforcement learning. Our method acquires diagnostic strategies through interactive exploration and outcome-based feedback. DiagAgent significantly outperforms 10 state-of-the-art LLMs, including DeepSeek-v3 and GPT-4o.
arXiv Detail & Related papers (2025-10-28T17:19:47Z)
- Towards Generalist Intelligence in Dentistry: Vision Foundation Models for Oral and Maxillofacial Radiology
DentVFM is the first family of vision foundation models (VFMs) designed for dentistry. It generates task-agnostic visual representations for a wide range of dental applications and shows impressive generalist intelligence, demonstrating robust generalization to diverse dental tasks.
arXiv Detail & Related papers (2025-10-16T10:24:23Z)
- Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis
We introduce MMOral, the first large-scale multimodal instruction dataset and benchmark tailored for panoramic X-ray interpretation. We present MMOral-Bench, a comprehensive evaluation suite covering five key diagnostic dimensions in dentistry. We also propose OralGPT, which conducts supervised fine-tuning upon Qwen2.5-VL-7B with our meticulously curated MMOral instruction dataset.
arXiv Detail & Related papers (2025-09-11T08:39:08Z)
- DentalBench: Benchmarking and Advancing LLMs Capability for Bilingual Dentistry Understanding
We introduce DentalBench, the first comprehensive benchmark designed to evaluate and advance large language models (LLMs) in the dental domain. DentalBench consists of two main components: DentalQA, an English-Chinese question-answering (QA) benchmark with 36,597 questions spanning 4 tasks and 16 dental subfields; and DentalCorpus, a large-scale, high-quality corpus with 337.35 million tokens curated for dental domain adaptation.
arXiv Detail & Related papers (2025-08-28T04:35:51Z)
- DermINO: Hybrid Pretraining for a Versatile Dermatology Foundation Model
DermNIO is a versatile foundation model for dermatology. It incorporates a novel hybrid pretraining framework that augments the self-supervised learning paradigm, and it consistently outperforms state-of-the-art models across a wide range of tasks.
arXiv Detail & Related papers (2025-08-17T00:41:39Z)
- An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
We introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM). DeepRare generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning. The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1,013 diseases.
arXiv Detail & Related papers (2025-06-25T13:42:26Z)
- MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning
We propose MMedAgent-RL, a reinforcement learning-based multi-agent framework that enables dynamic, optimized collaboration among medical agents. Specifically, we train two GP agents based on Qwen2.5-VL via RL: the triage doctor learns to assign patients to appropriate specialties, while the attending physician integrates the judgments from multiple specialists. Experiments on five medical VQA benchmarks demonstrate that MMedAgent-RL not only outperforms both open-source and proprietary Med-LVLMs, but also exhibits human-like reasoning patterns.
arXiv Detail & Related papers (2025-05-31T13:22:55Z)
- MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways
Inpatient pathways demand complex clinical decision-making based on comprehensive patient information. We propose the Multi-Agent Inpatient Pathways (MAP) framework, which accomplishes inpatient pathways with three clinical agents. Extensive experiments showed that MAP improved diagnosis accuracy by 25.10% compared to the state-of-the-art LLM HuatuoGPT2-13B.
arXiv Detail & Related papers (2025-03-17T14:14:28Z)
- Specialized curricula for training vision-language models in retinal image analysis
Vision-language models (VLMs) automatically interpret images and summarize their findings as text. In this work, we demonstrate that OpenAI's ChatGPT-4o model markedly underperforms compared to practicing ophthalmologists on specialist tasks.
arXiv Detail & Related papers (2024-07-11T11:31:48Z)
- Towards Accurate Differential Diagnosis with Large Language Models
Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of differential diagnosis.
20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine.
Our study suggests that our LLM has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases.
arXiv Detail & Related papers (2023-11-30T19:55:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.