DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry
- URL: http://arxiv.org/abs/2512.11558v1
- Date: Fri, 12 Dec 2025 13:42:57 GMT
- Title: DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry
- Authors: Zhenyang Cai, Jiaming Zhang, Junjie Zhao, Ziyi Zeng, Yanchao Li, Jingyi Liang, Junying Chen, Yunjin Yang, Jiajun You, Shuzhi Deng, Tongfei Wang, Wanting Chen, Chunxiu Hao, Ruiqi Xie, Zhenwei Wen, Xiangyi Feng, Zou Ting, Jin Zou Lin, Jianquan Li, Guangjun Yu, Liangyi Chen, Junwen Wang, Shan Jiang, Benyou Wang
- Abstract summary: Current multimodal large language models (MLLMs) struggle to capture fine-grained dental visual details. We present DentalGPT, a specialized dental MLLM developed through high-quality domain knowledge injection and reinforcement learning.
- Score: 28.389946455559713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reliable interpretation of multimodal data in dentistry is essential for automated oral healthcare, yet current multimodal large language models (MLLMs) struggle to capture fine-grained dental visual details and lack sufficient reasoning ability for precise diagnosis. To address these limitations, we present DentalGPT, a specialized dental MLLM developed through high-quality domain knowledge injection and reinforcement learning. Specifically, we constructed the largest annotated multimodal dataset for dentistry to date by aggregating over 120k dental images paired with detailed descriptions that highlight diagnostically relevant visual features. Training on this dataset significantly enhances the MLLM's visual understanding of dental conditions, while the subsequent reinforcement learning stage further strengthens its capability for multimodal complex reasoning. Comprehensive evaluations on intraoral and panoramic benchmarks, along with dental subsets of medical VQA benchmarks, show that DentalGPT achieves superior performance in disease classification and dental VQA tasks, outperforming many state-of-the-art MLLMs despite having only 7B parameters. These results demonstrate that high-quality dental data combined with staged adaptation provides an effective pathway for building capable and domain-specialized dental MLLMs.
Related papers
- MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement [63.82954136824963]
Medical Vision-Language Models excel at perception tasks but struggle with the complex clinical reasoning required in real-world scenarios. We propose a novel reasoning MedVLM that addresses these challenges through domain-specific adaptation and guideline reinforcement.
arXiv Detail & Related papers (2026-01-16T02:32:07Z)
- OralGPT-Omni: A Versatile Dental Multimodal Large Language Model [44.919874082284686]
We present OralGPT-Omni, the first dental-specialized MLLM for comprehensive analysis across diverse dental imaging modalities and clinical tasks. To explicitly capture dentists' diagnostic reasoning, we construct TRACE-CoT, a clinically grounded chain-of-thought dataset. In parallel, we introduce MMOral-Uni, the first unified multimodal benchmark for dental image analysis.
arXiv Detail & Related papers (2025-11-27T03:21:20Z)
- Sim4Seg: Boosting Multimodal Multi-disease Medical Diagnosis Segmentation with Region-Aware Vision-Language Similarity Masks [54.00822479127598]
We introduce a medical vision-language task named Medical Diagnosis Segmentation (MDS). MDS aims to understand clinical queries for medical images and generate the corresponding segmentation masks as well as diagnostic results. We propose Sim4Seg, a novel framework that improves the performance of diagnosis segmentation.
arXiv Detail & Related papers (2025-11-10T03:22:42Z)
- Towards Generalist Intelligence in Dentistry: Vision Foundation Models for Oral and Maxillofacial Radiology [22.124686092997717]
DentVFM is the first family of vision foundation models (VFMs) designed for dentistry. It generates task-agnostic visual representations for a wide range of dental applications and shows impressive generalist intelligence, demonstrating robust generalization to diverse dental tasks.
arXiv Detail & Related papers (2025-10-16T10:24:23Z)
- DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice [71.62725911420627]
We introduce DentVLM, a vision-language model engineered for expert-level oral disease diagnosis. The model is capable of interpreting seven 2D oral imaging modalities across 36 diagnostic tasks. It surpassed the diagnostic performance of 13 junior dentists on 21 of 36 tasks and exceeded that of 12 senior dentists on 12 of 36 tasks.
arXiv Detail & Related papers (2025-09-27T14:47:37Z)
- Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis [16.403842140593706]
We introduce MMOral, the first large-scale multimodal instruction dataset and benchmark tailored for panoramic X-ray interpretation. We present MMOral-Bench, a comprehensive evaluation suite covering five key diagnostic dimensions in dentistry. We also propose OralGPT, which conducts supervised fine-tuning upon Qwen2.5-VL-7B with our meticulously curated MMOral instruction dataset.
arXiv Detail & Related papers (2025-09-11T08:39:08Z)
- DentalBench: Benchmarking and Advancing LLMs Capability for Bilingual Dentistry Understanding [18.678007079687706]
We introduce DentalBench, the first comprehensive benchmark designed to evaluate and advance large language models (LLMs) in the dental domain. DentalBench consists of two main components: DentalQA, an English-Chinese question-answering (QA) benchmark with 36,597 questions spanning 4 tasks and 16 dental subfields; and DentalCorpus, a large-scale, high-quality corpus with 337.35 million tokens curated for dental domain adaptation.
arXiv Detail & Related papers (2025-08-28T04:35:51Z)
- EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model [51.66031028717933]
Medical Large Vision-Language Models (Med-LVLMs) demonstrate significant potential in healthcare. Currently, intelligent ophthalmic diagnosis faces three major challenges: (i) data; (ii) benchmark; and (iii) model. We propose the Eyecare Kit, which tackles these three key challenges with a tailored dataset, benchmark, and model.
arXiv Detail & Related papers (2025-04-18T12:09:15Z)
- GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI [67.09501109871351]
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals.
GMAI-MMBench is the most comprehensive general medical AI benchmark with well-categorized data structure and multi-perceptual granularity to date.
It is constructed from 284 datasets across 38 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format.
arXiv Detail & Related papers (2024-08-06T17:59:21Z)
- ChatGPT for Shaping the Future of Dentistry: The Potential of Multi-Modal Large Language Model [18.59603757924943]
ChatGPT is a lightweight, conversational variant of Generative Pretrained Transformer 4 (GPT-4) developed by OpenAI.
This paper mainly discusses the future applications of Large Language Models (LLMs) in dentistry.
arXiv Detail & Related papers (2023-03-23T15:34:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.