OMNI-Dent: Towards an Accessible and Explainable AI Framework for Automated Dental Diagnosis
- URL: http://arxiv.org/abs/2602.07041v1
- Date: Tue, 03 Feb 2026 22:09:52 GMT
- Title: OMNI-Dent: Towards an Accessible and Explainable AI Framework for Automated Dental Diagnosis
- Authors: Leeje Jang, Yao-Yi Chiang, Angela M. Hastings, Patimaporn Pungchanchaikul, Martha B. Lucas, Emily C. Schultz, Jeffrey P. Louie, Mohamed Estai, Wen-Chen Wang, Ryan H. L. Ip, Boyen Huang,
- Abstract summary: We present OMNI-Dent, a data-efficient and explainable diagnostic framework that incorporates clinical reasoning principles into a Vision-Language Model (VLM)-based pipeline.<n>The framework operates on multi-view smartphone photographs,embeds diagnostics from dental experts, and guides a general-purpose VLM to perform tooth-level evaluation without dental-specific fine-tuning of the VLM.
- Score: 2.194768059720059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate dental diagnosis is essential for oral healthcare, yet many individuals lack access to timely professional evaluation. Existing AI-based methods primarily treat diagnosis as a visual pattern recognition task and do not reflect the structured clinical reasoning used by dental professionals. These approaches also require large amounts of expert-annotated data and often struggle to generalize across diverse real-world imaging conditions. To address these limitations, we present OMNI-Dent, a data-efficient and explainable diagnostic framework that incorporates clinical reasoning principles into a Vision-Language Model (VLM)-based pipeline. The framework operates on multi-view smartphone photographs,embeds diagnostic heuristics from dental experts, and guides a general-purpose VLM to perform tooth-level evaluation without dental-specific fine-tuning of the VLM. By utilizing the VLM's existing visual-linguistic capabilities, OMNI-Dent aims to support diagnostic assessment in settings where curated clinical imaging is unavailable. Designed as an early-stage assistive tool, OMNI-Dent helps users identify potential abnormalities and determine when professional evaluation may be needed, offering a practical option for individuals with limited access to in-person care.
Related papers
- Timely Clinical Diagnosis through Active Test Selection [49.091903570068155]
We propose ACTMED (Adaptive Clinical Test selection via Model-based Experimental Design) to better emulate real-world diagnostic reasoning.<n>LLMs act as flexible simulators, generating plausible patient state distributions and supporting belief updates without requiring structured, task-specific training data.<n>We evaluate ACTMED on real-world datasets and show it can optimize test selection to improve diagnostic accuracy, interpretability, and resource use.
arXiv Detail & Related papers (2025-10-21T18:10:45Z) - Towards Generalist Intelligence in Dentistry: Vision Foundation Models for Oral and Maxillofacial Radiology [22.124686092997717]
DentVFM is the first family of vision foundation models (VFMs) designed for dentistry.<n>It generates task-agnostic visual representations for a wide range of dental applications.<n>It shows impressive generalist intelligence, demonstrating robust generalization to diverse dental tasks.
arXiv Detail & Related papers (2025-10-16T10:24:23Z) - Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models [51.91760712805404]
We introduce VivaBench, a benchmark for evaluating sequential clinical reasoning in large language models (LLMs)<n>Our dataset consists of 1762 physician-curated clinical vignettes structured as interactive scenarios that simulate a (oral) examination in medical training.<n>Our analysis identified several failure modes that mirror common cognitive errors in clinical practice.
arXiv Detail & Related papers (2025-10-11T16:24:35Z) - RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis [56.373297358647655]
Retrieval-Augmented Diagnosis (RAD) is a novel framework that injects external knowledge into multimodal models directly on downstream tasks.<n>RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss transformer, and a dual decoder.
arXiv Detail & Related papers (2025-09-24T10:36:14Z) - Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision Language Model (MedVLM) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning.<n>We propose an expert-in-the-loop framework named Expert-Controlled-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training.
arXiv Detail & Related papers (2025-07-12T09:03:30Z) - Test-Time-Scaling for Zero-Shot Diagnosis with Visual-Language Reasoning [37.37330596550283]
We introduce a framework for reliable medical image diagnosis using vision-language models.<n>A test-time scaling strategy consolidates multiple candidate outputs into a reliable final diagnosis.<n>We evaluate our approach across various medical imaging modalities.
arXiv Detail & Related papers (2025-06-11T22:23:38Z) - EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis [62.00431604976949]
EndoBench is the first comprehensive benchmark specifically designed to assess MLLMs across the full spectrum of endoscopic practice.<n>We benchmark 23 state-of-the-art models, including general-purpose, medical-specialized, and proprietary MLLMs.<n>Our experiments reveal: proprietary MLLMs outperform open-source and medical-specialized models overall, but still trail human experts.
arXiv Detail & Related papers (2025-05-29T16:14:34Z) - MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow [14.478357882578234]
In modern medicine, clinical diagnosis relies on the comprehensive analysis of primarily textual and visual data.<n>Recent advances in large Vision-Language Models (VLMs) and agent-based methods hold great potential for medical diagnosis.<n>We propose MedAgent-Pro, a new agentic reasoning paradigm that follows the diagnosis principle in modern medicine.
arXiv Detail & Related papers (2025-03-21T14:04:18Z) - Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z) - Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation
for Automatic Diagnosis [30.943705201552643]
We propose a framework to model the diagnosis process in the real world by adaptively fusing probability distributions of agents over potential diseases.
Our approach requires significantly less parameter updating and training time, enhancing efficiency and practical utility.
arXiv Detail & Related papers (2024-01-29T12:25:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.