An Autoencoder and Vision Transformer-based Interpretability Analysis of the Differences in Automated Staging of Second and Third Molars
- URL: http://arxiv.org/abs/2509.09911v1
- Date: Fri, 12 Sep 2025 00:54:07 GMT
- Title: An Autoencoder and Vision Transformer-based Interpretability Analysis of the Differences in Automated Staging of Second and Third Molars
- Authors: Barkin Buyukcakir, Jannick De Tobel, Patrick Thevissen, Dirk Vandermeulen, Peter Claes,
- Abstract summary: This study introduces a framework designed to enhance both performance and transparency in high-stakes forensic applications. We use a notable performance disparity in the automated staging of mandibular second (tooth 37) and third (tooth 38) molars as a case study.
- Score: 4.6984251688936425
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The practical adoption of deep learning in high-stakes forensic applications, such as dental age estimation, is often limited by the 'black box' nature of the models. This study introduces a framework designed to enhance both performance and transparency in this context. We use a notable performance disparity in the automated staging of mandibular second (tooth 37) and third (tooth 38) molars as a case study. The proposed framework, which combines a convolutional autoencoder (AE) with a Vision Transformer (ViT), improves classification accuracy for both teeth over a baseline ViT, increasing from 0.712 to 0.815 for tooth 37 and from 0.462 to 0.543 for tooth 38. Beyond improving performance, the framework provides multi-faceted diagnostic insights. Analysis of the AE's latent space metrics and image reconstructions indicates that the remaining performance gap is data-centric, suggesting high intra-class morphological variability in the tooth 38 dataset is a primary limiting factor. This work highlights the insufficiency of relying on a single mode of interpretability, such as attention maps, which can appear anatomically plausible yet fail to identify underlying data issues. By offering a methodology that both enhances accuracy and provides evidence for why a model may be uncertain, this framework serves as a more robust tool to support expert decision-making in forensic age estimation.
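The abstract attributes the remaining tooth 38 performance gap to high intra-class morphological variability, diagnosed from the AE's latent space. A minimal sketch of such a data-centric diagnostic is shown below; the specific metric (mean distance of each embedding to its class centroid) and the synthetic data are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def intra_class_variability(latents, labels):
    """Mean distance of each latent embedding to its class centroid.

    A markedly higher score for one class (e.g. tooth 38 stages) than
    another suggests greater morphological spread in that class's data,
    rather than a deficiency of the classifier itself.
    """
    scores = {}
    for c in np.unique(labels):
        z = latents[labels == c]          # embeddings of class c
        centroid = z.mean(axis=0)         # class centroid in latent space
        scores[int(c)] = float(np.linalg.norm(z - centroid, axis=1).mean())
    return scores

# Synthetic illustration: class 1 is drawn with twice the spread of class 0,
# mimicking a dataset with higher intra-class variability.
rng = np.random.default_rng(0)
z0 = rng.normal(0.0, 1.0, size=(200, 16))
z1 = rng.normal(0.0, 2.0, size=(200, 16))
latents = np.vstack([z0, z1])
labels = np.array([0] * 200 + [1] * 200)
scores = intra_class_variability(latents, labels)
```

On this synthetic data the metric reports a larger score for the more dispersed class, which is the kind of evidence the authors use to argue the gap is data-centric.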
Related papers
- Attention-gated U-Net model for semantic segmentation of brain tumors and feature extraction for survival prognosis [0.815557531820863]
Gliomas, among the most common primary brain tumors, vary widely in aggressiveness, prognosis, and histology. This study presents an Attention-Gated Recurrent Residual U-Net (R2U-Net) based Triplanar (2.5D) model for improved brain tumor segmentation.
arXiv Detail & Related papers (2026-02-14T07:48:58Z) - Advanced Deep Learning Techniques for Classifying Dental Conditions Using Panoramic X-Ray Images [0.0]
This study investigates deep learning methods for automated classification of dental conditions in panoramic X-ray images. Three approaches were evaluated: a custom convolutional neural network (CNN), hybrid models combining CNN feature extraction with traditional classifiers, and fine-tuned pre-trained architectures. Results show that hybrid models improve discrimination of morphologically similar conditions and provide efficient, reliable performance.
arXiv Detail & Related papers (2025-08-27T04:52:50Z) - Mitigating Biases in Surgical Operating Rooms with Geometry [40.5145973787288]
Deep neural networks are prone to learning spurious correlations, exploiting dataset-specific artifacts for prediction. In surgical operating rooms (OR), these manifest through the standardization of smocks and gowns that obscure robust identifying landmarks. We address this problem by encoding personnel as 3D point cloud sequences, disentangling identity-relevant shape and motion patterns from appearance-based confounders.
arXiv Detail & Related papers (2025-08-11T14:32:32Z) - HANS-Net: Hyperbolic Convolution and Adaptive Temporal Attention for Accurate and Generalizable Liver and Tumor Segmentation in CT Imaging [1.3149714289117207]
Accurate liver and tumor segmentation on abdominal CT images is critical for reliable diagnosis and treatment planning. We introduce the Hyperbolic-convolutions Adaptive-temporal-attention with Neural-representation and Synaptic-plasticity Network (HANS-Net). HANS-Net combines hyperbolic convolutions for hierarchical geometric representation, a wavelet-inspired decomposition module for multi-scale texture learning, and an implicit neural representation branch.
arXiv Detail & Related papers (2025-07-15T13:56:37Z) - APTOS-2024 challenge report: Generation of synthetic 3D OCT images from fundus photographs [42.58128666405841]
The Asia Pacific Tele-Ophthalmology Society organized a challenge titled Artificial Intelligence-based OCT Generation from Fundus Images. This paper details the challenge framework (referred to as the APTOS-2024 Challenge), including the benchmark dataset. The challenge attracted 342 participating teams, with 42 preliminary submissions and 9 finalists.
arXiv Detail & Related papers (2025-06-09T08:29:37Z) - Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging [41.446379453352534]
The Latent Diffusion Autoencoder (LDAE) is a novel encoder-decoder diffusion-based framework for efficient and meaningful unsupervised learning in medical imaging. This study focuses on Alzheimer's disease (AD), using brain MRI from the ADNI database as a case study.
arXiv Detail & Related papers (2025-04-11T15:37:46Z) - ScaleMAI: Accelerating the Development of Trusted Datasets and AI Models [46.80682547774335]
We propose ScaleMAI, an agent for AI-integrated data curation and annotation. First, ScaleMAI creates a dataset of 25,362 CT scans, including per-voxel annotations for benign/malignant tumors and 24 anatomical structures. Second, through progressive human-in-the-loop iterations, ScaleMAI provides a Flagship AI Model that can approach the proficiency of expert annotators in detecting pancreatic tumors.
arXiv Detail & Related papers (2025-01-06T22:12:00Z) - Comparative Performance Analysis of Transformer-Based Pre-Trained Models for Detecting Keratoconus Disease [0.0]
This study compares eight pre-trained CNNs for diagnosing keratoconus, a degenerative eye disease.
MobileNetV2 was the most accurate model at identifying keratoconus and normal cases, with few misclassifications.
arXiv Detail & Related papers (2024-08-16T20:15:24Z) - Spatial-aware Transformer-GRU Framework for Enhanced Glaucoma Diagnosis from 3D OCT Imaging [3.093890460224435]
We present a novel deep learning framework that leverages the diagnostic value of 3D Optical Coherence Tomography (OCT) imaging for automated glaucoma detection. We integrate a pre-trained Vision Transformer on retinal data for rich slice-wise feature extraction and a bidirectional Gated Recurrent Unit for capturing inter-slice spatial dependencies.
arXiv Detail & Related papers (2024-03-08T22:25:15Z) - The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease
detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z) - Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained on sub-fracture classification, using the largest and richest dataset assembled to date.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.