An Autoencoder and Vision Transformer-based Interpretability Analysis of the Differences in Automated Staging of Second and Third Molars
- URL: http://arxiv.org/abs/2509.09911v1
- Date: Fri, 12 Sep 2025 00:54:07 GMT
- Title: An Autoencoder and Vision Transformer-based Interpretability Analysis of the Differences in Automated Staging of Second and Third Molars
- Authors: Barkin Buyukcakir, Jannick De Tobel, Patrick Thevissen, Dirk Vandermeulen, Peter Claes,
- Abstract summary: This study introduces a framework designed to enhance both performance and transparency in high-stakes forensic applications. We use a notable performance disparity in the automated staging of mandibular second (tooth 37) and third (tooth 38) molars as a case study.
- Score: 4.6984251688936425
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The practical adoption of deep learning in high-stakes forensic applications, such as dental age estimation, is often limited by the 'black box' nature of the models. This study introduces a framework designed to enhance both performance and transparency in this context. We use a notable performance disparity in the automated staging of mandibular second (tooth 37) and third (tooth 38) molars as a case study. The proposed framework, which combines a convolutional autoencoder (AE) with a Vision Transformer (ViT), improves classification accuracy for both teeth over a baseline ViT, increasing from 0.712 to 0.815 for tooth 37 and from 0.462 to 0.543 for tooth 38. Beyond improving performance, the framework provides multi-faceted diagnostic insights. Analysis of the AE's latent space metrics and image reconstructions indicates that the remaining performance gap is data-centric, suggesting high intra-class morphological variability in the tooth 38 dataset is a primary limiting factor. This work highlights the insufficiency of relying on a single mode of interpretability, such as attention maps, which can appear anatomically plausible yet fail to identify underlying data issues. By offering a methodology that both enhances accuracy and provides evidence for why a model may be uncertain, this framework serves as a more robust tool to support expert decision-making in forensic age estimation.
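The abstract attributes the remaining tooth 38 performance gap to high intra-class morphological variability, diagnosed from the AE's latent space. A minimal sketch of such a data-centric diagnostic is shown below; the specific metric (mean distance of each embedding to its class centroid) and the synthetic data are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def intra_class_variability(latents, labels):
    """Mean distance of each latent embedding to its class centroid.

    A markedly higher score for one class (e.g. tooth 38 stages) than
    another suggests greater morphological spread in that class's data,
    rather than a deficiency of the classifier itself.
    """
    scores = {}
    for c in np.unique(labels):
        z = latents[labels == c]          # embeddings of class c
        centroid = z.mean(axis=0)         # class centroid in latent space
        scores[int(c)] = float(np.linalg.norm(z - centroid, axis=1).mean())
    return scores

# Synthetic illustration: class 1 is drawn with twice the spread of class 0,
# mimicking a dataset with higher intra-class variability.
rng = np.random.default_rng(0)
z0 = rng.normal(0.0, 1.0, size=(200, 16))
z1 = rng.normal(0.0, 2.0, size=(200, 16))
latents = np.vstack([z0, z1])
labels = np.array([0] * 200 + [1] * 200)
scores = intra_class_variability(latents, labels)
```

On this synthetic data the metric reports a larger score for the more dispersed class, which is the kind of evidence the authors use to argue the gap is data-centric.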
Related papers
- Attention-gated U-Net model for semantic segmentation of brain tumors and feature extraction for survival prognosis [0.815557531820863]
Gliomas, among the most common primary brain tumors, vary widely in aggressiveness, prognosis, and histology. This study presents an Attention-Gated Recurrent Residual U-Net (R2U-Net) based Triplanar (2.5D) model for improved brain tumor segmentation.
arXiv Detail & Related papers (2026-02-14T07:48:58Z) - Advanced Deep Learning Techniques for Classifying Dental Conditions Using Panoramic X-Ray Images [0.0]
This study investigates deep learning methods for automated classification of dental conditions in panoramic X-ray images. Three approaches were evaluated: a custom convolutional neural network (CNN), hybrid models combining CNN feature extraction with traditional classifiers, and fine-tuned pre-trained architectures. Results show that hybrid models improve discrimination of morphologically similar conditions and provide efficient, reliable performance.
arXiv Detail & Related papers (2025-08-27T04:52:50Z) - Mitigating Biases in Surgical Operating Rooms with Geometry [40.5145973787288]
Deep neural networks are prone to learning spurious correlations, exploiting dataset-specific artifacts for prediction. In surgical operating rooms (OR), these manifest through the standardization of smocks and gowns that obscure robust identifying landmarks. We address this problem by encoding personnel as 3D point cloud sequences, disentangling identity-relevant shape and motion patterns from appearance-based confounders.
arXiv Detail & Related papers (2025-08-11T14:32:32Z) - HANS-Net: Hyperbolic Convolution and Adaptive Temporal Attention for Accurate and Generalizable Liver and Tumor Segmentation in CT Imaging [1.3149714289117207]
Accurate liver and tumor segmentation on abdominal CT images is critical for reliable diagnosis and treatment planning. We introduce the Hyperbolic-convolutions Adaptive-temporal-attention with Neural-representation and Synaptic-plasticity Network (HANS-Net). HANS-Net combines hyperbolic convolutions for hierarchical geometric representation, a wavelet-inspired decomposition module for multi-scale texture learning, and an implicit neural representation branch.
arXiv Detail & Related papers (2025-07-15T13:56:37Z) - APTOS-2024 challenge report: Generation of synthetic 3D OCT images from fundus photographs [42.58128666405841]
The Asia Pacific Tele-Ophthalmology Society organized a challenge titled Artificial Intelligence-based OCT Generation from Fundus Images. This paper details the challenge framework (referred to as the APTOS-2024 Challenge), including the benchmark dataset. The challenge attracted 342 participating teams, with 42 preliminary submissions and 9 finalists.
arXiv Detail & Related papers (2025-06-09T08:29:37Z) - Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging [41.446379453352534]
The Latent Diffusion Autoencoder (LDAE) is a novel encoder-decoder diffusion-based framework for efficient and meaningful unsupervised learning in medical imaging. This study focuses on Alzheimer's disease (AD), using brain MRI from the ADNI database as a case study.
arXiv Detail & Related papers (2025-04-11T15:37:46Z) - ScaleMAI: Accelerating the Development of Trusted Datasets and AI Models [46.80682547774335]
We propose ScaleMAI, an agent for AI-integrated data curation and annotation. First, ScaleMAI creates a dataset of 25,362 CT scans, including per-voxel annotations for benign/malignant tumors and 24 anatomical structures. Second, through progressive human-in-the-loop iterations, ScaleMAI provides a Flagship AI Model that can approach the proficiency of expert annotators in detecting pancreatic tumors.
arXiv Detail & Related papers (2025-01-06T22:12:00Z) - Comparative Performance Analysis of Transformer-Based Pre-Trained Models for Detecting Keratoconus Disease [0.0]
This study compares eight pre-trained CNNs for diagnosing keratoconus, a degenerative eye disease.
MobileNetV2 was the most accurate model at identifying keratoconus and normal cases, with few misclassifications.
arXiv Detail & Related papers (2024-08-16T20:15:24Z) - Spatial-aware Transformer-GRU Framework for Enhanced Glaucoma Diagnosis from 3D OCT Imaging [3.093890460224435]
We present a novel deep learning framework that leverages the diagnostic value of 3D Optical Coherence Tomography (OCT) imaging for automated glaucoma detection. We integrate a pre-trained Vision Transformer on retinal data for rich slice-wise feature extraction and a bidirectional Gated Recurrent Unit for capturing inter-slice spatial dependencies.
arXiv Detail & Related papers (2024-03-08T22:25:15Z) - The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease
detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z) - Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained on sub-fracture classification, using the largest and richest dataset assembled to date.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.