FunnyNodules: A Customizable Medical Dataset Tailored for Evaluating Explainable AI
- URL: http://arxiv.org/abs/2511.15481v2
- Date: Thu, 20 Nov 2025 14:00:08 GMT
- Title: FunnyNodules: A Customizable Medical Dataset Tailored for Evaluating Explainable AI
- Authors: Luisa Gallée, Yiheng Xiong, Meinrad Beer, Michael Götz
- Abstract summary: We introduce FunnyNodules, a fully parameterized synthetic dataset for analysis of attribute-based reasoning in medical AI models. The dataset generates lung nodule-like shapes with controllable visual attributes such as roundness, margin sharpness, and spiculation. We demonstrate how FunnyNodules can be used in model-agnostic evaluations to assess whether models learn correct attribute-target relations.
- Score: 0.5249805590164902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Densely annotated medical image datasets that capture not only diagnostic labels but also the underlying reasoning behind these diagnoses are scarce. Such reasoning-related annotations are essential for developing and evaluating explainable AI (xAI) models that reason similarly to radiologists: making correct predictions for the right reasons. To address this gap, we introduce FunnyNodules, a fully parameterized synthetic dataset designed for systematic analysis of attribute-based reasoning in medical AI models. The dataset generates abstract, lung nodule-like shapes with controllable visual attributes such as roundness, margin sharpness, and spiculation. Target class is derived from a predefined attribute combination, allowing full control over the decision rule that links attributes to the diagnostic class. We demonstrate how FunnyNodules can be used in model-agnostic evaluations to assess whether models learn correct attribute-target relations, to interpret over- or underperformance in attribute prediction, and to analyze attention alignment with attribute-specific regions of interest. The framework is fully customizable, supporting variations in dataset complexity, target definitions, class balance, and beyond. With complete ground truth information, FunnyNodules provides a versatile foundation for developing, benchmarking, and conducting in-depth analyses of explainable AI methods in medical image analysis.
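The abstract's core mechanism, sampling controllable visual attributes and deriving the target class from a predefined attribute rule, can be sketched in a few lines of Python. The attribute names match the abstract, but the sampling scheme, thresholds, and decision rule below are illustrative assumptions, not the dataset's actual parameterization:

```python
import random

# Hypothetical sketch of FunnyNodules-style label generation. The attribute
# names come from the abstract; the uniform sampling and the specific
# decision rule are assumptions made for illustration only.
ATTRIBUTES = ("roundness", "margin_sharpness", "spiculation")

def sample_nodule(rng: random.Random) -> dict:
    """Sample a per-attribute score in [0, 1] for one synthetic nodule."""
    return {name: rng.random() for name in ATTRIBUTES}

def diagnose(nodule: dict) -> int:
    """Derive the target class from a fixed, fully known attribute rule.

    Example rule (an assumption): class 1 when the nodule is spiculated
    and has unsharp margins, loosely mimicking radiological criteria.
    Because the rule is explicit, the "right reason" for every label is
    known exactly, which is what enables xAI evaluation.
    """
    return int(nodule["spiculation"] > 0.5 and nodule["margin_sharpness"] < 0.5)

rng = random.Random(0)
dataset = [(n := sample_nodule(rng), diagnose(n)) for _ in range(100)]
```

Because every label is a deterministic function of known attributes, a model's attribute predictions and attention maps can be checked against complete ground truth rather than against noisy human annotations.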
Related papers
- MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging [67.74482877175797]
MIRNet is a novel framework that integrates self-supervised pre-training with constrained graph-based reasoning. We introduce TongueAtlas-4K, a benchmark comprising 4,000 images annotated with 22 diagnostic labels.
arXiv Detail & Related papers (2025-11-13T06:30:41Z) - Interpretable Clinical Classification with Kolmogorov-Arnold Networks [70.72819760172744]
Kolmogorov-Arnold Networks (KANs) offer intrinsic interpretability through transparent, symbolic representations. KANs support built-in patient-level insights, intuitive visualizations, and nearest-patient retrieval. These results position KANs as a promising step toward trustworthy AI that clinicians can understand, audit, and act upon.
arXiv Detail & Related papers (2025-09-20T17:21:58Z) - Minimum Data, Maximum Impact: 20 annotated samples for explainable lung nodule classification [0.0]
Radiologists use attributes such as shape and texture as established diagnostic criteria, and mirroring these criteria in AI decision-making supports interpretability. The adoption of such models is limited by the scarcity of large-scale medical image datasets annotated with these attributes. This work highlights the potential of synthetic data to overcome dataset limitations, enhancing the applicability of explainable models in medical image analysis.
arXiv Detail & Related papers (2025-08-01T13:54:34Z) - Detecting Dataset Bias in Medical AI: A Generalized and Modality-Agnostic Auditing Framework [8.017827642932746]
Generalized Attribute Utility and Detectability-Induced bias Testing (G-AUDIT) is a modality-agnostic dataset auditing framework. Our method examines the relationship between task-level annotations and data properties, including patient attributes. G-AUDIT successfully identifies subtle biases commonly overlooked by traditional qualitative methods.
arXiv Detail & Related papers (2025-03-13T02:16:48Z) - Evaluating the Explainability of Attributes and Prototypes for a Medical Classification Model [0.0]
We evaluate attribute- and prototype-based explanations with the Proto-Caps model.
We can conclude that attribute scores and visual prototypes enhance confidence in the model.
arXiv Detail & Related papers (2024-04-15T16:43:24Z) - Informing clinical assessment by contextualizing post-hoc explanations
of risk prediction models in type-2 diabetes [50.8044927215346]
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patient's clinical state.
We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability.
Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
arXiv Detail & Related papers (2023-02-11T18:07:11Z) - Learn-Explain-Reinforce: Counterfactual Reasoning and Its Guidance to
Reinforce an Alzheimer's Disease Diagnosis Model [1.6287500717172143]
We propose a novel framework that unifies diagnostic model learning, visual explanation generation, and trained diagnostic model reinforcement.
For the visual explanation, we generate a counterfactual map that transforms an input sample to be identified as a target label.
arXiv Detail & Related papers (2021-08-21T07:29:13Z) - Explaining COVID-19 and Thoracic Pathology Model Predictions by
Identifying Informative Input Features [47.45835732009979]
Neural networks have demonstrated remarkable performance in classification and regression tasks on chest X-rays.
Features attribution methods identify the importance of input features for the output prediction.
We evaluate our methods using both human-centric (ground-truth-based) interpretability metrics, and human-independent feature importance metrics on NIH Chest X-ray8 and BrixIA datasets.
arXiv Detail & Related papers (2021-04-01T11:42:39Z) - Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential services breaking down and health care structures collapsing.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z) - Statistical Exploration of Relationships Between Routine and Agnostic
Features Towards Interpretable Risk Characterization [0.0]
How do we interpret the prognostic model for clinical implementation?
How can we identify potential information structures within sets of radiomic features?
And how can we recombine or exploit potential relationships between features towards improved interpretability?
arXiv Detail & Related papers (2020-01-28T14:27:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.