Towards the Formalization of a Trustworthy AI for Mining Interpretable Models explOiting Sophisticated Algorithms
- URL: http://arxiv.org/abs/2510.20621v2
- Date: Thu, 30 Oct 2025 16:26:39 GMT
- Title: Towards the Formalization of a Trustworthy AI for Mining Interpretable Models explOiting Sophisticated Algorithms
- Authors: Riccardo Guidotti, Martina Cinquini, Marta Marchiori Manerba, Mattia Setzu, Francesco Spinnato,
- Abstract summary: Interpretable-by-design models are crucial for fostering trust, accountability, and safe adoption of automated decision-making models in real-world applications. We formalize a comprehensive methodology for generating predictive models that balance interpretability with performance. By evaluating ethical measures during model generation, this framework establishes the theoretical foundations for developing AI systems that are accurate, interpretable, fair, privacy-preserving, and causally aware.
- Score: 4.587316936127635
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interpretable-by-design models are crucial for fostering trust, accountability, and safe adoption of automated decision-making models in real-world applications. In this paper we lay the formal groundwork for the MIMOSA (Mining Interpretable Models explOiting Sophisticated Algorithms) framework, a comprehensive methodology for generating predictive models that balance interpretability with performance while embedding key ethical properties. We formally define the supervised learning setting across diverse decision-making tasks and data types, including tabular data, time series, images, text, transactions, and trajectories. We characterize three major families of interpretable models: feature-importance-based, rule-based, and instance-based models. For each family, we analyze its interpretability dimensions, reasoning mechanisms, and complexity. Beyond interpretability, we formalize three critical ethical properties, namely causality, fairness, and privacy, providing formal definitions, evaluation metrics, and verification procedures for each. We then examine the inherent trade-offs between these properties and discuss how privacy requirements, fairness constraints, and causal reasoning can be embedded within interpretable pipelines. By evaluating ethical measures during model generation, this framework establishes the theoretical foundations for developing AI systems that are not only accurate and interpretable but also fair, privacy-preserving, and causally aware, i.e., trustworthy.
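As a concrete illustration of the kind of pipeline the abstract describes, the sketch below trains rule-based interpretable models (shallow decision trees) of increasing complexity and evaluates a fairness measure for each candidate during model generation. The synthetic data, the scikit-learn models, and the choice of demographic parity difference as the fairness metric are illustrative assumptions, not the MIMOSA framework itself.

```python
# Hedged sketch: evaluating a fairness measure while generating an
# interpretable (rule-based) model. Data, metric, and model family are
# illustrative assumptions, not the paper's method.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 1000
sensitive = rng.integers(0, 2, n)      # protected attribute (e.g., group A/B)
x_other = rng.normal(size=(n, 3))      # non-sensitive features
y = (x_other[:, 0] + 0.3 * sensitive + rng.normal(scale=0.5, size=n) > 0).astype(int)
X = np.column_stack([sensitive, x_other])

def demographic_parity_difference(y_pred, group):
    """|P(yhat=1 | group=0) - P(yhat=1 | group=1)|."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

# Interpretability is controlled by model complexity (here: tree depth);
# the ethical measure is checked for every candidate during generation.
for depth in (2, 3, 4, 5):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    dpd = demographic_parity_difference(model.predict(X), sensitive)
    print(f"depth={depth} accuracy={model.score(X, y):.3f} DPD={dpd:.3f}")

# A shallow tree doubles as a rule-based model: its root-to-leaf paths are
# readable if-then rules.
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(shallow, feature_names=["sensitive", "f1", "f2", "f3"]))
```

In a full pipeline of this kind, the selected candidate would be the one that best balances accuracy, model complexity (interpretability), and the ethical measures, rather than accuracy alone.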
Related papers
- Actionable Interpretability Must Be Defined in Terms of Symmetries [37.964025348175504]
This paper argues that interpretability research in Artificial Intelligence (AI) is fundamentally ill-posed, as existing definitions fail to describe how interpretability can be formally tested or designed for. We posit that actionable definitions of interpretability must be formulated in terms of *symmetries* that inform model design and lead to testable conditions.
arXiv Detail & Related papers (2026-01-19T10:10:17Z) - Explaining Machine Learning Predictive Models through Conditional Expectation Methods [0.0]
MUCE is a model-agnostic method for local explainability designed to capture prediction changes from feature interactions. Two quantitative indices, stability and uncertainty, summarize local behavior and assess model reliability. Results show that MUCE effectively captures complex local model behavior, while the stability and uncertainty indices provide meaningful insight into prediction confidence.
arXiv Detail & Related papers (2026-01-12T08:34:36Z) - A Comprehensive Survey on World Models for Embodied AI [14.457261562275121]
Embodied AI requires agents that perceive, act, and anticipate how actions reshape future world states. This survey presents a unified framework for world models in embodied AI.
arXiv Detail & Related papers (2025-10-19T07:12:32Z) - Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models [93.1043186636177]
We explore the hypothesis that people use a combination of distributed and symbolic representations to construct bespoke mental models tailored to novel situations. We propose a computational implementation of this idea: a "Model Synthesis Architecture" (MSA). We evaluate our MSA as a model of human judgments on a novel reasoning dataset.
arXiv Detail & Related papers (2025-07-16T18:01:03Z) - Information Science Principles of Machine Learning: A Causal Chain Meta-Framework Based on Formalized Information Mapping [7.299890614172539]
This study addresses key challenges in machine learning, namely the absence of a unified formal theoretical framework and the lack of foundational theories for model interpretability and ethical safety. We first construct a formal information model, explicitly defining the ontological states and carrier mappings of typical machine learning stages. By introducing learnable and processable predicates, as well as learning and processing functions, we analyze the causal chain logic and constraint laws governing machine learning processes.
arXiv Detail & Related papers (2025-05-19T14:39:41Z) - On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [68.62012304574012]
Multimodal generative models have sparked critical discussions on their reliability, fairness, and potential for misuse. We propose an evaluation framework to assess model reliability by analyzing responses to global and local perturbations in the embedding space. Our method lays the groundwork for detecting unreliable, bias-injected models and tracing the provenance of embedded biases.
arXiv Detail & Related papers (2024-11-21T09:46:55Z) - Intrinsic User-Centric Interpretability through Global Mixture of Experts [31.738009841932374]
InterpretCC is a family of intrinsically interpretable neural networks that optimize for ease of human understanding and explanation faithfulness. We show that InterpretCC explanations have higher actionability and usefulness than other intrinsically interpretable approaches.
arXiv Detail & Related papers (2024-02-05T11:55:50Z) - Evaluating Explainability in Machine Learning Predictions through Explainer-Agnostic Metrics [0.0]
We develop six distinct model-agnostic metrics designed to quantify the extent to which model predictions can be explained.
These metrics measure different aspects of model explainability, covering local importance, global importance, and surrogate predictions.
We demonstrate the practical utility of these metrics on classification and regression tasks, and integrate these metrics into an existing Python package for public use.
arXiv Detail & Related papers (2023-02-23T15:28:36Z) - Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective Optimization [73.89239820192894]
We argue that automated counterfactual generation should take into account several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z) - On the Opportunities and Risks of Foundation Models [256.61956234436553]
We call these models foundation models to underscore their critically central yet incomplete character.
This report provides a thorough account of the opportunities and risks of foundation models.
To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration.
arXiv Detail & Related papers (2021-08-16T17:50:08Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual
Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z) - Uncertainty as a Form of Transparency: Measuring, Communicating, and Using Uncertainty [66.17147341354577]
We argue for considering a complementary form of transparency by estimating and communicating the uncertainty associated with model predictions.
We describe how uncertainty can be used to mitigate model unfairness, augment decision-making, and build trustworthy systems; a minimal sketch of ensemble-based uncertainty estimation appears after this list.
This work constitutes an interdisciplinary review drawn from literature spanning machine learning, visualization/HCI, design, decision-making, and fairness.
arXiv Detail & Related papers (2020-11-15T17:26:14Z)
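The final entry above argues for estimating and communicating predictive uncertainty as a form of transparency. As a hedged, minimal sketch (not the surveyed paper's method), the snippet below estimates per-instance uncertainty as the entropy of class probabilities averaged over a bagged ensemble and flags the least confident predictions for human review; the synthetic dataset and the entropy-based measure are illustrative assumptions.

```python
# Hedged sketch of uncertainty as transparency: per-instance uncertainty is
# the entropy of class probabilities averaged over a bagged ensemble.
# Data and measure are illustrative assumptions, not the surveyed method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
ensemble = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=4),
    n_estimators=25,
    random_state=0,
).fit(X, y)

proba = ensemble.predict_proba(X)                         # averaged over members
entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)  # 0 = fully confident

# Communicate uncertainty alongside predictions: flag the least confident
# decisions for human review instead of acting on them blindly.
flagged = np.argsort(entropy)[-5:]
for i in flagged:
    print(f"instance {i}: prediction={ensemble.predict(X[i:i+1])[0]} "
          f"entropy={entropy[i]:.3f} -> route to human review")
```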
This list is automatically generated from the titles and abstracts of the papers on this site.