Generalizability vs. Counterfactual Explainability Trade-Off
- URL: http://arxiv.org/abs/2505.23225v1
- Date: Thu, 29 May 2025 08:17:59 GMT
- Title: Generalizability vs. Counterfactual Explainability Trade-Off
- Authors: Fabiano Veglianti, Flavio Giorgi, Fabrizio Silvestri, Gabriele Tolomei
- Abstract summary: We introduce the notion of $\varepsilon$-valid counterfactual probability ($\varepsilon$-VCP). We show that $\varepsilon$-VCP tends to increase with model overfitting.
- Score: 6.3107782051840555
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we investigate the relationship between model generalization and counterfactual explainability in supervised learning. We introduce the notion of $\varepsilon$-valid counterfactual probability ($\varepsilon$-VCP) -- the probability of finding perturbations of a data point within its $\varepsilon$-neighborhood that result in a label change. We provide a theoretical analysis of $\varepsilon$-VCP in relation to the geometry of the model's decision boundary, showing that $\varepsilon$-VCP tends to increase with model overfitting. Our findings establish a rigorous connection between poor generalization and the ease of counterfactual generation, revealing an inherent trade-off between generalization and counterfactual explainability. Empirical results validate our theory, suggesting $\varepsilon$-VCP as a practical proxy for quantitatively characterizing overfitting.
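Since $\varepsilon$-VCP is, by the definition above, the probability that a random perturbation within a point's $\varepsilon$-neighborhood flips the predicted label, it admits a direct Monte Carlo estimate. The sketch below illustrates that definition only; it is not the authors' implementation, and the scikit-learn-style `model.predict` interface, the uniform-in-$\ell_2$-ball sampling, and the sample budget are all assumptions.

```python
import numpy as np

def sample_l2_ball(x, eps, n_samples, rng):
    """Draw points uniformly from the l2 ball of radius eps around x."""
    d = x.shape[0]
    # Uniform direction: normalize a standard Gaussian draw.
    dirs = rng.normal(size=(n_samples, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Uniform radius within the ball: r = eps * U^(1/d).
    radii = eps * rng.uniform(size=(n_samples, 1)) ** (1.0 / d)
    return x + radii * dirs

def vcp_estimate(model, x, eps, n_samples=10_000, seed=0):
    """Monte Carlo estimate of eps-VCP: the fraction of perturbations
    within the eps-neighborhood of x whose predicted label differs
    from the label predicted at x itself."""
    rng = np.random.default_rng(seed)
    y0 = model.predict(x[None, :])[0]
    perturbed = sample_l2_ball(x, eps, n_samples, rng)
    return np.mean(model.predict(perturbed) != y0)
```

Averaging `vcp_estimate` over a held-out set yields the kind of aggregate quantity the abstract proposes as an overfitting proxy: a model with a convoluted, overfit decision boundary should score higher than a smoother, better-generalizing one.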
Related papers
- Characterizing Online and Private Learnability under Distributional Constraints via Generalized Smoothness [63.833913892018536]
We study sequential decision making under distributional adversaries that can adaptively choose data-generating distributions from a fixed family $U$. We provide a near-complete characterization of families $U$ that admit learnability in terms of a notion known as generalized smoothness. We show that generalized smoothness also characterizes private learnability under distributional constraints.
arXiv Detail & Related papers (2026-02-24T06:15:59Z) - A Refinement of Vapnik--Chervonenkis' Theorem [0.0]
Vapnik--Chervonenkis' theorem is a seminal result in machine learning. We revisit the probabilistic component of the classical argument.
arXiv Detail & Related papers (2026-01-23T02:57:29Z) - Likelihood-Preserving Embeddings for Statistical Inference [0.0]
Modern machine learning embeddings provide powerful compression of high-dimensional data. This paper develops a theory of likelihood-preserving embeddings. Experiments on Gaussian and Cauchy distributions validate the sharp phase transition predicted by exponential family theory.
arXiv Detail & Related papers (2025-12-27T16:21:55Z) - A Foundational Theory of Quantitative Abstraction: Adjunctions, Duality, and Logic for Probabilistic Systems [2.362412515574206]
Large or continuous state spaces make exact analysis intractable and call for principled quantitative abstraction. This work develops a unified theory of such abstraction by integrating category theory, coalgebra, quantitative logic, and optimal transport.
arXiv Detail & Related papers (2025-10-22T10:16:24Z) - I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z) - Suboptimal Shapley Value Explanations [3.0872915940839274]
Deep Neural Networks (DNNs) have demonstrated strong capacity in supporting a wide variety of applications. The Shapley value has emerged as a prominent tool for analyzing feature importance and understanding the inference process of DNNs. We propose a simple uncertainty-based reweighting mechanism to accelerate the computation process.
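For context on the quantity involved: a feature's Shapley value is its average marginal contribution over random feature orderings, and since exact computation is exponential in the number of features, it is usually approximated by sampling. The sketch below shows the standard permutation estimator with a fixed baseline for features outside the coalition; it is a generic illustration, not the paper's uncertainty-based reweighting mechanism.

```python
import numpy as np

def shapley_mc(value_fn, x, baseline, n_perms=200, seed=0):
    """Permutation-sampling estimate of Shapley values for one input x.
    value_fn maps a batch of inputs to scalar model outputs; features
    not yet in the coalition keep their baseline values."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    phi = np.zeros(d)
    for _ in range(n_perms):
        order = rng.permutation(d)
        z = baseline.copy()
        prev = value_fn(z[None, :])[0]
        for j in order:
            z[j] = x[j]                      # add feature j to the coalition
            cur = value_fn(z[None, :])[0]
            phi[j] += cur - prev             # marginal contribution of j
            prev = cur
    return phi / n_perms
```

A reweighting scheme in the spirit of the paper would replace this uniform average over permutations with a weighted one, concentrating samples where the estimator is most uncertain.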
arXiv Detail & Related papers (2025-02-17T01:17:12Z) - Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits [49.96531901205305]
We analyze $f$-divergence-regularized offline policy learning. For reverse Kullback-Leibler (KL) divergence, we give the first $\tilde{O}(\epsilon^{-1})$ sample complexity under single-policy concentrability. We extend our analysis to dueling bandits, and we believe these results take a significant step toward a comprehensive understanding of $f$-divergence-regularized policy learning.
arXiv Detail & Related papers (2025-02-09T22:14:45Z) - Interaction Asymmetry: A General Principle for Learning Composable Abstractions [27.749478197803256]
We show that interaction asymmetry enables both disentanglement and compositional generalization.
We propose an implementation of these criteria using a flexible Transformer-based VAE, with a novel regularizer on the attention weights of the decoder.
arXiv Detail & Related papers (2024-11-12T13:33:26Z) - Leveraging Self-Consistency for Data-Efficient Amortized Bayesian Inference [9.940560505044122]
We propose a method to improve the efficiency and accuracy of amortized Bayesian inference.
We estimate the marginal likelihood based on approximate representations of the joint model.
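The self-consistency being exploited here is, at bottom, Bayes' rule read in reverse: for every $\theta$, the marginal likelihood satisfies $p(x) = p(\theta)\,p(x \mid \theta)/p(\theta \mid x)$, so a correct posterior approximation must return the same value no matter which $\theta$ is plugged in. A minimal numeric check on a conjugate Gaussian model (a toy example of my choosing, not the paper's setup):

```python
import numpy as np
from scipy import stats

# Conjugate toy model: theta ~ N(0, 1), x | theta ~ N(theta, 1).
# Closed forms: posterior theta | x ~ N(x/2, 1/2); marginal x ~ N(0, 2).
x = 1.3
true_log_marginal = stats.norm(0, np.sqrt(2)).logpdf(x)
for theta in [-1.0, 0.0, 0.5, 2.0]:
    log_marginal = (
        stats.norm(0, 1).logpdf(theta)                    # log prior p(theta)
        + stats.norm(theta, 1).logpdf(x)                  # log likelihood p(x | theta)
        - stats.norm(x / 2, np.sqrt(0.5)).logpdf(theta)   # log posterior p(theta | x)
    )
    print(theta, log_marginal, true_log_marginal)  # identical for every theta
```

With a learned approximate posterior substituted for the closed form, the spread of this quantity across $\theta$ values signals inconsistency, which is roughly the kind of training signal a self-consistency objective can exploit.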
arXiv Detail & Related papers (2023-10-06T17:41:41Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - A Robustness Analysis of Blind Source Separation [91.3755431537592]
Blind source separation (BSS) aims to recover an unobserved signal from its mixture $X=f(S)$ under the condition that the transformation $f$ is invertible but unknown.
We present a general framework for analysing such violations and quantifying their impact on the blind recovery of $S$ from $X$.
We show that the response of a generic BSS solution to general deviations from its defining structural assumptions can be profitably analysed in the form of explicit continuity guarantees.
arXiv Detail & Related papers (2023-03-17T16:30:51Z) - CARD: Classification and Regression Diffusion Models [51.0421331214229]
We introduce classification and regression diffusion (CARD) models, which combine a conditional generative model and a pre-trained conditional mean estimator.
We demonstrate the outstanding ability of CARD in conditional distribution prediction with both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-06-15T03:30:38Z) - KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal [70.15267479220691]
We consider and analyze the sample complexity of reinforcement learning with a generative model.
Our analysis shows that it is nearly minimax-optimal for finding an $\varepsilon$-optimal policy when $\varepsilon$ is sufficiently small.
arXiv Detail & Related papers (2022-05-27T19:39:24Z) - Outlier-Robust Optimal Transport: Duality, Structure, and Statistical Applications [25.410110072480187]
Wasserstein distances are sensitive to outliers in the considered distributions.
We propose a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass to be removed from each contaminated distribution.
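To make the definition concrete: a natural discrete analogue is a partial optimal transport problem in which only $1-\varepsilon$ of the total mass must be moved, so the costliest (outlier) mass can go unmatched on each side. The linear-programming sketch below is an illustrative relaxation of my own, not the paper's formulation or its estimator.

```python
import numpy as np
from scipy.optimize import linprog

def robust_w2_sq(xs, ys, eps):
    """Partial-OT illustration for 1-D samples: transport only 1 - eps
    of the mass between two uniform empirical distributions, letting
    the costliest eps mass go unmatched."""
    n, m = len(xs), len(ys)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    cost = (xs[:, None] - ys[None, :]) ** 2   # squared-distance cost matrix

    # Variables: flattened transport plan pi of shape (n * m,).
    A_ub = np.zeros((n + m, n * m))
    for i in range(n):
        A_ub[i, i * m:(i + 1) * m] = 1.0      # sum_j pi[i, j] <= a[i]
    for j in range(m):
        A_ub[n + j, j::m] = 1.0               # sum_i pi[i, j] <= b[j]
    res = linprog(
        c=cost.ravel(),
        A_ub=A_ub, b_ub=np.concatenate([a, b]),
        A_eq=np.ones((1, n * m)), b_eq=[1.0 - eps],  # total mass moved
        bounds=(0, None), method="highs",
    )
    return res.fun / (1.0 - eps)              # renormalise transported mass

rng = np.random.default_rng(0)
xs = np.append(rng.normal(0, 1, 19), 50.0)    # one gross outlier
ys = rng.normal(0, 1, 20)
print(robust_w2_sq(xs, ys, eps=0.0))          # blown up by the outlier
print(robust_w2_sq(xs, ys, eps=0.05))         # outlier mass discarded
```

Dropping just 5% of the mass lets the program ignore the single contaminated point, which is the robustness behaviour the $\mathsf{W}_p^\varepsilon$ construction is designed to deliver.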
arXiv Detail & Related papers (2021-11-02T04:05:45Z) - Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective [26.704446184314506]
We study the problem of measuring the fairness of a machine learning model under noisy information.
We present a theoretical analysis that aims to characterize weaker conditions under which accurate fairness evaluation is possible.
arXiv Detail & Related papers (2021-05-20T18:36:28Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)