A hierarchical decomposition for explaining ML performance discrepancies
- URL: http://arxiv.org/abs/2402.14254v1
- Date: Thu, 22 Feb 2024 03:41:05 GMT
- Title: A hierarchical decomposition for explaining ML performance discrepancies
- Authors: Jean Feng, Harvineet Singh, Fan Xia, Adarsh Subbaswamy, Alexej
Gossmann
- Abstract summary: Machine learning algorithms can often differ in performance across domains. Understanding $\textit{why}$ their performance differs is crucial for determining what types of interventions are most effective at closing the performance gaps.
We introduce a nonparametric hierarchical framework that provides both aggregate and detailed decompositions for explaining why the performance of an ML algorithm differs across domains without requiring causal knowledge.
- Score: 6.603088808962966
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) algorithms can often differ in performance across
domains. Understanding $\textit{why}$ their performance differs is crucial for
determining what types of interventions (e.g., algorithmic or operational) are
most effective at closing the performance gaps. Existing methods focus on
$\textit{aggregate decompositions}$ of the total performance gap into the
impact of a shift in the distribution of features $p(X)$ versus the impact of a
shift in the conditional distribution of the outcome $p(Y|X)$; however, such
coarse explanations offer only a few options for how one can close the
performance gap. $\textit{Detailed variable-level decompositions}$ that
quantify the importance of each variable to each term in the aggregate
decomposition can provide a much deeper understanding and suggest much more
targeted interventions. However, existing methods assume knowledge of the full
causal graph or make strong parametric assumptions. We introduce a
nonparametric hierarchical framework that provides both aggregate and detailed
decompositions for explaining why the performance of an ML algorithm differs
across domains, without requiring causal knowledge. We derive debiased,
computationally efficient estimators and statistical inference procedures for
asymptotically valid confidence intervals.
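For intuition, the snippet below is a minimal plug-in sketch of such an aggregate decomposition: the source-to-target gap in average loss is split by re-evaluating the model on source data reweighted toward the target feature distribution. The interfaces (`model` with a `predict` method, a `loss` that returns per-example losses) and the domain-classifier density-ratio estimate are illustrative assumptions, not the paper's debiased estimators.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def aggregate_decomposition(X_src, y_src, X_tgt, y_tgt, model, loss):
    """Plug-in sketch: split the source-to-target gap in average loss into a
    term driven by the shift in p(X) and a term driven by the shift in p(Y|X)."""
    # Average loss in each domain (loss returns per-example losses).
    loss_src = loss(y_src, model.predict(X_src))
    loss_tgt = loss(y_tgt, model.predict(X_tgt))

    # Density-ratio weights w(x) ~ p_tgt(x) / p_src(x) from a domain classifier.
    X_all = np.vstack([X_src, X_tgt])
    d_all = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])
    clf = LogisticRegression(max_iter=1000).fit(X_all, d_all)
    p_tgt = clf.predict_proba(X_src)[:, 1]
    w = p_tgt / np.clip(1.0 - p_tgt, 1e-6, None)   # np.average renormalizes w

    # Counterfactual term: source conditional p(Y|X), target feature distribution.
    loss_shifted_X = np.average(loss_src, weights=w)

    return {
        "total_gap": loss_tgt.mean() - loss_src.mean(),
        "shift in p(X)": loss_shifted_X - loss_src.mean(),
        "shift in p(Y|X)": loss_tgt.mean() - loss_shifted_X,
    }
```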
Related papers
- S-CFE: Simple Counterfactual Explanations [21.975560789792073]
We tackle the problem of finding manifold-aligned counterfactual explanations for sparse data.
Our approach effectively produces sparse, manifold-aligned counterfactual explanations.
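As a rough illustration of the sparsity side of this problem (not the S-CFE algorithm itself, and without its manifold-alignment component), a counterfactual can be searched for by ascent on the target-class probability combined with an L1 proximal step toward the original point; the single-example `predict_proba` interface below is a hypothetical assumption.

```python
import numpy as np

def sparse_counterfactual(x, predict_proba, target_class, lam=0.1,
                          lr=0.05, steps=500, eps=1e-3):
    """Generic sketch: nudge x toward a target class while soft-thresholding the
    change, so only a few features move (sparsity)."""
    x_cf = x.astype(float).copy()
    for _ in range(steps):
        # Finite-difference gradient of the target-class probability.
        base = predict_proba(x_cf)[target_class]
        grad = np.zeros_like(x_cf)
        for j in range(len(x_cf)):
            x_pert = x_cf.copy()
            x_pert[j] += eps
            grad[j] = (predict_proba(x_pert)[target_class] - base) / eps
        # Ascent step, then L1 proximal step toward the original point.
        x_cf += lr * grad
        delta = x_cf - x
        x_cf = x + np.sign(delta) * np.maximum(np.abs(delta) - lr * lam, 0.0)
        if predict_proba(x_cf)[target_class] > 0.5:
            break
    return x_cf
```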
arXiv Detail & Related papers (2024-10-21T07:42:43Z)
- Efficient Differentiable Discovery of Causal Order [14.980926991441342]
Intersort is a score-based method for discovering the causal order of variables.
We reformulate Intersort using differentiable sorting and ranking techniques.
Our work opens the door to efficiently incorporating regularization for causal order into the training of differentiable models.
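For context, a common building block for such reformulations is a SoftSort-style relaxation, sketched below in NumPy: it builds a row-stochastic "soft permutation" matrix whose formula is differentiable when written in an autodiff framework. This is a generic illustration, not Intersort's implementation.

```python
import numpy as np

def soft_sort_matrix(scores, tau=1.0):
    """SoftSort-style relaxation: a row-stochastic matrix that approaches the
    exact (descending) sorting permutation of `scores` as tau -> 0."""
    s = np.asarray(scores, dtype=float)
    sorted_s = np.sort(s)[::-1]                                   # descending order
    pairwise = -np.abs(sorted_s[:, None] - s[None, :]) / tau      # similarity to each rank
    P = np.exp(pairwise - pairwise.max(axis=1, keepdims=True))    # stable row softmax
    return P / P.sum(axis=1, keepdims=True)
```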
arXiv Detail & Related papers (2024-10-11T13:11:55Z)
- Efficient SAGE Estimation via Causal Structure Learning [0.7243632426715939]
$d$-SAGE is a method that accelerates approximation to the Shapley Additive Global Importance (SAGE) value.
$d$-SAGE is motivated by the observation that conditional independencies between a feature and the model target imply zero surplus contributions.
We demonstrate that $d$-SAGE enables the efficient and accurate estimation of SAGE values.
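For reference, a rough permutation-sampling sketch of plain SAGE estimation with marginal imputation is shown below; d-SAGE's actual speed-up, skipping features whose conditional independencies force a zero surplus, is only noted in a comment. The `model_predict` function and per-example `loss` are assumed interfaces.

```python
import numpy as np

def sage_values(model_predict, loss, X, y, n_perm=50, rng=None):
    """Permutation-sampling SAGE sketch: the surplus of feature j is the drop in
    average loss when j is revealed on top of the features already revealed.
    (d-SAGE would skip features known to be conditionally independent of y.)"""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        # With nothing revealed, impute every feature from a resampled row (marginal).
        X_imp = X[rng.integers(n, size=n)].copy()
        prev_loss = loss(y, model_predict(X_imp)).mean()
        for j in order:
            X_imp[:, j] = X[:, j]                  # reveal true values of feature j
            cur_loss = loss(y, model_predict(X_imp)).mean()
            phi[j] += prev_loss - cur_loss         # surplus contribution of j
            prev_loss = cur_loss
    return phi / n_perm
```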
arXiv Detail & Related papers (2023-04-06T14:40:32Z)
- Minimax Optimal Online Imitation Learning via Replay Estimation [47.83919594113314]
We introduce a technique of replay estimation to reduce this empirical variance.
We show that our approach achieves the optimal $\widetilde{O}\left(\min(H^{3/2}/N,\, H/\sqrt{N})\right)$ dependency.
arXiv Detail & Related papers (2022-05-30T19:29:56Z)
- Improving Out-of-Distribution Robustness via Selective Augmentation [61.147630193060856]
Machine learning algorithms typically assume that training and test examples are drawn from the same distribution.
However, distribution shift is a common problem in real-world applications and can cause models to perform dramatically worse at test time.
We propose LISA, a mixup-based technique that learns invariant functions via selective augmentation.
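A simplified sketch of the selective-augmentation idea follows, assuming arrays `X`, `y`, and a per-example `domain` label: pairs that share a label but come from different domains are mixed, so domain-specific variation is averaged out while the label is kept. This is illustrative only, not the paper's full recipe.

```python
import numpy as np

def intra_label_mixup(X, y, domain, alpha=2.0, rng=None):
    """Mix pairs with the same label but different domains (intra-label mixup)."""
    rng = np.random.default_rng(rng)
    X_mix, y_mix = [], []
    for i in range(len(X)):
        mask = (y == y[i]) & (domain != domain[i])   # same label, different domain
        if not mask.any():
            continue
        j = rng.choice(np.flatnonzero(mask))
        lam = rng.beta(alpha, alpha)
        X_mix.append(lam * X[i] + (1 - lam) * X[j])
        y_mix.append(y[i])                           # label unchanged
    return np.array(X_mix), np.array(y_mix)
```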
arXiv Detail & Related papers (2022-01-02T05:58:33Z)
- Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that an instance-dependent functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
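A tabular sketch of a variance-reduced Q-learning loop is given below, assuming a generative model `sample_next_state(s, a)` and a deterministic `reward(s, a)`: each epoch recenters the noisy one-sample Bellman update around a Monte Carlo estimate at a frozen reference Q. Step sizes and sample counts are illustrative, not the paper's.

```python
import numpy as np

def vr_q_learning(sample_next_state, reward, n_states, n_actions, gamma,
                  epochs=3, inner_steps=300, recenter_samples=500, rng=None):
    """Variance-reduced Q-learning sketch for a tabular MDP with a generative model."""
    rng = np.random.default_rng(rng)
    Q = np.zeros((n_states, n_actions))
    for _ in range(epochs):
        Q_bar = Q.copy()
        # High-accuracy Monte Carlo estimate of the Bellman operator at Q_bar.
        T_bar = np.zeros_like(Q)
        for s in range(n_states):
            for a in range(n_actions):
                nxt = [sample_next_state(s, a) for _ in range(recenter_samples)]
                T_bar[s, a] = reward(s, a) + gamma * np.mean(Q_bar[nxt].max(axis=1))
        # Inner loop: one-sample updates recentered by the control variate Q_bar.
        for t in range(1, inner_steps + 1):
            lr = 1.0 / (1.0 + (1.0 - gamma) * t)
            for s in range(n_states):
                for a in range(n_actions):
                    s2 = sample_next_state(s, a)
                    t_q = reward(s, a) + gamma * Q[s2].max()
                    t_qbar = reward(s, a) + gamma * Q_bar[s2].max()
                    Q[s, a] = (1 - lr) * Q[s, a] + lr * (t_q - t_qbar + T_bar[s, a])
    return Q
```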
arXiv Detail & Related papers (2021-06-28T00:38:54Z)
- Efficient Consensus Model based on Proximal Gradient Method applied to Convolutional Sparse Problems [2.335152769484957]
We derive and detail a theoretical analysis of an efficient consensus algorithm based on the proximal gradient (PG) approach.
The proposed algorithm is also applied to another particular convolutional problem for the anomaly detection task.
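Only the core proximal-gradient step is sketched below, on a plain l1-regularized least-squares problem; the paper's consensus splitting over convolutional dictionaries is a distributed variant of this step and is not shown.

```python
import numpy as np

def ista(A, b, lam, steps=200):
    """Proximal-gradient (ISTA) sketch for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2                   # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ x - b)                    # gradient of the smooth part
        z = x - grad / L                            # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-threshold prox
    return x
```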
arXiv Detail & Related papers (2020-11-19T20:52:48Z)
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivative with finite difference.
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
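The basic trick can be shown in one line: a central finite difference estimates a directional derivative from two function evaluations that can run in parallel, with no gradient computation. The snippet below is a sketch of the first-order case only.

```python
import numpy as np

def directional_derivative_fd(f, x, v, eps=1e-4):
    """Central finite-difference estimate of <grad f(x), v> using two evaluations."""
    return (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)

# Example: matches the analytic value <2x, v> for f(x) = ||x||^2.
x = np.array([1.0, -2.0, 0.5])
v = np.array([0.3, 0.1, -0.2])
print(directional_derivative_fd(lambda z: np.sum(z ** 2), x, v), 2 * x @ v)
```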
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
- Learning with Differentiable Perturbed Optimizers [54.351317101356614]
We propose a systematic method to transform optimizers into operations that are differentiable and never locally constant.
Our approach relies on stochastically perturbed optimizers, and can be used readily together with existing solvers.
We show how this framework can be connected to a family of losses developed in structured prediction, and give theoretical guarantees for their use in learning tasks.
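A Monte Carlo sketch of the perturbation idea follows, using a simple argmax over a score vector as the optimizer: averaging one-hot argmax solutions under injected noise yields a smooth map of the scores (with Gumbel noise this expectation is exactly a softmax). This is a generic illustration rather than the paper's general construction.

```python
import numpy as np

def perturbed_argmax(theta, sigma=1.0, n_samples=1000, rng=None):
    """Average one-hot argmax solutions under Gaussian perturbations of theta,
    turning the piecewise-constant argmax into a smooth map in expectation."""
    rng = np.random.default_rng(rng)
    out = np.zeros(len(theta))
    for _ in range(n_samples):
        z = rng.normal(size=len(theta))
        out[np.argmax(theta + sigma * z)] += 1.0
    return out / n_samples
```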
arXiv Detail & Related papers (2020-02-20T11:11:32Z)
- Polynomial-Time Exact MAP Inference on Discrete Models with Global Dependencies [83.05591911173332]
The junction tree algorithm is the most general solution for exact MAP inference with run-time guarantees.
We propose a new graph transformation technique via node cloning that ensures the run-time for solving our target problem is independent of the form of the corresponding clique tree.
arXiv Detail & Related papers (2019-12-27T13:30:29Z)