A hierarchical decomposition for explaining ML performance discrepancies
- URL: http://arxiv.org/abs/2402.14254v1
- Date: Thu, 22 Feb 2024 03:41:05 GMT
- Title: A hierarchical decomposition for explaining ML performance discrepancies
- Authors: Jean Feng, Harvineet Singh, Fan Xia, Adarsh Subbaswamy, Alexej
Gossmann
- Abstract summary: Machine learning algorithms can often differ in performance across domains. Understanding $\textit{why}$ their performance differs is crucial for determining what types of interventions are most effective at closing the performance gaps.
We introduce a nonparametric hierarchical framework that provides both aggregate and detailed decompositions for explaining why the performance of an ML algorithm differs across domains without requiring causal knowledge.
- Score: 6.603088808962966
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) algorithms can often differ in performance across
domains. Understanding $\textit{why}$ their performance differs is crucial for
determining what types of interventions (e.g., algorithmic or operational) are
most effective at closing the performance gaps. Existing methods focus on
$\textit{aggregate decompositions}$ of the total performance gap into the
impact of a shift in the distribution of features $p(X)$ versus the impact of a
shift in the conditional distribution of the outcome $p(Y|X)$; however, such
coarse explanations offer only a few options for how one can close the
performance gap. $\textit{Detailed variable-level decompositions}$ that
quantify the importance of each variable to each term in the aggregate
decomposition can provide a much deeper understanding and suggest much more
targeted interventions. However, existing methods assume knowledge of the full
causal graph or make strong parametric assumptions. We introduce a
nonparametric hierarchical framework that provides both aggregate and detailed
decompositions for explaining why the performance of an ML algorithm differs
across domains, without requiring causal knowledge. We derive debiased,
computationally efficient estimators and statistical inference procedures for
asymptotically valid confidence intervals.
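For intuition, the snippet below is a minimal plug-in sketch of such an aggregate decomposition: the source-to-target gap in average loss is split by re-evaluating the model on source data reweighted toward the target feature distribution. The interfaces (`model` with a `predict` method, a `loss` that returns per-example losses) and the domain-classifier density-ratio estimate are illustrative assumptions, not the paper's debiased estimators.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def aggregate_decomposition(X_src, y_src, X_tgt, y_tgt, model, loss):
    """Plug-in sketch: split the source-to-target gap in average loss into a
    term driven by the shift in p(X) and a term driven by the shift in p(Y|X)."""
    # Average loss in each domain (loss returns per-example losses).
    loss_src = loss(y_src, model.predict(X_src))
    loss_tgt = loss(y_tgt, model.predict(X_tgt))

    # Density-ratio weights w(x) ~ p_tgt(x) / p_src(x) from a domain classifier.
    X_all = np.vstack([X_src, X_tgt])
    d_all = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])
    clf = LogisticRegression(max_iter=1000).fit(X_all, d_all)
    p_tgt = clf.predict_proba(X_src)[:, 1]
    w = p_tgt / np.clip(1.0 - p_tgt, 1e-6, None)   # np.average renormalizes w

    # Counterfactual term: source conditional p(Y|X), target feature distribution.
    loss_shifted_X = np.average(loss_src, weights=w)

    return {
        "total_gap": loss_tgt.mean() - loss_src.mean(),
        "shift in p(X)": loss_shifted_X - loss_src.mean(),
        "shift in p(Y|X)": loss_tgt.mean() - loss_shifted_X,
    }
```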
Related papers
- S-CFE: Simple Counterfactual Explanations [21.975560789792073]
We tackle the problem of finding manifold-aligned counterfactual explanations for sparse data.
Our approach effectively produces sparse, manifold-aligned counterfactual explanations.
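As a rough illustration of the sparsity side of this problem (not the S-CFE algorithm itself, and without its manifold-alignment component), a counterfactual can be searched for by ascent on the target-class probability combined with an L1 proximal step toward the original point; the single-example `predict_proba` interface below is a hypothetical assumption.

```python
import numpy as np

def sparse_counterfactual(x, predict_proba, target_class, lam=0.1,
                          lr=0.05, steps=500, eps=1e-3):
    """Generic sketch: nudge x toward a target class while soft-thresholding the
    change, so only a few features move (sparsity)."""
    x_cf = x.astype(float).copy()
    for _ in range(steps):
        # Finite-difference gradient of the target-class probability.
        base = predict_proba(x_cf)[target_class]
        grad = np.zeros_like(x_cf)
        for j in range(len(x_cf)):
            x_pert = x_cf.copy()
            x_pert[j] += eps
            grad[j] = (predict_proba(x_pert)[target_class] - base) / eps
        # Ascent step, then L1 proximal step toward the original point.
        x_cf += lr * grad
        delta = x_cf - x
        x_cf = x + np.sign(delta) * np.maximum(np.abs(delta) - lr * lam, 0.0)
        if predict_proba(x_cf)[target_class] > 0.5:
            break
    return x_cf
```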
arXiv Detail & Related papers (2024-10-21T07:42:43Z)
- Efficient Differentiable Discovery of Causal Order [14.980926991441342]
Intersort is a score-based method for discovering the causal order of variables.
We reformulate Intersort using differentiable sorting and ranking techniques.
Our work opens the door to efficiently incorporating regularization for causal order into the training of differentiable models.
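For context, a common building block for such reformulations is a SoftSort-style relaxation, sketched below in NumPy: it builds a row-stochastic "soft permutation" matrix whose formula is differentiable when written in an autodiff framework. This is a generic illustration, not Intersort's implementation.

```python
import numpy as np

def soft_sort_matrix(scores, tau=1.0):
    """SoftSort-style relaxation: a row-stochastic matrix that approaches the
    exact (descending) sorting permutation of `scores` as tau -> 0."""
    s = np.asarray(scores, dtype=float)
    sorted_s = np.sort(s)[::-1]                                   # descending order
    pairwise = -np.abs(sorted_s[:, None] - s[None, :]) / tau      # similarity to each rank
    P = np.exp(pairwise - pairwise.max(axis=1, keepdims=True))    # stable row softmax
    return P / P.sum(axis=1, keepdims=True)
```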
arXiv Detail & Related papers (2024-10-11T13:11:55Z)
- Efficient SAGE Estimation via Causal Structure Learning [0.7243632426715939]
$d$-SAGE is a method that accelerates approximation to the Shapley Additive Global Importance (SAGE) value.
$d$-SAGE is motivated by the observation that conditional independencies between a feature and the model target imply zero surplus contributions.
We demonstrate that $d$-SAGE enables the efficient and accurate estimation of SAGE values.
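For reference, a rough permutation-sampling sketch of plain SAGE estimation with marginal imputation is shown below; d-SAGE's actual speed-up, skipping features whose conditional independencies force a zero surplus, is only noted in a comment. The `model_predict` function and per-example `loss` are assumed interfaces.

```python
import numpy as np

def sage_values(model_predict, loss, X, y, n_perm=50, rng=None):
    """Permutation-sampling SAGE sketch: the surplus of feature j is the drop in
    average loss when j is revealed on top of the features already revealed.
    (d-SAGE would skip features known to be conditionally independent of y.)"""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        # With nothing revealed, impute every feature from a resampled row (marginal).
        X_imp = X[rng.integers(n, size=n)].copy()
        prev_loss = loss(y, model_predict(X_imp)).mean()
        for j in order:
            X_imp[:, j] = X[:, j]                  # reveal true values of feature j
            cur_loss = loss(y, model_predict(X_imp)).mean()
            phi[j] += prev_loss - cur_loss         # surplus contribution of j
            prev_loss = cur_loss
    return phi / n_perm
```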
arXiv Detail & Related papers (2023-04-06T14:40:32Z)
- Minimax Optimal Online Imitation Learning via Replay Estimation [47.83919594113314]
We introduce a technique of replay estimation to reduce this empirical variance.
We show that our approach achieves the optimal $\widetilde{O}\left(\min(H^{3/2}/N,\, H/\sqrt{N})\right)$ dependency.
arXiv Detail & Related papers (2022-05-30T19:29:56Z)
- Improving Out-of-Distribution Robustness via Selective Augmentation [61.147630193060856]
Machine learning algorithms typically assume that training and test examples are drawn from the same distribution.
However, distribution shift is a common problem in real-world applications and can cause models to perform dramatically worse at test time.
We propose LISA, a mixup-based technique that learns invariant functions via selective augmentation.
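A simplified sketch of the selective-augmentation idea follows, assuming arrays `X`, `y`, and a per-example `domain` label: pairs that share a label but come from different domains are mixed, so domain-specific variation is averaged out while the label is kept. This is illustrative only, not the paper's full recipe.

```python
import numpy as np

def intra_label_mixup(X, y, domain, alpha=2.0, rng=None):
    """Mix pairs with the same label but different domains (intra-label mixup)."""
    rng = np.random.default_rng(rng)
    X_mix, y_mix = [], []
    for i in range(len(X)):
        mask = (y == y[i]) & (domain != domain[i])   # same label, different domain
        if not mask.any():
            continue
        j = rng.choice(np.flatnonzero(mask))
        lam = rng.beta(alpha, alpha)
        X_mix.append(lam * X[i] + (1 - lam) * X[j])
        y_mix.append(y[i])                           # label unchanged
    return np.array(X_mix), np.array(y_mix)
```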
arXiv Detail & Related papers (2022-01-02T05:58:33Z)
- Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that an instance-dependent functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
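A tabular sketch of a variance-reduced Q-learning loop is given below, assuming a generative model `sample_next_state(s, a)` and a deterministic `reward(s, a)`: each epoch recenters the noisy one-sample Bellman update around a Monte Carlo estimate at a frozen reference Q. Step sizes and sample counts are illustrative, not the paper's.

```python
import numpy as np

def vr_q_learning(sample_next_state, reward, n_states, n_actions, gamma,
                  epochs=3, inner_steps=300, recenter_samples=500, rng=None):
    """Variance-reduced Q-learning sketch for a tabular MDP with a generative model."""
    rng = np.random.default_rng(rng)
    Q = np.zeros((n_states, n_actions))
    for _ in range(epochs):
        Q_bar = Q.copy()
        # High-accuracy Monte Carlo estimate of the Bellman operator at Q_bar.
        T_bar = np.zeros_like(Q)
        for s in range(n_states):
            for a in range(n_actions):
                nxt = [sample_next_state(s, a) for _ in range(recenter_samples)]
                T_bar[s, a] = reward(s, a) + gamma * np.mean(Q_bar[nxt].max(axis=1))
        # Inner loop: one-sample updates recentered by the control variate Q_bar.
        for t in range(1, inner_steps + 1):
            lr = 1.0 / (1.0 + (1.0 - gamma) * t)
            for s in range(n_states):
                for a in range(n_actions):
                    s2 = sample_next_state(s, a)
                    t_q = reward(s, a) + gamma * Q[s2].max()
                    t_qbar = reward(s, a) + gamma * Q_bar[s2].max()
                    Q[s, a] = (1 - lr) * Q[s, a] + lr * (t_q - t_qbar + T_bar[s, a])
    return Q
```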
arXiv Detail & Related papers (2021-06-28T00:38:54Z)
- Efficient Consensus Model based on Proximal Gradient Method applied to Convolutional Sparse Problems [2.335152769484957]
We derive and detail a theoretical analysis of an efficient consensus algorithm based on the proximal gradient (PG) approach.
The proposed algorithm is also applied to another particular convolutional problem for the anomaly detection task.
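Only the core proximal-gradient step is sketched below, on a plain l1-regularized least-squares problem; the paper's consensus splitting over convolutional dictionaries is a distributed variant of this step and is not shown.

```python
import numpy as np

def ista(A, b, lam, steps=200):
    """Proximal-gradient (ISTA) sketch for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2                   # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ x - b)                    # gradient of the smooth part
        z = x - grad / L                            # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-threshold prox
    return x
```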
arXiv Detail & Related papers (2020-11-19T20:52:48Z)
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivative with finite difference.
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
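The basic trick can be shown in one line: a central finite difference estimates a directional derivative from two function evaluations that can run in parallel, with no gradient computation. The snippet below is a sketch of the first-order case only.

```python
import numpy as np

def directional_derivative_fd(f, x, v, eps=1e-4):
    """Central finite-difference estimate of <grad f(x), v> using two evaluations."""
    return (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)

# Example: matches the analytic value <2x, v> for f(x) = ||x||^2.
x = np.array([1.0, -2.0, 0.5])
v = np.array([0.3, 0.1, -0.2])
print(directional_derivative_fd(lambda z: np.sum(z ** 2), x, v), 2 * x @ v)
```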
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
- Learning with Differentiable Perturbed Optimizers [54.351317101356614]
We propose a systematic method to transform optimizers into operations that are differentiable and never locally constant.
Our approach relies on stochastically perturbed optimizers, and can be used readily together with existing solvers.
We show how this framework can be connected to a family of losses developed in structured prediction, and give theoretical guarantees for their use in learning tasks.
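A Monte Carlo sketch of the perturbation idea follows, using a simple argmax over a score vector as the optimizer: averaging one-hot argmax solutions under injected noise yields a smooth map of the scores (with Gumbel noise this expectation is exactly a softmax). This is a generic illustration rather than the paper's general construction.

```python
import numpy as np

def perturbed_argmax(theta, sigma=1.0, n_samples=1000, rng=None):
    """Average one-hot argmax solutions under Gaussian perturbations of theta,
    turning the piecewise-constant argmax into a smooth map in expectation."""
    rng = np.random.default_rng(rng)
    out = np.zeros(len(theta))
    for _ in range(n_samples):
        z = rng.normal(size=len(theta))
        out[np.argmax(theta + sigma * z)] += 1.0
    return out / n_samples
```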
arXiv Detail & Related papers (2020-02-20T11:11:32Z)
- Polynomial-Time Exact MAP Inference on Discrete Models with Global Dependencies [83.05591911173332]
The junction tree algorithm is the most general solution for exact MAP inference with run-time guarantees.
We propose a new graph transformation technique via node cloning that ensures the run-time for solving our target problem is independent of the form of the corresponding clique tree.
arXiv Detail & Related papers (2019-12-27T13:30:29Z)