How Many and Which Training Points Would Need to be Removed to Flip this
Prediction?
- URL: http://arxiv.org/abs/2302.02169v1
- Date: Sat, 4 Feb 2023 13:55:12 GMT
- Title: How Many and Which Training Points Would Need to be Removed to Flip this
Prediction?
- Authors: Jinghan Yang, Sarthak Jain, Byron C. Wallace
- Abstract summary: We consider the problem of identifying a minimal subset of training data $\mathcal{S}_t$.
If the instances comprising $\mathcal{S}_t$ had been removed prior to training, the categorization of a given test point $x_t$ would have been different.
We propose comparatively fast approximation methods to find $\mathcal{S}_t$ based on influence functions.
- Score: 34.9118528281516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of identifying a minimal subset of training data
$\mathcal{S}_t$ such that if the instances comprising $\mathcal{S}_t$ had been
removed prior to training, the categorization of a given test point $x_t$ would
have been different. Identifying such a set may be of interest for a few
reasons. First, the cardinality of $\mathcal{S}_t$ provides a measure of
robustness (if $|\mathcal{S}_t|$ is small for $x_t$, we might be less confident
in the corresponding prediction), which we show is correlated with but
complementary to predicted probabilities. Second, interrogation of
$\mathcal{S}_t$ may provide a novel mechanism for contesting a particular model
prediction: If one can make the case that the points in $\mathcal{S}_t$ are
wrongly labeled or irrelevant, this may argue for overturning the associated
prediction. Identifying $\mathcal{S}_t$ via brute-force is intractable. We
propose comparatively fast approximation methods to find $\mathcal{S}_t$ based
on influence functions, and find that -- for simple convex text classification
models -- these approaches can often successfully identify relatively small
sets of training examples which, if removed, would flip the prediction. To our
knowledge, this is the first work to investigate the problem of identifying
a minimal training set necessary to flip a given prediction in the context of
machine learning.
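To make the proposed approach concrete, the following is a minimal sketch (not the authors' released code) of an influence-function heuristic of the kind the abstract describes, for an L2-regularized logistic regression: estimate each training point's first-order effect on the test margin, greedily remove the points that push that margin toward zero fastest, and verify the candidate flip by retraining. The synthetic data, the regularization strength `C`, and the treatment of the intercept as a regularized bias feature are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) + 0.5 * rng.normal(size=n) > 0).astype(int)
x_t = rng.normal(size=d)   # test point whose prediction we try to flip
C = 1.0                    # inverse regularization strength (assumed)

def fit(X, y):
    return LogisticRegression(C=C).fit(X, y)

model = fit(X, y)
w, b = model.coef_.ravel(), model.intercept_[0]
margin = w @ x_t + b       # the sign of this margin is the prediction

# First-order influence of removing each training point on the test margin:
# dropping z_i shifts the parameters by roughly H^{-1} grad_i / n, which we
# project onto the bias-augmented test point. H is the regularized Hessian of
# the average log-loss (sklearn leaves the intercept unregularized; we
# regularize it here for simplicity).
Z = np.hstack([X, np.ones((n, 1))])            # fold the bias in as a feature
z_t = np.append(x_t, 1.0)
p = model.predict_proba(X)[:, 1]
grads = (p - y)[:, None] * Z                   # per-example log-loss gradients
H = (Z * (p * (1 - p))[:, None]).T @ Z / n + np.eye(d + 1) / (C * n)
infl = grads @ np.linalg.solve(H, z_t) / n     # estimated margin change per removal

# Greedily take the removals that push the margin toward zero fastest, stopping
# once the estimated cumulative change flips the sign of the margin.
order = np.argsort(infl) if margin > 0 else np.argsort(-infl)
removed, cum = [], 0.0
for i in order:
    removed.append(i)
    cum += infl[i]
    if np.sign(margin + cum) != np.sign(margin):
        break
else:
    raise SystemExit("estimated influence never flips the margin for this x_t")

# The influence estimate is only first-order, so confirm by actually retraining.
keep = np.setdiff1d(np.arange(n), removed)
flipped = fit(X[keep], y[keep]).predict(x_t[None])[0] != model.predict(x_t[None])[0]
print(f"estimated |S_t| = {len(removed)}, flip verified by retraining: {flipped}")
```

Because the estimate is first-order, the greedy set is only a candidate for $\mathcal{S}_t$: the retraining check is what confirms the flip, and even a confirmed set is not guaranteed to be minimal.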
Related papers
- Optimal level set estimation for non-parametric tournament and crowdsourcing problems [49.75262185577198]
Motivated by crowdsourcing, we consider a problem where we partially observe the correctness of the answers of $n$ experts on $d$ questions.
In this paper, we assume that the matrix $M$ containing the probability that expert $i$ answers question $j$ correctly is bi-isotonic up to a permutation of its rows and columns.
We construct a computationally efficient algorithm that turns out to be minimax optimal for this classification problem.
arXiv Detail & Related papers (2024-08-27T18:28:31Z) - Revisiting Agnostic PAC Learning [30.67561230812141]
PAC learning, dating back to Valiant'84 and Vapnik and Chervonenkis'64,'74, is a classic model for studying supervised learning.
Empirical Risk Minimization (ERM) is a natural learning algorithm, where one simply outputs the hypothesis from $\mathcal{H}$ making the fewest mistakes on the training data (a minimal ERM sketch appears after this list).
We revisit PAC learning and first show that ERM is in fact sub-optimal if we treat the performance of the best hypothesis, denoted $\tau := \Pr_{\mathcal{D}}[h^\star_{\mathcal{D}}(x) \neq y]$, as a parameter.
arXiv Detail & Related papers (2024-07-29T08:20:49Z) - Relabeling Minimal Training Subset to Flip a Prediction [20.708004593740004]
We find that relabeling fewer than 2% of the training points can always flip a prediction.
We show that $|\mathcal{S}_t|$ is highly related to the noise ratio in the training set and $|\mathcal{S}_t|$ is correlated with but complementary to predicted probabilities.
arXiv Detail & Related papers (2023-05-22T08:10:43Z) - HappyMap: A Generalized Multi-calibration Method [23.086009024383024]
Multi-calibration is a powerful and evolving concept originating in the field of algorithmic fairness.
In this work, we view the term $(f(x)-y)$ as just one specific mapping, and explore the power of an enriched class of mappings.
We propose HappyMap, a generalization of multi-calibration, which yields a wide range of new applications.
arXiv Detail & Related papers (2023-03-08T05:05:01Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - On Optimal Learning Under Targeted Data Poisoning [48.907813854832206]
In this work we aim to characterize the smallest achievable error $\epsilon = \epsilon(\eta)$ by the learner in the presence of such an adversary.
Remarkably, we show that the upper bound can be attained by a deterministic learner.
arXiv Detail & Related papers (2022-10-06T06:49:48Z) - Agnostic learning with unknown utilities [70.14742836006042]
In many real-world problems, the utility of a decision depends on the underlying context $x$ and decision $y$.
We study this as agnostic learning with unknown utilities.
We show that estimating the utilities of only the sampled points $S$ suffices to learn a decision function which generalizes well.
arXiv Detail & Related papers (2021-04-17T08:22:04Z) - Proving Non-Inclusion of B\"uchi Automata based on Monte Carlo Sampling [19.09848789158933]
We present a specific algorithm for proving B"uchi automata non-inclusion $mathcalL(mathcalA) notsubseteq mathcalL(mathcalB)$.
$\mathsf{IMC}^2$ is a fast and reliable way to find counterexamples to Büchi automata inclusion.
arXiv Detail & Related papers (2020-07-05T10:17:02Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.