Deconstructing Distributions: A Pointwise Framework of Learning
- URL: http://arxiv.org/abs/2202.09931v1
- Date: Sun, 20 Feb 2022 23:25:28 GMT
- Title: Deconstructing Distributions: A Pointwise Framework of Learning
- Authors: Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran
- Abstract summary: We study a point's $\textit{profile}$: the relationship between models' average performance on the test distribution and their pointwise performance on this individual point.
We find that profiles can yield new insights into the structure of both models and data -- in- and out-of-distribution.
- Score: 15.517383696434162
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In machine learning, we traditionally evaluate the performance of a single
model, averaged over a collection of test inputs. In this work, we propose a
new approach: we measure the performance of a collection of models when
evaluated on a $\textit{single input point}$. Specifically, we study a point's
$\textit{profile}$: the relationship between models' average performance on the
test distribution and their pointwise performance on this individual point. We
find that profiles can yield new insights into the structure of both models and
data -- in- and out-of-distribution. For example, we empirically show that real
data distributions consist of points with qualitatively different profiles. On
one hand, there are "compatible" points with strong correlation between the
pointwise and average performance. On the other hand, there are points with
weak and even $\textit{negative}$ correlation: cases where improving overall
model accuracy actually $\textit{hurts}$ performance on these inputs. We prove
that these experimental observations are inconsistent with the predictions of
several simplified models of learning proposed in prior work. As an
application, we use profiles to construct a dataset we call CIFAR-10-NEG: a
subset of CINIC-10 such that for standard models, accuracy on CIFAR-10-NEG is
$\textit{negatively correlated}$ with accuracy on CIFAR-10 test. This
illustrates, for the first time, an OOD dataset that completely inverts
"accuracy-on-the-line" (Miller, Taori, Raghunathan, Sagawa, Koh, Shankar,
Liang, Carmon, and Schmidt, 2021).
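As a rough sketch of how a profile can be computed in practice (simplified: the paper bins models by test accuracy, whereas here a plain Pearson correlation stands in, and the cutoff `tau` for flagging negatively correlated points is an invented illustration):

```python
import numpy as np

def point_profile(model_accs, point_scores):
    """Profile of a single test point across a collection of models.

    model_accs   : (n_models,) each model's average test accuracy
    point_scores : (n_models,) the same models' performance on this one
                   point (e.g. correctness in {0, 1} or softmax margin)
    Returns the Pearson correlation between the two quantities.
    Note: points every model gets right (or wrong) have zero variance
    and yield NaN; such points are trivially "compatible".
    """
    return float(np.corrcoef(model_accs, point_scores)[0, 1])

def negatively_correlated_points(model_accs, score_matrix, tau=-0.2):
    """score_matrix: (n_models, n_points). Flags points whose profile
    correlation falls below tau -- candidates for a CIFAR-10-NEG-style
    split where better overall models do *worse*."""
    corrs = np.array([point_profile(model_accs, score_matrix[:, j])
                      for j in range(score_matrix.shape[1])])
    return np.where(corrs < tau)[0], corrs
```

Collecting such points from a candidate pool (as the paper does from CINIC-10) is then a matter of filtering on the profile correlation.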
Related papers
- An Interpretable Evaluation of Entropy-based Novelty of Generative Models [36.29214321258605] (arXiv 2024-02-27)
We propose a Kernel-based Entropic Novelty (KEN) score to quantify the mode-based novelty of generative models.
We present numerical results on synthetic and real image datasets, indicating the framework's effectiveness in detecting novel modes.
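The KEN construction itself is not detailed in this summary; purely as an illustration of the kernel-entropy family it belongs to, the sketch below scores a sample set by the entropy of its normalized kernel spectrum (the RBF kernel and this exact statistic are assumptions of the sketch, not the paper's definition):

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_spectral_entropy(X, sigma=1.0):
    """Entropy of the eigenvalue spectrum of K/n: higher when samples
    cover more (and more balanced) modes."""
    K = rbf_kernel(X, sigma) / len(X)
    lam = np.clip(np.linalg.eigvalsh(K), 1e-12, None)
    lam /= lam.sum()
    return float(-(lam * np.log(lam)).sum())
```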
- Score Mismatching for Generative Modeling [4.413162309652114] (arXiv 2023-09-20)
We propose a new score-based model with one-step sampling.
We train a standalone generator to compress all the time steps with the gradient backpropagated from the score network.
In order to produce meaningful gradients for the generator, the score network is trained to simultaneously match the real data distribution and mismatch the fake data distribution.
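Only the coupling between generator and score network is described above; a minimal, heavily simplified reading of "gradient backpropagated from the score network" is sketched below (the surrogate-loss trick is an assumption, not the paper's stated objective):

```python
import torch

def generator_step(gen, score_net, z, opt):
    """One illustrative update: nudge one-step generator samples along
    the score field estimated by the (frozen) score network."""
    x = gen(z)
    s = score_net(x).detach()
    # Surrogate whose gradient w.r.t. x is -s, so backpropagation moves
    # samples uphill along the estimated data score.
    loss = -(s * x).flatten(1).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)
```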
- Anchor Points: Benchmarking Models with Much Fewer Examples [88.02417913161356] (arXiv 2023-09-14)
Across six popular language classification benchmarks, model confidence in the correct class is strongly correlated across models for many pairs of points.
We propose Anchor Point Selection, a technique to select small subsets of datasets that capture model behavior across the entire dataset.
Just a few anchor points can be used to estimate model per-class predictions on all other points in a dataset with low mean absolute error.
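A hedged sketch of the selection idea (greedy farthest-point selection on confidence correlations is a stand-in here; the paper's Anchor Point Selection may differ):

```python
import numpy as np

def select_anchors(conf, k):
    """conf: (n_models, n_points) confidence in the correct class.
    Greedily pick points whose confidence patterns across models are
    mutually dissimilar (farthest-point traversal in 1 - correlation)."""
    C = np.corrcoef(conf.T)          # (n_points, n_points)
    dist = 1.0 - C
    anchors = [0]                    # arbitrary seed point
    while len(anchors) < k:
        anchors.append(int(dist[:, anchors].min(axis=1).argmax()))
    return anchors

def estimate_from_anchors(conf, anchors):
    """Predict each point's confidence as that of its most correlated anchor."""
    C = np.corrcoef(conf.T)
    nearest = C[:, anchors].argmax(axis=1)
    return conf[:, np.asarray(anchors)[nearest]]
```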
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744] (arXiv 2023-05-23)
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Further refining the robustness metric, a model is judged to be robust only if its performance is consistently accurate across the entirety of each clique.
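The all-or-nothing reading of that metric fits in a few lines (scoring details assumed):

```python
def clique_robust_accuracy(cliques, correct):
    """cliques: iterable of lists of example ids, each a knowledge-invariant
    clique; correct: mapping id -> bool. A model is credited for a clique
    only if it is accurate on every member."""
    cliques = list(cliques)
    robust = [all(correct[i] for i in c) for c in cliques]
    return sum(robust) / len(cliques)
```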
- K-means Clustering Based Feature Consistency Alignment for Label-free Model Evaluation [12.295565506212844] (arXiv 2023-04-17)
This paper presents our solutions for the 1st DataCV Challenge of the Visual Understanding dataset workshop at CVPR 2023.
Firstly, we propose a novel method called K-means Clustering Based Feature Consistency Alignment (KCFCA), which is tailored to handle the distribution shifts of various datasets.
Secondly, we develop a dynamic regression model to capture the relationship between the shifts in distribution and model accuracy.
Thirdly, we design an algorithm to discover the outlier model factors, eliminate the outlier models, and combine the strengths of multiple autoeval models.
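A sketch combining the first two ingredients (the centroid-shift statistic and the linear form of the regressor are illustrative choices, not the challenge solution itself):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

def centroid_shift(src_feats, tgt_feats, k=10, seed=0):
    """Scalar summary of distribution shift: average distance from each
    target K-means centroid to its nearest source centroid."""
    cs = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(src_feats).cluster_centers_
    ct = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(tgt_feats).cluster_centers_
    d2 = ((ct[:, None, :] - cs[None, :, :]) ** 2).sum(-1)
    return float(np.sqrt(d2.min(axis=1)).mean())

def fit_shift_to_accuracy(shifts, accuracies):
    """Regress known accuracies on shift statistics from meta-training sets."""
    return LinearRegression().fit(np.asarray(shifts)[:, None], accuracies)
```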
- Consistent Diffusion Models: Mitigating Sampling Drift by Learning to be Consistent [97.64313409741614] (arXiv 2023-02-17)
We propose to enforce a consistency property, which states that predictions of the model on its own generated data are consistent across time.
We show that our novel training objective yields state-of-the-art results for conditional and unconditional generation on CIFAR-10, and baseline improvements on AFHQ and FFHQ.
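Schematically, the property can be written as a regularizer (the denoiser interface, the one-step sampler, and the stop-gradient choice are all assumptions of this sketch):

```python
import torch

def consistency_loss(denoise, sampler_step, x_t, t, t_next):
    """Predictions of the clean image should agree before and after the
    model's own sampling step from t down to t_next (t_next < t)."""
    x0_hat = denoise(x_t, t)
    x_next = sampler_step(denoise, x_t, t, t_next)   # model-generated point
    x0_hat_next = denoise(x_next, t_next)
    return ((x0_hat - x0_hat_next.detach()) ** 2).mean()
```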
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306] (arXiv 2022-01-11)
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
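ATC is simple enough to sketch in full (using max-softmax confidence here; the paper also considers scores such as negative entropy):

```python
import numpy as np

def atc_predict(src_conf, src_correct, tgt_conf):
    """Average Thresholded Confidence, sketched.

    Choose the threshold t so that the fraction of labeled *source*
    examples with confidence >= t matches source accuracy, then predict
    target accuracy as the fraction of unlabeled *target* examples whose
    confidence clears the same threshold."""
    src_acc = np.mean(src_correct)
    t = np.quantile(src_conf, 1.0 - src_acc)
    return float(np.mean(tgt_conf >= t))
```

The single threshold is calibrated entirely on labeled source data, so no target labels are needed at prediction time.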
- Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition [98.25592165484737] (arXiv 2021-12-17)
We propose a more effective pseudo-labeling scheme, called Cross-Model Pseudo-Labeling (CMPL).
CMPL achieves 17.6% and 25.1% Top-1 accuracy on Kinetics-400 and UCF-101, respectively, using only the RGB modality and 1% labeled data.
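A minimal sketch of the cross-model exchange (in CMPL the two networks differ in capacity; the confidence threshold and loss form below are assumptions):

```python
import torch
import torch.nn.functional as F

def _one_way(student, teacher, x, thresh):
    # The teacher's confident predictions become the student's targets.
    with torch.no_grad():
        conf, y = F.softmax(teacher(x), dim=1).max(dim=1)
    mask = conf >= thresh
    if not mask.any():
        return x.new_zeros(())
    return F.cross_entropy(student(x[mask]), y[mask])

def cross_model_pseudo_label_loss(model_a, model_b, x_unlabeled, thresh=0.95):
    """Each network is supervised by the other's confident pseudo-labels."""
    return (_one_way(model_a, model_b, x_unlabeled, thresh)
            + _one_way(model_b, model_a, x_unlabeled, thresh))
```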
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402] (arXiv 2020-02-27)
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
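The transductive update reads roughly as below (the meta-learned confidence is abstracted into an assumed callable `conf_fn`; the full method also retains the labeled support means):

```python
import torch

def refine_prototypes(protos, queries, conf_fn, steps=1):
    """Transductive refinement: fold unlabeled queries into class
    prototypes, weighted by a (meta-learned) confidence function.

    protos  : (n_classes, d)   queries : (n_query, d)
    conf_fn : maps negative query-prototype distances to per-class
              weights; in the paper this mapping is meta-learned."""
    for _ in range(steps):
        d = torch.cdist(queries, protos)          # (n_query, n_classes)
        w = conf_fn(-d)                           # soft assignment weights
        # Confidence-weighted mean of the queries per class (a complete
        # method would also fold in the labeled support examples).
        protos = (w.t() @ queries) / w.sum(dim=0, keepdim=True).t().clamp_min(1e-8)
    return protos
```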