Deconstructing Distributions: A Pointwise Framework of Learning
- URL: http://arxiv.org/abs/2202.09931v1
- Date: Sun, 20 Feb 2022 23:25:28 GMT
- Title: Deconstructing Distributions: A Pointwise Framework of Learning
- Authors: Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran
- Abstract summary: We study a point's $\textit{profile}$: the relationship between models' average performance on the test distribution and their pointwise performance on this individual point.
We find that profiles can yield new insights into the structure of both models and data, in- and out-of-distribution.
- Score: 15.517383696434162
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In machine learning, we traditionally evaluate the performance of a single
model, averaged over a collection of test inputs. In this work, we propose a
new approach: we measure the performance of a collection of models when
evaluated on a $\textit{single input point}$. Specifically, we study a point's
$\textit{profile}$: the relationship between models' average performance on the
test distribution and their pointwise performance on this individual point. We
find that profiles can yield new insights into the structure of both models and
data -- in and out-of-distribution. For example, we empirically show that real
data distributions consist of points with qualitatively different profiles. On
one hand, there are "compatible" points with strong correlation between the
pointwise and average performance. On the other hand, there are points with
weak and even $\textit{negative}$ correlation: cases where improving overall
model accuracy actually $\textit{hurts}$ performance on these inputs. We prove
that these experimental observations are inconsistent with the predictions of
several simplified models of learning proposed in prior work. As an
application, we use profiles to construct a dataset we call CIFAR-10-NEG: a
subset of CINIC-10 such that for standard models, accuracy on CIFAR-10-NEG is
$\textit{negatively correlated}$ with accuracy on CIFAR-10 test. This
illustrates, for the first time, an OOD dataset that completely inverts
"accuracy-on-the-line" (Miller, Taori, Raghunathan, Sagawa, Koh, Shankar,
Liang, Carmon, and Schmidt 2021).
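
A point's profile can be approximated in a few lines of NumPy. The sketch below is a minimal illustration of the idea rather than the authors' code; the correctness matrix and accuracy vector are hypothetical placeholders for whatever collection of trained models is at hand.

```python
import numpy as np

# Hypothetical inputs, standing in for a real collection of trained models:
#   correct[m, i] = 1 if model m classifies test point i correctly, else 0
#   avg_acc[m]    = average accuracy of model m on the full test distribution
rng = np.random.default_rng(0)
n_models, n_points = 500, 1000
avg_acc = rng.uniform(0.6, 0.95, size=n_models)
correct = (rng.random((n_models, n_points)) < avg_acc[:, None]).astype(float)

def point_profile(correct, avg_acc, i, n_bins=10):
    """Estimate P[point i is correct | model accuracy falls in bin b] for each accuracy bin."""
    edges = np.linspace(avg_acc.min(), avg_acc.max(), n_bins + 1)
    bin_of = np.digitize(avg_acc, edges[1:-1])            # bin index for every model
    return np.array([correct[bin_of == b, i].mean() if np.any(bin_of == b) else np.nan
                     for b in range(n_bins)])

def pointwise_correlations(correct, avg_acc):
    """Correlation between average accuracy and pointwise correctness, one value per point."""
    corrs = np.full(correct.shape[1], np.nan)
    for i in range(correct.shape[1]):
        if correct[:, i].std() > 0:                       # skip points every model gets right/wrong
            corrs[i] = np.corrcoef(avg_acc, correct[:, i])[0, 1]
    return corrs

corrs = pointwise_correlations(correct, avg_acc)
# Strongly positive correlation marks "compatible" points; negative correlation marks points
# where better overall models do worse (the kind collected into CIFAR-10-NEG).
print("negatively correlated points:", int(np.sum(corrs < 0)))
```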
Related papers
- Generalization is not a universal guarantee: Estimating similarity to training data with an ensemble out-of-distribution metric [0.09363323206192666]
Failure of machine learning models to generalize to new data is a core problem limiting the reliability of AI systems.
We propose a standardized approach for assessing data similarity by constructing a supervised autoencoder for generalizability estimation (SAGE).
We show that out-of-the-box model performance increases after SAGE score filtering, even when applied to data from the model's own training and test datasets.
arXiv Detail & Related papers (2025-02-22T19:21:50Z) - Evaluating Sample Utility for Efficient Data Selection by Mimicking Model Weights [11.237906163959908]
Multimodal models are trained on large-scale web-crawled datasets.
These datasets often contain noise, bias, and irrelevant information.
We propose an efficient, model-based approach using the Mimic Score.
arXiv Detail & Related papers (2025-01-12T04:28:14Z) - An Interpretable Evaluation of Entropy-based Novelty of Generative Models [36.29214321258605]
We propose a Kernel-based Entropic Novelty (KEN) score to quantify the mode-based novelty of generative models.
We present numerical results on synthetic and real image datasets, indicating the framework's effectiveness in detecting novel modes.
arXiv Detail & Related papers (2024-02-27T08:00:52Z) - Score Mismatching for Generative Modeling [4.413162309652114]
We propose a new score-based model with one-step sampling.
We train a standalone generator to compress all the time steps with the gradient backpropagated from the score network.
In order to produce meaningful gradients for the generator, the score network is trained to simultaneously match the real data distribution and mismatch the fake data distribution.
arXiv Detail & Related papers (2023-09-20T03:47:12Z) - Anchor Points: Benchmarking Models with Much Fewer Examples [88.02417913161356]
Across six popular language classification benchmarks, model confidence in the correct class on many pairs of points is strongly correlated across models.
We propose Anchor Point Selection, a technique for selecting small subsets of a dataset that capture model behavior across the entire dataset.
Just a few anchor points can be used to estimate a model's per-class predictions on all other points in a dataset with low mean absolute error.
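
One way to make the anchor-point idea concrete is to cluster test points by their confidence vectors across a handful of reference models and keep one representative per cluster. The following is a hedged sketch of that intuition, not the paper's selection algorithm; `conf` is a hypothetical confidence matrix.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical input: conf[m, i] = model m's confidence in the correct class on point i.
rng = np.random.default_rng(1)
conf = np.clip(rng.normal(0.7, 0.2, size=(20, 5000)), 0, 1)

def select_anchor_points(conf, k=30, seed=0):
    """Cluster points by their cross-model confidence vectors; return one anchor per cluster."""
    X = conf.T                                            # one row per test point
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    anchors = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        centre = km.cluster_centers_[c]
        anchors.append(members[np.argmin(np.linalg.norm(X[members] - centre, axis=1))])
    return np.array(anchors), km.labels_

anchors, labels = select_anchor_points(conf, k=30)
# A new model can then be evaluated on only the anchors, with each non-anchor point's
# confidence approximated by that of its cluster's anchor.
```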
arXiv Detail & Related papers (2023-09-14T17:45:51Z) - Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Refining the robustness metric further, a model is judged robust only if its performance is consistently accurate across each entire clique.
arXiv Detail & Related papers (2023-05-23T12:05:09Z) - Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models [3.9052860539161918]
We propose a simple method for measuring the extent of a model's reliance on any identified spurious feature.
We assess robustness towards a large set of known and newly found prediction biases for various pre-trained models and debiasing methods in Question Answering (QA).
We find that while existing debiasing methods can mitigate reliance on a chosen spurious feature, the OOD performance gains of these methods cannot be explained by mitigated reliance on biased features.
arXiv Detail & Related papers (2023-05-11T14:35:00Z) - K-means Clustering Based Feature Consistency Alignment for Label-free Model Evaluation [12.295565506212844]
This paper presents our solutions for the 1st DataCV Challenge of the Visual Understanding dataset workshop at CVPR 2023.
Firstly, we propose a novel method called K-means Clustering Based Feature Consistency Alignment (KCFCA), which is tailored to handle the distribution shifts of various datasets.
Secondly, we develop a dynamic regression model to capture the relationship between the shifts in distribution and model accuracy.
Thirdly, we design an algorithm to discover the outlier model factors, eliminate the outlier models, and combine the strengths of multiple autoeval models.
arXiv Detail & Related papers (2023-04-17T06:33:30Z) - Consistent Diffusion Models: Mitigating Sampling Drift by Learning to be Consistent [97.64313409741614]
We propose to enforce a consistency property, which states that the model's predictions on its own generated data are consistent across time.
We show that our novel training objective yields state-of-the-art results for conditional and unconditional generation in CIFAR-10 and baseline improvements in AFHQ and FFHQ.
arXiv Detail & Related papers (2023-02-17T18:45:04Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
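
The thresholding step of ATC is compact enough to sketch directly. The snippet below is a paraphrase rather than the released implementation: it picks the confidence threshold on labeled source data so that the fraction of points below it matches the source error rate, then reports the fraction of unlabeled target points above it.

```python
import numpy as np

def atc_predict_accuracy(source_conf, source_correct, target_conf):
    """Average Thresholded Confidence (sketch).
    source_conf:    model confidence (e.g., max softmax score) on labeled source data
    source_correct: 0/1 correctness of the model on that source data
    target_conf:    model confidence on unlabeled target data
    """
    source_err = 1.0 - source_correct.mean()
    # Choose t so that the fraction of source points with confidence < t equals the source error.
    t = np.quantile(source_conf, source_err)
    # Predicted target accuracy = fraction of unlabeled target points above the threshold.
    return (target_conf >= t).mean()

# Toy usage with hypothetical arrays:
rng = np.random.default_rng(2)
source_conf = rng.uniform(0.5, 1.0, 5000)
source_correct = (rng.random(5000) < source_conf).astype(float)
target_conf = rng.uniform(0.4, 1.0, 5000)
print("Predicted target accuracy:", atc_predict_accuracy(source_conf, source_correct, target_conf))
```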
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition [98.25592165484737]
We propose a more effective pseudo-labeling scheme, called Cross-Model Pseudo-Labeling (CMPL)
CMPL achieves 17.6% and 25.1% Top-1 accuracy on Kinetics-400 and UCF-101, respectively, using only the RGB modality and 1% labeled data.
arXiv Detail & Related papers (2021-12-17T18:59:41Z) - Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
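
Given (in-distribution, out-of-distribution) accuracy pairs for a family of models, the "line" can be checked in a few lines; Miller et al. fit it after a probit transform of the accuracies. The accuracy arrays below are hypothetical placeholders.

```python
import numpy as np
from scipy.stats import norm, pearsonr

# Hypothetical (ID, OOD) accuracy pairs for a collection of models.
rng = np.random.default_rng(3)
id_acc = rng.uniform(0.70, 0.95, size=50)
ood_acc = np.clip(0.9 * id_acc - 0.15 + rng.normal(0, 0.01, size=50), 0.01, 0.99)

# Probit-transform the accuracies, then fit and measure the linear trend.
z_id, z_ood = norm.ppf(id_acc), norm.ppf(ood_acc)
slope, intercept = np.polyfit(z_id, z_ood, 1)
r, _ = pearsonr(z_id, z_ood)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
# r close to +1 reproduces "accuracy on the line"; r < 0 would indicate an inverted
# trend of the kind CIFAR-10-NEG exhibits.
```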
arXiv Detail & Related papers (2021-07-09T19:48:23Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks, including CIFAR-10, CIFAR-100, and CINIC-10.
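
A hedged sketch of the calibration step, under the assumption that the "virtual representations" are drawn from per-class Gaussians fit to frozen penultimate-layer features; this is a reading of the summary above, not the released CCVR code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibrate_classifier(features, labels, n_classes, n_virtual=200, ridge=1e-3, seed=0):
    """Fit a per-class Gaussian over frozen penultimate-layer features, sample virtual
    representations from it, and retrain only a linear classifier head on those samples
    (a sketch of the CCVR idea, not the released implementation)."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    virtual_x, virtual_y = [], []
    for c in range(n_classes):
        fc = features[labels == c]                          # features of class c
        mu = fc.mean(axis=0)
        cov = np.cov(fc, rowvar=False) + ridge * np.eye(d)  # regularized for stability
        virtual_x.append(rng.multivariate_normal(mu, cov, size=n_virtual))
        virtual_y.append(np.full(n_virtual, c))
    head = LogisticRegression(max_iter=1000)
    head.fit(np.vstack(virtual_x), np.concatenate(virtual_y))
    return head   # calibrated classifier head; the feature extractor stays frozen
```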
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence of each query sample so as to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
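
The transductive update itself reduces to a confidence-weighted prototype refinement. The sketch below uses a fixed softmax over negative distances as a stand-in for the meta-learned confidence, so it illustrates only the update rule, not the meta-learning.

```python
import numpy as np

def soft_prototype_update(prototypes, queries, temperature=1.0, n_steps=1):
    """Refine class prototypes with confidence-weighted unlabeled query features.
    prototypes: (n_classes, d) initial per-class means from the support set
    queries:    (n_queries, d) unlabeled query embeddings
    The paper meta-learns the confidences; here a softmax over negative distances
    is a placeholder weighting.
    """
    for _ in range(n_steps):
        dists = np.linalg.norm(queries[:, None, :] - prototypes[None, :, :], axis=-1)
        conf = np.exp(-dists / temperature)
        conf /= conf.sum(axis=1, keepdims=True)             # soft assignment of each query
        # Confidence-weighted mean of queries, mixed with the original prototypes
        # (each original prototype counts as one unit-weight sample).
        prototypes = (conf.T @ queries + prototypes) / (conf.sum(axis=0)[:, None] + 1.0)
    return prototypes
```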
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.