Evaluating representations by the complexity of learning low-loss
predictors
- URL: http://arxiv.org/abs/2009.07368v2
- Date: Fri, 5 Feb 2021 16:50:13 GMT
- Title: Evaluating representations by the complexity of learning low-loss
predictors
- Authors: William F. Whitney, Min Jae Song, David Brandfonbrener, Jaan Altosaar,
Kyunghyun Cho
- Abstract summary: We consider the problem of evaluating representations of data for use in solving a downstream task.
We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest.
- Score: 55.94170724668857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of evaluating representations of data for use in
solving a downstream task. We propose to measure the quality of a
representation by the complexity of learning a predictor on top of the
representation that achieves low loss on a task of interest, and introduce two
methods, surplus description length (SDL) and $\varepsilon$ sample complexity
($\varepsilon$SC). In contrast to prior methods, which measure the amount of
information about the optimal predictor that is present in a specific amount of
data, our methods measure the amount of information needed from the data to
recover an approximation of the optimal predictor up to a specified tolerance.
We present a framework to compare these methods based on plotting the
validation loss versus evaluation dataset size (the "loss-data" curve).
Existing measures, such as mutual information and minimum description length
probes, correspond to slices and integrals along the data axis of the loss-data
curve, while ours correspond to slices and integrals along the loss axis. We
provide experiments on real data to compare the behavior of each of these
methods over datasets of varying size along with a high performance open source
library for representation evaluation at
https://github.com/willwhitney/reprieve.
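For concreteness, here is a minimal sketch of how the two loss-axis measures could be read off a sampled loss-data curve. It assumes εSC is the smallest evaluation dataset size whose expected loss falls below the tolerance ε, and uses the area between the curve and the horizontal line at ε as a numerical proxy for SDL; these are the definitions as suggested by the abstract, so consult the paper and the reprieve library for the exact estimators.

```python
import numpy as np

def epsilon_sample_complexity(ns, losses, eps):
    """Smallest dataset size whose estimated expected loss is <= eps
    (np.inf if the sampled curve never reaches eps)."""
    ns, losses = np.asarray(ns), np.asarray(losses)
    hits = np.nonzero(losses <= eps)[0]
    return ns[hits[0]] if hits.size else np.inf

def surplus_description_length(ns, losses, eps):
    """Rough proxy for SDL: the area between the sampled loss-data curve
    and the horizontal line at eps (a loss-axis integral)."""
    ns = np.asarray(ns, dtype=float)
    losses = np.asarray(losses, dtype=float)
    excess = np.clip(losses - eps, 0.0, None)
    # trapezoid rule over the sampled dataset sizes
    return np.sum(0.5 * (excess[1:] + excess[:-1]) * np.diff(ns))

# toy loss-data curve: validation loss versus evaluation dataset size
ns = [10, 100, 1000, 10000]
losses = [2.0, 1.2, 0.7, 0.45]
print(epsilon_sample_complexity(ns, losses, eps=0.5))   # -> 10000
print(surplus_description_length(ns, losses, eps=0.5))  # area above eps = 0.5
```

On the toy curve, εSC at ε = 0.5 is 10000 samples, while the SDL proxy accumulates how far the curve sits above ε over all sampled dataset sizes.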
Related papers
- Scaling Laws for the Value of Individual Data Points in Machine Learning [55.596413470429475]
We introduce a new perspective by investigating scaling behavior for the value of individual data points.
We provide learning theory to support our scaling law, and we observe empirically that it holds across diverse model classes.
Our work represents a first step towards understanding and utilizing scaling properties for the value of individual data points.
arXiv Detail & Related papers (2024-05-30T20:10:24Z)
- On the Convergence of Loss and Uncertainty-based Active Learning Algorithms [3.506897386829711]
We investigate the convergence rates and data sample sizes required for training a machine learning model using a stochastic gradient descent (SGD) algorithm.
We present convergence results for linear classifiers and linearly separable datasets using squared hinge loss and similar training loss functions.
arXiv Detail & Related papers (2023-12-21T15:22:07Z)
- Linear Distance Metric Learning with Noisy Labels [7.326930455001404]
We show that even if the data is noisy, the ground-truth linear metric can be learned to any precision.
We present an effective way to truncate the learned model to a low-rank model that can provably maintain the accuracy in loss function and in parameters.
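One standard way to perform such a truncation, shown purely for illustration (the paper's procedure and its accuracy guarantees may differ), is to project the learned Mahalanobis metric onto its top eigenpairs:

```python
import numpy as np

def truncate_metric(M, rank):
    """Project a symmetric PSD metric M (d x d) onto the nearest
    rank-`rank` PSD matrix by keeping its largest eigenpairs."""
    eigvals, eigvecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    eigvals = np.clip(eigvals, 0.0, None)     # enforce positive semidefiniteness
    top = np.argsort(eigvals)[::-1][:rank]    # indices of the largest eigenvalues
    return (eigvecs[:, top] * eigvals[top]) @ eigvecs[:, top].T

# toy example: a random PSD metric truncated to rank 2
G = np.random.default_rng(0).standard_normal((5, 5))
M = G @ G.T
print(np.linalg.matrix_rank(truncate_metric(M, rank=2)))  # -> 2
```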
arXiv Detail & Related papers (2023-06-05T18:29:00Z)
- Variance reduced Shapley value estimation for trustworthy data valuation [16.03510965397185]
We propose a more robust data valuation method using stratified sampling, named variance reduced data Shapley (VRDS for short).
We theoretically show how to stratify, how many samples to draw at each stratum, and provide a sample complexity analysis of VRDS.
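As a rough illustration of the stratified-sampling idea (not the paper's exact estimator or allocation rule), the sketch below estimates data Shapley values by Monte Carlo, stratifying the sampled coalitions by their size; the `utility` callable, which trains and scores a model on a given index subset, is an assumption of the sketch.

```python
import numpy as np

def stratified_data_shapley(utility, n_points, samples_per_stratum=20, rng=None):
    """Monte Carlo data-Shapley estimate with stratification by coalition size.
    The uniform per-stratum sample count is a placeholder; VRDS chooses these
    counts to reduce variance."""
    rng = np.random.default_rng(rng)
    values = np.zeros(n_points)
    for i in range(n_points):
        others = np.array([j for j in range(n_points) if j != i])
        stratum_means = []
        for k in range(n_points):              # stratum: coalitions with |S| = k
            contribs = []
            for _ in range(samples_per_stratum):
                S = rng.choice(others, size=k, replace=False)
                contribs.append(utility(np.append(S, i)) - utility(S))
            stratum_means.append(np.mean(contribs))
        values[i] = np.mean(stratum_means)     # average over strata = Shapley value
    return values

# toy utility: how well the subset's mean predicts a held-out target value
data = np.array([1.0, 1.1, 0.9, 5.0])   # the fourth point is an outlier
target = 1.0
def utility(S):
    S = np.asarray(S, dtype=int)
    return 0.0 if S.size == 0 else -abs(data[S].mean() - target)

print(stratified_data_shapley(utility, n_points=4, samples_per_stratum=10, rng=0))
```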
arXiv Detail & Related papers (2022-10-30T13:04:52Z)
- Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization [4.554894288663752]
We propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization.
Unlike cross-validation, our approach avoids sacrificing data for a test set.
We prove our estimator performs well in the small-data, large-scale regime.
arXiv Detail & Related papers (2021-07-26T19:00:51Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
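For reference, the exact leverage scores of a tall matrix are the squared row norms of an orthonormal basis for its column space. The sketch below computes them via a thin QR factorization, together with a generic sketch-based approximation; the randomized variant is a standard scheme shown only for flavor, not the rank-revealing construction of the paper.

```python
import numpy as np

def leverage_scores_exact(A):
    """Exact leverage scores of A (n x d, n >= d): squared row norms of Q in A = QR."""
    Q, _ = np.linalg.qr(A, mode='reduced')
    return np.sum(Q ** 2, axis=1)

def leverage_scores_sketched(A, sketch_size=None, rng=None):
    """Generic randomized approximation: sketch A with a Gaussian map,
    take R from the QR of the sketch, and read off row norms of A R^+."""
    rng = np.random.default_rng(rng)
    n, d = A.shape
    k = sketch_size or 4 * d
    S = rng.standard_normal((k, n)) / np.sqrt(k)   # dense sketching matrix
    _, R = np.linalg.qr(S @ A, mode='reduced')     # R approximates A's R factor
    B = A @ np.linalg.pinv(R)                      # approximately orthonormal columns
    return np.sum(B ** 2, axis=1)

A = np.random.default_rng(0).standard_normal((500, 10))
print(np.max(np.abs(leverage_scores_exact(A) - leverage_scores_sketched(A, rng=1))))
```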
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
- Scalable Vector Gaussian Information Bottleneck [19.21005180893519]
We study a variation of the problem, called scalable information bottleneck, in which the encoder outputs multiple descriptions of the observation.
We derive a variational inference type algorithm for general sources with unknown distribution, and show how to parametrize it using neural networks.
arXiv Detail & Related papers (2021-02-15T12:51:26Z)
- Estimating informativeness of samples with Smooth Unique Information [108.25192785062367]
We measure how much a sample informs the final weights and how much it informs the function computed by the weights.
We give efficient approximations of these quantities using a linearized network.
We apply these measures to several problems, such as dataset summarization.
arXiv Detail & Related papers (2021-01-17T10:29:29Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
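A minimal sketch of the general recipe (momentum SGD with per-sample weights inside the mini-batch) is shown below for a linear model with squared loss; the softmax-of-losses weighting is an illustrative assumption, not necessarily ABSGD's exact rule.

```python
import numpy as np

def weighted_momentum_sgd_step(w, v, X_batch, y_batch, lr=0.1, momentum=0.9, temp=1.0):
    """One momentum-SGD step where each sample in the mini-batch gets its own
    importance weight (here: softmax of per-sample losses, so harder samples
    get larger weight)."""
    preds = X_batch @ w
    per_sample_loss = 0.5 * (preds - y_batch) ** 2
    z = per_sample_loss / temp
    weights = np.exp(z - z.max())
    weights /= weights.sum()
    grad = X_batch.T @ (weights * (preds - y_batch))   # weighted gradient
    v = momentum * v + grad
    w = w - lr * v
    return w, v

# usage on a random mini-batch
rng = np.random.default_rng(0)
X, y = rng.standard_normal((32, 5)), rng.standard_normal(32)
w, v = np.zeros(5), np.zeros(5)
w, v = weighted_momentum_sgd_step(w, v, X, y)
```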
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Deep Dimension Reduction for Supervised Representation Learning [51.10448064423656]
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
arXiv Detail & Related papers (2020-06-10T14:47:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.