Evaluating representations by the complexity of learning low-loss
predictors
- URL: http://arxiv.org/abs/2009.07368v2
- Date: Fri, 5 Feb 2021 16:50:13 GMT
- Title: Evaluating representations by the complexity of learning low-loss
predictors
- Authors: William F. Whitney, Min Jae Song, David Brandfonbrener, Jaan Altosaar,
Kyunghyun Cho
- Abstract summary: We consider the problem of evaluating representations of data for use in solving a downstream task.
We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest.
- Score: 55.94170724668857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of evaluating representations of data for use in
solving a downstream task. We propose to measure the quality of a
representation by the complexity of learning a predictor on top of the
representation that achieves low loss on a task of interest, and introduce two
methods, surplus description length (SDL) and $\varepsilon$ sample complexity
($\varepsilon$SC). In contrast to prior methods, which measure the amount of
information about the optimal predictor that is present in a specific amount of
data, our methods measure the amount of information needed from the data to
recover an approximation of the optimal predictor up to a specified tolerance.
We present a framework to compare these methods based on plotting the
validation loss versus evaluation dataset size (the "loss-data" curve).
Existing measures, such as mutual information and minimum description length
probes, correspond to slices and integrals along the data axis of the loss-data
curve, while ours correspond to slices and integrals along the loss axis. We
provide experiments on real data to compare the behavior of each of these
methods over datasets of varying size along with a high performance open source
library for representation evaluation at
https://github.com/willwhitney/reprieve.
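For concreteness, here is a minimal sketch of how the two loss-axis measures could be read off a sampled loss-data curve. It assumes εSC is the smallest evaluation dataset size whose expected loss falls below the tolerance ε, and uses the area between the curve and the horizontal line at ε as a numerical proxy for SDL; these are the definitions as suggested by the abstract, so consult the paper and the reprieve library for the exact estimators.

```python
import numpy as np

def epsilon_sample_complexity(ns, losses, eps):
    """Smallest dataset size whose estimated expected loss is <= eps
    (np.inf if the sampled curve never reaches eps)."""
    ns, losses = np.asarray(ns), np.asarray(losses)
    hits = np.nonzero(losses <= eps)[0]
    return ns[hits[0]] if hits.size else np.inf

def surplus_description_length(ns, losses, eps):
    """Rough proxy for SDL: the area between the sampled loss-data curve
    and the horizontal line at eps (a loss-axis integral)."""
    ns = np.asarray(ns, dtype=float)
    losses = np.asarray(losses, dtype=float)
    excess = np.clip(losses - eps, 0.0, None)
    # trapezoid rule over the sampled dataset sizes
    return np.sum(0.5 * (excess[1:] + excess[:-1]) * np.diff(ns))

# toy loss-data curve: validation loss versus evaluation dataset size
ns = [10, 100, 1000, 10000]
losses = [2.0, 1.2, 0.7, 0.45]
print(epsilon_sample_complexity(ns, losses, eps=0.5))   # -> 10000
print(surplus_description_length(ns, losses, eps=0.5))  # area above eps = 0.5
```

On the toy curve, εSC at ε = 0.5 is 10000 samples, while the SDL proxy accumulates how far the curve sits above ε over all sampled dataset sizes.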
Related papers
- Scaling Laws for the Value of Individual Data Points in Machine Learning [55.596413470429475]
We introduce a new perspective by investigating scaling behavior for the value of individual data points.
We provide learning theory to support our scaling law, and we observe empirically that it holds across diverse model classes.
Our work represents a first step towards understanding and utilizing scaling properties for the value of individual data points.
arXiv Detail & Related papers (2024-05-30T20:10:24Z)
- On the Convergence of Loss and Uncertainty-based Active Learning Algorithms [3.506897386829711]
We investigate the convergence rates and data sample sizes required for training a machine learning model using a stochastic gradient descent (SGD) algorithm.
We present convergence results for linear classifiers and linearly separable datasets using squared hinge loss and similar training loss functions.
arXiv Detail & Related papers (2023-12-21T15:22:07Z)
- Linear Distance Metric Learning with Noisy Labels [7.326930455001404]
We show that even if the data is noisy, the ground-truth linear metric can be learned to any precision.
We present an effective way to truncate the learned model to a low-rank model that can provably maintain the accuracy in loss function and in parameters.
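One standard way to perform such a truncation, shown purely for illustration (the paper's procedure and its accuracy guarantees may differ), is to project the learned Mahalanobis metric onto its top eigenpairs:

```python
import numpy as np

def truncate_metric(M, rank):
    """Project a symmetric PSD metric M (d x d) onto the nearest
    rank-`rank` PSD matrix by keeping its largest eigenpairs."""
    eigvals, eigvecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    eigvals = np.clip(eigvals, 0.0, None)     # enforce positive semidefiniteness
    top = np.argsort(eigvals)[::-1][:rank]    # indices of the largest eigenvalues
    return (eigvecs[:, top] * eigvals[top]) @ eigvecs[:, top].T

# toy example: a random PSD metric truncated to rank 2
G = np.random.default_rng(0).standard_normal((5, 5))
M = G @ G.T
print(np.linalg.matrix_rank(truncate_metric(M, rank=2)))  # -> 2
```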
arXiv Detail & Related papers (2023-06-05T18:29:00Z)
- Variance reduced Shapley value estimation for trustworthy data valuation [16.03510965397185]
We propose a more robust data valuation method using stratified sampling, named variance reduced data Shapley (VRDS for short).
We theoretically show how to stratify, how many samples to draw at each stratum, and provide a sample complexity analysis of VRDS.
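As a rough illustration of the stratified-sampling idea (not the paper's exact estimator or allocation rule), the sketch below estimates data Shapley values by Monte Carlo, stratifying the sampled coalitions by their size; the `utility` callable, which trains and scores a model on a given index subset, is an assumption of the sketch.

```python
import numpy as np

def stratified_data_shapley(utility, n_points, samples_per_stratum=20, rng=None):
    """Monte Carlo data-Shapley estimate with stratification by coalition size.
    The uniform per-stratum sample count is a placeholder; VRDS chooses these
    counts to reduce variance."""
    rng = np.random.default_rng(rng)
    values = np.zeros(n_points)
    for i in range(n_points):
        others = np.array([j for j in range(n_points) if j != i])
        stratum_means = []
        for k in range(n_points):              # stratum: coalitions with |S| = k
            contribs = []
            for _ in range(samples_per_stratum):
                S = rng.choice(others, size=k, replace=False)
                contribs.append(utility(np.append(S, i)) - utility(S))
            stratum_means.append(np.mean(contribs))
        values[i] = np.mean(stratum_means)     # average over strata = Shapley value
    return values

# toy utility: how well the subset's mean predicts a held-out target value
data = np.array([1.0, 1.1, 0.9, 5.0])   # the fourth point is an outlier
target = 1.0
def utility(S):
    S = np.asarray(S, dtype=int)
    return 0.0 if S.size == 0 else -abs(data[S].mean() - target)

print(stratified_data_shapley(utility, n_points=4, samples_per_stratum=10, rng=0))
```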
arXiv Detail & Related papers (2022-10-30T13:04:52Z)
- Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization [4.554894288663752]
We propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization.
Unlike cross-validation, our approach avoids sacrificing data for a test set.
We prove our estimator performs well in the small-data, large-scale regime.
arXiv Detail & Related papers (2021-07-26T19:00:51Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
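For reference, the exact leverage scores of a tall matrix are the squared row norms of an orthonormal basis for its column space. The sketch below computes them via a thin QR factorization, together with a generic sketch-based approximation; the randomized variant is a standard scheme shown only for flavor, not the rank-revealing construction of the paper.

```python
import numpy as np

def leverage_scores_exact(A):
    """Exact leverage scores of A (n x d, n >= d): squared row norms of Q in A = QR."""
    Q, _ = np.linalg.qr(A, mode='reduced')
    return np.sum(Q ** 2, axis=1)

def leverage_scores_sketched(A, sketch_size=None, rng=None):
    """Generic randomized approximation: sketch A with a Gaussian map,
    take R from the QR of the sketch, and read off row norms of A R^+."""
    rng = np.random.default_rng(rng)
    n, d = A.shape
    k = sketch_size or 4 * d
    S = rng.standard_normal((k, n)) / np.sqrt(k)   # dense sketching matrix
    _, R = np.linalg.qr(S @ A, mode='reduced')     # R approximates A's R factor
    B = A @ np.linalg.pinv(R)                      # approximately orthonormal columns
    return np.sum(B ** 2, axis=1)

A = np.random.default_rng(0).standard_normal((500, 10))
print(np.max(np.abs(leverage_scores_exact(A) - leverage_scores_sketched(A, rng=1))))
```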
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
- Scalable Vector Gaussian Information Bottleneck [19.21005180893519]
We study a variation of the problem, called scalable information bottleneck, in which the encoder outputs multiple descriptions of the observation.
We derive a variational inference type algorithm for general sources with unknown distribution, and show how to parametrize it using neural networks.
arXiv Detail & Related papers (2021-02-15T12:51:26Z)
- Estimating informativeness of samples with Smooth Unique Information [108.25192785062367]
We measure how much a sample informs the final weights and how much it informs the function computed by the weights.
We give efficient approximations of these quantities using a linearized network.
We apply these measures to several problems, such as dataset summarization.
arXiv Detail & Related papers (2021-01-17T10:29:29Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
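A minimal sketch of the general recipe (momentum SGD with per-sample weights inside the mini-batch) is shown below for a linear model with squared loss; the softmax-of-losses weighting is an illustrative assumption, not necessarily ABSGD's exact rule.

```python
import numpy as np

def weighted_momentum_sgd_step(w, v, X_batch, y_batch, lr=0.1, momentum=0.9, temp=1.0):
    """One momentum-SGD step where each sample in the mini-batch gets its own
    importance weight (here: softmax of per-sample losses, so harder samples
    get larger weight)."""
    preds = X_batch @ w
    per_sample_loss = 0.5 * (preds - y_batch) ** 2
    z = per_sample_loss / temp
    weights = np.exp(z - z.max())
    weights /= weights.sum()
    grad = X_batch.T @ (weights * (preds - y_batch))   # weighted gradient
    v = momentum * v + grad
    w = w - lr * v
    return w, v

# usage on a random mini-batch
rng = np.random.default_rng(0)
X, y = rng.standard_normal((32, 5)), rng.standard_normal(32)
w, v = np.zeros(5), np.zeros(5)
w, v = weighted_momentum_sgd_step(w, v, X, y)
```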
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Deep Dimension Reduction for Supervised Representation Learning [51.10448064423656]
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
arXiv Detail & Related papers (2020-06-10T14:47:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.