Data pruning and neural scaling laws: fundamental limitations of
score-based algorithms
- URL: http://arxiv.org/abs/2302.06960v3
- Date: Mon, 6 Nov 2023 07:40:20 GMT
- Title: Data pruning and neural scaling laws: fundamental limitations of
score-based algorithms
- Authors: Fadhel Ayed and Soufiane Hayou
- Abstract summary: We show theoretically and empirically why score-based data pruning algorithms fail in the high compression regime.
We present calibration protocols that enhance the performance of existing pruning algorithms in this high compression regime.
- Score: 9.68145635795782
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data pruning algorithms are commonly used to reduce the memory and
computational cost of the optimization process. Recent empirical results reveal
that random data pruning remains a strong baseline and outperforms most
existing data pruning methods in the high compression regime, i.e., where a
fraction of $30\%$ or less of the data is kept. This regime has recently
attracted a lot of interest as a result of the role of data pruning in
improving the so-called neural scaling laws; in [Sorscher et al.], the authors
showed the need for high-quality data pruning algorithms in order to beat the
sample power law.
In this work, we focus on score-based data pruning algorithms and show
theoretically and empirically why such algorithms fail in the high compression
regime. We demonstrate "No Free Lunch" theorems for data pruning and present
calibration protocols that enhance the performance of existing pruning
algorithms in this high compression regime using randomization.
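The abstract contrasts pure score-based pruning with protocols that mix in randomization. A minimal sketch of both ideas, with hypothetical function names and a simple "fill part of the keep-budget with a uniform random sample" calibration (illustrative only, not the authors' exact protocol):

```python
import numpy as np

def score_based_prune(scores, keep_frac):
    """Pure score-based pruning: keep the highest-scoring fraction of examples."""
    n_keep = max(1, int(len(scores) * keep_frac))
    return np.argsort(scores)[::-1][:n_keep]

def calibrated_prune(scores, keep_frac, random_frac=0.5, rng=None):
    """Hypothetical randomized calibration: spend part of the keep-budget on
    top-scoring examples and the rest on a uniform random sample of the data,
    restoring some of the coverage that pure top-k pruning loses at high
    compression."""
    rng = np.random.default_rng(rng)
    n = len(scores)
    n_keep = max(1, int(n * keep_frac))
    n_rand = int(n_keep * random_frac)
    top = np.argsort(scores)[::-1][:n_keep - n_rand]
    rest = np.setdiff1d(np.arange(n), top)
    rand = rng.choice(rest, size=n_rand, replace=False)
    return np.concatenate([top, rand])

# Toy scores for 1000 examples, pruned down to 20% (the high compression regime).
scores = np.random.default_rng(0).random(1000)
kept = calibrated_prune(scores, keep_frac=0.2, random_frac=0.5, rng=0)
print(len(kept))  # 200
```

The `random_frac` knob interpolates between pure score-based pruning (0.0) and the random baseline (1.0).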
Related papers
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z)
- DRoP: Distributionally Robust Pruning [11.930434318557156]
We conduct the first systematic study of the impact of data pruning on classification bias of trained models.
We propose DRoP, a distributionally robust approach to pruning and empirically demonstrate its performance on standard computer vision benchmarks.
arXiv Detail & Related papers (2024-04-08T14:55:35Z)
- Sketch and shift: a robust decoder for compressive clustering [17.627195350266796]
Compressive learning is an emerging approach to drastically reduce the memory footprint of large-scale learning.
We propose an alternative decoder offering substantial improvements over CL-OMPR.
The proposed algorithm can extract clustering information from a sketch of the MNIST dataset that is 10 times smaller than previously.
arXiv Detail & Related papers (2023-12-15T16:53:55Z)
- Randomized Dimension Reduction with Statistical Guarantees [0.27195102129095]
This thesis explores some of such algorithms for fast execution and efficient data utilization.
We focus on learning algorithms with various incorporations of data augmentation that provably improve generalization and distributional robustness.
Specifically, Chapter 4 presents a sample complexity analysis for data augmentation consistency regularization.
arXiv Detail & Related papers (2023-10-03T02:01:39Z)
- Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits [6.907555940790131]
Thompson sampling and Greedy demonstrate promising empirical performance, yet this contrasts with their pessimistic theoretical regret bounds.
We propose a new data-driven technique that tracks the geometric properties of the uncertainty ellipsoid.
We identify and "course-correct" problem instances in which the base algorithms perform poorly.
arXiv Detail & Related papers (2023-06-26T17:38:45Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Data-Efficient Structured Pruning via Submodular Optimization [32.574190896543705]
We propose a data-efficient structured pruning method based on submodular optimization.
We show that this selection problem is weakly submodular, so it can be provably approximated using an efficient greedy algorithm.
Our method is one of the few in the literature that uses only a limited number of training samples and no labels.
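The guarantee mentioned above comes from the standard greedy scheme for monotone (weakly) submodular maximization: repeatedly add the element with the largest marginal gain. A generic sketch with a toy utility function (both the utility and the names are illustrative, not the paper's formulation):

```python
def greedy_select(candidates, utility, budget):
    """Greedy maximization of a set function: at each step, add the candidate
    with the largest marginal gain. For monotone (weakly) submodular utilities
    this enjoys a provable constant-factor approximation guarantee."""
    selected = []
    remaining = list(candidates)
    for _ in range(budget):
        base = utility(selected)
        best, best_gain = None, float("-inf")
        for c in remaining:
            gain = utility(selected + [c]) - base
            if gain > best_gain:
                best, best_gain = c, gain
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy utility: per-element weights plus a coverage bonus (illustrative only).
weights = {0: 3.0, 1: 2.0, 2: 1.0, 3: 0.5}
utility = lambda S: sum(weights[x] for x in S) + len(set(S))
print(greedy_select(weights, utility, budget=2))  # [0, 1]
```

This naive version re-evaluates every candidate per step; practical implementations use lazy evaluation to avoid the quadratic cost.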
arXiv Detail & Related papers (2022-03-09T18:40:29Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z)
- Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms.
We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically.
Our algorithm achieves 1.8x and 2.1x better recall on the worst-performing queries for the MNIST and ImageNet datasets, respectively.
arXiv Detail & Related papers (2021-08-11T20:21:30Z)
- Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
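The core mechanism described above, an L2 penalty whose coefficient grows during training so that low-importance weights are pushed toward zero before pruning, can be sketched as follows; `base` and `growth` are hypothetical hyperparameters, not the paper's values:

```python
import numpy as np

def growing_l2_penalty(params, step, base=1e-4, growth=1e-6):
    """Illustrative sketch of a growing L2 penalty: the regularization
    coefficient rises linearly with the training step, so the weight-decay
    pressure increases over the course of training."""
    lam = base + growth * step
    return lam * sum(float(np.sum(p ** 2)) for p in params)

# Toy parameter tensors standing in for a network's weights.
params = [np.ones((4, 2)), np.ones(2)]
early = growing_l2_penalty(params, step=0)
late = growing_l2_penalty(params, step=10_000)
print(late > early)  # True
```

In training, this penalty term would be added to the task loss at each step; the rising factor is what distinguishes the approach from a fixed weight-decay coefficient.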
arXiv Detail & Related papers (2020-12-16T20:16:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.