Corgi^2: A Hybrid Offline-Online Approach To Storage-Aware Data
Shuffling For SGD
- URL: http://arxiv.org/abs/2309.01640v1
- Date: Mon, 4 Sep 2023 14:49:27 GMT
- Title: Corgi^2: A Hybrid Offline-Online Approach To Storage-Aware Data
Shuffling For SGD
- Authors: Etay Livne, Gal Kaplun, Eran Malach, Shai Shalev-Shwartz
- Abstract summary: We introduce a novel partial data shuffling strategy for Stochastic Gradient Descent (SGD).
It combines an offline iteration of the CorgiPile method with a subsequent online iteration.
Our approach enjoys the best of both worlds: it performs similarly to SGD with random access (even for homogeneous data) without compromising the data access efficiency of CorgiPile.
- Score: 5.691144886263981
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When using Stochastic Gradient Descent (SGD) for training machine learning
models, it is often crucial to provide the model with examples sampled at
random from the dataset. However, for large datasets stored in the cloud,
random access to individual examples is often costly and inefficient. A recent
work \cite{corgi} proposed an online shuffling algorithm called CorgiPile,
which greatly improves the efficiency of data access at the cost of some performance
loss, which is particularly apparent for large datasets stored in homogeneous
shards (e.g., video datasets). In this paper, we introduce a novel two-step
partial data shuffling strategy for SGD which combines an offline iteration of
the CorgiPile method with a subsequent online iteration. Our approach enjoys
the best of both worlds: it performs similarly to SGD with random access (even
for homogeneous data) without compromising the data access efficiency of
CorgiPile. We provide a comprehensive theoretical analysis of the convergence
properties of our method and demonstrate its practical advantages through
experimental results.
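As a concrete illustration of the two-step strategy described above, the sketch below first performs an offline CorgiPile-style pass that regroups and reshuffles storage blocks, and then runs an online pass that streams blocks in random order through a small in-memory shuffling buffer. This is a minimal sketch under stated assumptions: the function names, block sizes, and buffer sizes are illustrative choices, not the authors' reference implementation.

```python
import random

def offline_shuffle(blocks, blocks_per_group, seed=0):
    """Offline step: regroup randomly chosen blocks, shuffle each group's
    examples, and write them back out as new blocks of the original size."""
    rng = random.Random(seed)
    order = list(range(len(blocks)))
    rng.shuffle(order)
    block_size = len(blocks[0])
    new_blocks = []
    for i in range(0, len(order), blocks_per_group):
        group = [ex for j in order[i:i + blocks_per_group] for ex in blocks[j]]
        rng.shuffle(group)
        new_blocks.extend(group[k:k + block_size]
                          for k in range(0, len(group), block_size))
    return new_blocks

def online_corgipile(blocks, buffer_blocks, seed=1):
    """Online step: stream blocks in random order, shuffle a small in-memory
    buffer of `buffer_blocks` blocks at a time, and yield examples for SGD."""
    rng = random.Random(seed)
    order = list(range(len(blocks)))
    rng.shuffle(order)
    for i in range(0, len(order), buffer_blocks):
        buffer = [ex for j in order[i:i + buffer_blocks] for ex in blocks[j]]
        rng.shuffle(buffer)
        yield from buffer

# Usage: 16 homogeneous blocks of 8 examples each.
blocks = [[(b, k) for k in range(8)] for b in range(16)]
reshuffled = offline_shuffle(blocks, blocks_per_group=4)
for example in online_corgipile(reshuffled, buffer_blocks=4):
    pass  # feed `example` into the SGD update here
```

The offline pass touches the data once at the block level before training, so homogeneous shards are already mixed when the online pass fills its buffer; both passes read whole blocks sequentially, which is what keeps cloud storage access efficient.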
Related papers
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Okapi: Generalising Better by Making Statistical Matches Match [7.392460712829188]
Okapi is a simple, efficient, and general method for robust semi-supervised learning based on online statistical matching.
Our method uses a nearest-neighbours-based matching procedure to generate cross-domain views for a consistency loss.
We show that it is in fact possible to leverage additional unlabelled data to improve upon empirical risk minimisation.
arXiv Detail & Related papers (2022-11-07T12:41:17Z)
- Condensing Graphs via One-Step Gradient Matching [50.07587238142548]
We propose a one-step gradient matching scheme, which performs gradient matching for only one single step without training the network weights.
Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs.
In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance.
arXiv Detail & Related papers (2022-06-15T18:20:01Z)
- Stochastic Gradient Descent without Full Data Shuffle [65.97105896033815]
CorgiPile is a hierarchical data shuffling strategy that avoids a full data shuffle while maintaining a convergence rate comparable to that of SGD with a full shuffle.
Our results show that CorgiPile can achieve a convergence rate comparable to that of full-shuffle SGD for both deep learning and generalized linear models.
arXiv Detail & Related papers (2022-06-12T20:04:31Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD in which we assign an individual importance weight to each sample in the mini-batch (see the sketch after this list).
ABSGD is flexible enough to combine with other robust losses without any additional cost.
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay (a popular reinforcement learning technique) that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
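The Attentional-Biased Stochastic Gradient Descent entry above describes assigning an individual importance weight to each sample in the mini-batch inside momentum SGD. Below is a minimal, hypothetical sketch of that idea on a linear least-squares model; the function name `absgd_style_step`, the softmax-over-losses weighting, and all hyperparameters are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def absgd_style_step(theta, velocity, X, y, lr=0.1, momentum=0.9, temperature=1.0):
    """One importance-weighted mini-batch step on a linear least-squares model.

    Each sample receives an individual weight from a softmax over its loss
    (higher loss -> larger weight); the softmax-over-losses choice is an
    illustrative assumption rather than the paper's exact weighting scheme.
    """
    residuals = X @ theta - y                        # per-sample errors
    losses = 0.5 * residuals ** 2                    # per-sample losses
    scaled = (losses - losses.max()) / temperature
    weights = np.exp(scaled) / np.exp(scaled).sum()  # per-sample importance weights
    grad = X.T @ (weights * residuals)               # importance-weighted gradient
    velocity = momentum * velocity - lr * grad       # momentum SGD update
    return theta + velocity, velocity

# Usage on a toy regression mini-batch.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 10)), rng.normal(size=32)
theta, velocity = np.zeros(10), np.zeros(10)
theta, velocity = absgd_style_step(theta, velocity, X, y)
```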
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.