Data Sampling Affects the Complexity of Online SGD over Dependent Data
- URL: http://arxiv.org/abs/2204.00006v1
- Date: Thu, 31 Mar 2022 07:48:30 GMT
- Title: Data Sampling Affects the Complexity of Online SGD over Dependent Data
- Authors: Shaocong Ma, Ziyi Chen, Yi Zhou, Kaiyi Ji, Yingbin Liang
- Abstract summary: We show how different data sampling schemes affect the sample complexity of online gradient descent over highly dependent data.
Even subsampling a subset of data samples can accelerate the convergence of online SGD over highly dependent data.
- Score: 54.92366535993012
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Conventional machine learning applications typically assume that data samples
are independently and identically distributed (i.i.d.). However, practical
scenarios often involve a data-generating process that produces highly
dependent data samples, which are known to heavily bias the stochastic
optimization process and slow down the convergence of learning. In this paper,
we conduct a fundamental study on how different stochastic data sampling
schemes affect the sample complexity of online stochastic gradient descent
(SGD) over highly dependent data. Specifically, with a $\phi$-mixing model of
data dependence, we show that online SGD with proper periodic data-subsampling
achieves an improved sample complexity over the standard online SGD in the full
spectrum of the data dependence level. Interestingly, even subsampling a subset
of data samples can accelerate the convergence of online SGD over highly
dependent data. Moreover, we show that online SGD with mini-batch sampling can
further substantially improve the sample complexity over online SGD with
periodic data-subsampling over highly dependent data. Numerical experiments
validate our theoretical results.
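To make the two sampling schemes in the abstract concrete, below is a minimal Python sketch. The AR(1) stream (a stand-in for $\phi$-mixing dependence), the quadratic loss, and all function names (`dependent_stream`, `online_sgd_subsampled`, `online_sgd_minibatch`) are illustrative assumptions, not the paper's construction; the sketch only shows how periodic subsampling (update on every `gap`-th sample) and mini-batch averaging change which gradients the update uses.

```python
import numpy as np

def dependent_stream(n, d, rho=0.95, rng=None):
    """Toy dependent data stream: an AR(1) process, so consecutive
    samples are highly correlated (a stand-in for phi-mixing data)."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(d)
    for _ in range(n):
        x = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(d)
        yield x

def grad(w, x):
    """Gradient of the illustrative quadratic loss 0.5 * ||w - x||^2."""
    return w - x

def online_sgd_subsampled(stream, d, lr=0.1, gap=1):
    """Online SGD that updates only on every `gap`-th sample
    (periodic data-subsampling); gap=1 recovers vanilla online SGD."""
    w = np.zeros(d)
    for t, x in enumerate(stream):
        if t % gap == 0:  # skip the intermediate, highly dependent samples
            w -= lr * grad(w, x)
    return w

def online_sgd_minibatch(stream, d, lr=0.1, batch=16):
    """Online SGD that averages gradients over consecutive samples
    (mini-batch sampling) before each update."""
    w = np.zeros(d)
    buf = []
    for x in stream:
        buf.append(grad(w, x))
        if len(buf) == batch:
            w -= lr * np.mean(buf, axis=0)
            buf = []
    return w

# Usage: both schemes consume the same number of raw samples.
d, n = 5, 10_000
print(online_sgd_subsampled(dependent_stream(n, d), d, gap=20))
print(online_sgd_minibatch(dependent_stream(n, d), d, batch=20))
```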
Related papers
- RPS: A Generic Reservoir Patterns Sampler [1.09784964592609]
We introduce an approach that harnesses a weighted reservoir to facilitate direct pattern sampling from streaming batch data.
We present a generic algorithm capable of addressing temporal biases and handling various pattern types, including sequential, weighted, and unweighted itemsets.
arXiv Detail & Related papers (2024-10-31T16:25:21Z)
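The RPS entry above is only a summary, so its exact algorithm is not reproduced here. For orientation, below is a minimal sketch of the classic weighted reservoir sampler (Efraimidis-Spirakis A-Res) that a weighted-reservoir approach builds on; the function name and toy stream are assumptions, and RPS's handling of temporal bias and of different pattern types is not covered.

```python
import heapq
import random

def weighted_reservoir_sample(stream, k, rng=None):
    """Efraimidis-Spirakis A-Res: keep the k items with the largest key
    u ** (1 / w), u ~ Uniform(0, 1), so each item is retained with
    probability proportional to its weight in a single pass."""
    rng = rng or random.Random(0)
    heap = []  # min-heap of (key, item); the root is the weakest kept item
    for item, weight in stream:
        if weight <= 0:
            continue
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))
    return [item for _, item in heap]

# Usage: sample 3 "patterns" from a weighted stream of (pattern, weight) pairs.
stream = [("A", 1.0), ("B", 5.0), ("C", 0.5), ("D", 2.0), ("E", 4.0)]
print(weighted_reservoir_sample(stream, k=3))
```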
- Stochastic Gradient Descent with Adaptive Data [4.119418481809095]
Stochastic gradient descent (SGD) is a powerful optimization technique that is particularly useful in online learning scenarios.
Applying SGD to policy optimization problems in operations research involves a distinct challenge: the policy changes the environment and thereby affects the data used to update the policy.
The influence of previous decisions on the data generated introduces bias in the gradient estimate, which presents a potential source of instability for online learning not present in the iid case.
We show that the convergence speed of SGD with adaptive data is largely similar to the classical iid setting, as long as the mixing time of the policy-induced dynamics is factored in.
arXiv Detail & Related papers (2024-10-02T02:58:32Z)
- Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation [57.6797306341115]
We take an initial step towards understanding various matching-based DD methods from the perspective of sample difficulty.
We then extend the neural scaling laws of data pruning to DD to theoretically explain these matching-based methods.
We introduce the Sample Difficulty Correction (SDC) approach, designed to predominantly generate easier samples to achieve higher dataset quality.
arXiv Detail & Related papers (2024-08-22T15:20:32Z)
- On Pretraining Data Diversity for Self-Supervised Learning [57.91495006862553]
We explore the impact of training with more diverse datasets on the performance of self-supervised learning (SSL) under a fixed computational budget.
Our findings consistently demonstrate that increasing pretraining data diversity enhances SSL performance, albeit only when the distribution distance to the downstream data is minimal.
arXiv Detail & Related papers (2024-03-20T17:59:58Z)
- Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
arXiv Detail & Related papers (2022-07-16T03:09:30Z)
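As a rough companion to the entry above, here is a generic sketch of SGD with importance sampling for a robust (Huber) regression loss: rows are sampled with probability proportional to their squared norm and gradients are reweighted to stay unbiased. The sampling weights, loss, and function names are assumptions, and the paper's sublinear-space, single-pass sketching data structures are not reproduced.

```python
import numpy as np

def huber_grad(w, x, y, delta=1.0):
    """Gradient of the Huber loss on one row (a common robust regression loss)."""
    r = x @ w - y
    g = r if abs(r) <= delta else delta * np.sign(r)
    return g * x

def sgd_importance_sampling(X, y, lr=0.01, T=None, rng=None):
    """Run T SGD steps, drawing row i with probability p_i proportional to
    ||x_i||^2 and scaling its gradient by 1 / (n * p_i) so the update is an
    unbiased estimate of the full-data gradient."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    p = np.linalg.norm(X, axis=1) ** 2
    p = p / p.sum()                      # sampling distribution over rows
    T = T or n
    w = np.zeros(d)
    for _ in range(T):
        i = rng.choice(n, p=p)
        w -= lr * huber_grad(w, X[i], y[i]) / (n * p[i])
    return w

# Usage on toy data with a few gross outliers that the robust loss should resist.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(500)
y[:5] += 50.0
print(sgd_importance_sampling(X, y))
```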
- Diverse Sample Generation: Pushing the Limit of Data-free Quantization [85.95032037447454]
This paper presents a generic Diverse Sample Generation scheme for the generative data-free post-training quantization and quantization-aware training.
For large-scale image classification tasks, our DSG can consistently outperform existing data-free quantization methods.
arXiv Detail & Related papers (2021-09-01T07:06:44Z)
- Towards Synthetic Multivariate Time Series Generation for Flare Forecasting [5.098461305284216]
One of the limiting factors in training data-driven, rare-event prediction algorithms is the scarcity of the events of interest.
In this study, we explore the usefulness of the conditional generative adversarial network (CGAN) as a means to perform data-informed oversampling.
arXiv Detail & Related papers (2021-05-16T22:23:23Z)
- Learning summary features of time series for likelihood free inference [93.08098361687722]
We present a data-driven strategy for automatically learning summary features from time series data.
Our results indicate that learning summary features from data can compete with and even outperform LFI methods based on hand-crafted values.
arXiv Detail & Related papers (2020-12-04T19:21:37Z)
- Data-Free Network Quantization With Adversarial Knowledge Distillation [39.92282726292386]
In this paper, we consider data-free network quantization with synthetic data.
The synthetic data are generated from a generator, while no data are used in training the generator and in quantization.
We show the gain of producing diverse adversarial samples by using multiple generators and multiple students.
arXiv Detail & Related papers (2020-05-08T16:24:55Z)
- Progressive Growing of Neural ODEs [7.558546277131641]
We propose a progressive learning paradigm of NODEs for long-term time series forecasting.
Specifically, following the principle of curriculum learning, we gradually increase the complexity of data and network capacity as training progresses.
Our experiments with both synthetic data and real traffic data (PeMS Bay Area traffic data) show that our training methodology consistently improves the performance of vanilla NODEs by over 64%.
arXiv Detail & Related papers (2020-03-08T01:15:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.