Variance Reduction in Training Forecasting Models with Subgroup Sampling
- URL: http://arxiv.org/abs/2103.02062v1
- Date: Tue, 2 Mar 2021 22:23:27 GMT
- Title: Variance Reduction in Training Forecasting Models with Subgroup Sampling
- Authors: Yucheng Lu, Youngsuk Park, Lifan Chen, Yuyang Wang, Christopher De Sa,
Dean Foster
- Abstract summary: We show that training a forecasting model with commonly used
stochastic optimizers (e.g. SGD) potentially suffers large gradient variance and thus
requires a long training time.
To alleviate this issue, we propose a sampling strategy named Subgroup Sampling.
We show that the resulting optimizer, SCott, converges faster with respect to both
iterations and wall clock time.
- Score: 34.941630385114216
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In real-world applications of large-scale time series, one often encounters
the situation where the temporal patterns of time series, while drifting over
time, differ from one another in the same dataset. In this paper, we provably show
that, under such heterogeneity, training a forecasting model with commonly used
stochastic optimizers (e.g. SGD) potentially suffers from large gradient variance
and thus requires a long training time. To alleviate this issue, we propose a
sampling strategy named Subgroup Sampling, which mitigates the large variance
via sampling over pre-grouped time series. We further introduce SCott, a
variance reduced SGD-style optimizer that co-designs subgroup sampling with the
control variate method. In theory, we provide a convergence guarantee for SCott on
smooth non-convex objectives. Empirically, we evaluate SCott and other
baseline optimizers on both synthetic and real-world time series forecasting
problems, and show SCott converges faster with respect to both iterations and
wall clock time. Additionally, we present two SCott variants that can speed up
Adam and Adagrad without compromising the generalization of forecasting models.
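As a rough illustration of how subgroup sampling and a control variate might combine, here is a minimal SVRG-style sketch written against a generic `grad(theta, batch)` gradient oracle. The grouping, batch size, loop structure, and update rule are assumptions for illustration, not the exact SCott algorithm from the paper.

```python
import numpy as np

def subgroup_svrg(theta, groups, grad, lr=0.01, outer_steps=20, inner_steps=10,
                  batch_size=8, rng=None):
    """Variance-reduced SGD sketch with subgroup sampling (illustrative, not exact SCott).

    `groups`: list of pre-grouped time series (e.g. clustered by temporal pattern).
    `grad(theta, batch)`: gradient of the forecasting loss on a list of series.
    """
    rng = rng or np.random.default_rng(0)
    for _ in range(outer_steps):
        g = groups[rng.integers(len(groups))]      # sample one pre-built subgroup
        snapshot = theta.copy()                    # anchor point for the control variate
        anchor = grad(snapshot, g)                 # full gradient over the sampled subgroup
        for _ in range(inner_steps):
            idx = rng.choice(len(g), size=min(batch_size, len(g)), replace=False)
            batch = [g[i] for i in idx]
            # minibatch gradient recentred by the same minibatch at the anchor point;
            # within a subgroup the series are similar, so this correction has low variance
            v = grad(theta, batch) - grad(snapshot, batch) + anchor
            theta = theta - lr * v
    return theta
```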
Related papers
- Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
- VE: Modeling Multivariate Time Series Correlation with Variate Embedding [0.4893345190925178]
Current channel-independent (CI) models and models with a CI final projection layer are unable to capture correlations between variates.
We present the variate embedding (VE) pipeline, which learns a unique and consistent embedding for each variate.
The VE pipeline can be integrated into any model with a CI final projection layer to improve multivariate forecasting.
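As a sketch of how a per-variate embedding could be fused with a channel-independent projection head, here is a small PyTorch module; the embedding size, concatenation-based fusion, and layer names are assumptions rather than the paper's exact VE pipeline.

```python
import torch
import torch.nn as nn

class VariateEmbeddingHead(nn.Module):
    """Learned per-variate embedding concatenated before a shared (CI) projection."""

    def __init__(self, n_variates: int, d_model: int, horizon: int, d_embed: int = 16):
        super().__init__()
        self.embed = nn.Embedding(n_variates, d_embed)      # one vector per variate
        self.proj = nn.Linear(d_model + d_embed, horizon)   # shared across variates

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, n_variates, d_model) hidden states from any backbone
        b, v, _ = h.shape
        e = self.embed(torch.arange(v, device=h.device)).expand(b, -1, -1)
        return self.proj(torch.cat([h, e], dim=-1))          # (batch, n_variates, horizon)
```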
arXiv Detail & Related papers (2024-09-10T02:49:30Z)
- Addressing Distribution Shift in Time Series Forecasting with Instance Normalization Flows [36.956983415564274]
We propose a general decoupled formulation for time series forecasting.
We formalize this decoupled formulation as a bi-level optimization problem.
Our method consistently outperforms state-of-the-art baselines on both synthetic and real-world data.
arXiv Detail & Related papers (2024-01-30T06:35:52Z)
- Compatible Transformer for Irregularly Sampled Multivariate Time Series [75.79309862085303]
We propose a transformer-based encoder to achieve comprehensive temporal-interaction feature learning for each individual sample.
We conduct extensive experiments on 3 real-world datasets and validate that the proposed CoFormer significantly and consistently outperforms existing methods.
arXiv Detail & Related papers (2023-10-17T06:29:09Z)
- Improving Forecasts for Heterogeneous Time Series by "Averaging", with Application to Food Demand Forecast [0.609170287691728]
This paper proposes a general framework that uses a Dynamic Time Warping similarity measure to find similar time series and build neighborhoods in a k-Nearest Neighbor fashion.
Several ways of performing the averaging are suggested, and theoretical arguments underline the usefulness of averaging for forecasting.
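A toy sketch of the neighborhood-and-average idea: a plain DTW distance, the k nearest series from a pool, and an averaged forecast. The base forecasting function, the value of k, and the choice to average forecasts (rather than the series themselves) are assumptions, not the paper's exact procedure.

```python
import numpy as np

def dtw(a, b):
    """Classic O(len(a) * len(b)) dynamic-time-warping distance between 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def averaged_forecast(target, pool, forecast, k=5):
    """Average the forecasts of the target and its k DTW-nearest neighbours in `pool`."""
    dists = [dtw(target, s) for s in pool]
    neighbours = [pool[i] for i in np.argsort(dists)[:k]]
    return np.mean([forecast(s) for s in neighbours + [target]], axis=0)
```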
arXiv Detail & Related papers (2023-06-12T13:52:30Z)
- Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement [51.55157852647306]
Time series forecasting has been a widely explored task of great importance in many applications.
Real-world time series are often recorded over a short time period, which leaves a big gap between deep models and the limited, noisy data available.
We address the time series forecasting problem with generative modeling and propose a bidirectional variational auto-encoder equipped with diffusion, denoise, and disentanglement.
arXiv Detail & Related papers (2023-01-08T12:20:46Z)
- Imputing Missing Observations with Time Sliced Synthetic Minority Oversampling Technique [0.3973560285628012]
We present a simple yet novel time series imputation technique with the goal of constructing an irregular time series that is uniform across every sample in a data set.
We fix a grid defined by the midpoints of non-overlapping bins (dubbed "slices") of observation times and ensure that each sample has values for all of the features at those grid points.
This makes it possible both to impute fully missing observations, enabling uniform time series classification across the entire dataset, and, in special cases, to impute individually missing features.
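A minimal sketch of the slicing step only, assuming evenly spaced bins over a known time range: irregular observation times are binned into non-overlapping slices, and the grid of slice midpoints becomes the shared time index for every sample. The SMOTE-style oversampling that fills the remaining gaps is not reproduced here.

```python
import numpy as np

def time_slice(times, values, t_min, t_max, n_slices):
    """Map one irregularly sampled series onto the fixed grid of slice midpoints.

    Each slice takes the mean of the observations that fall into it, or NaN if it is
    empty (those gaps are what the paper's oversampling step would later fill in).
    """
    times, values = np.asarray(times), np.asarray(values)
    edges = np.linspace(t_min, t_max, n_slices + 1)
    mids = (edges[:-1] + edges[1:]) / 2.0                  # the common grid for all samples
    idx = np.clip(np.searchsorted(edges, times, side="right") - 1, 0, n_slices - 1)
    out = np.full(n_slices, np.nan)
    for k in range(n_slices):
        hit = values[idx == k]
        if hit.size:
            out[k] = hit.mean()
    return mids, out
```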
arXiv Detail & Related papers (2022-01-14T19:23:24Z)
- Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
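In that spirit, a bare-bones sketch: cluster the series, fit one linear autoregressive model per cluster on pooled lag windows, and forecast each series one step ahead with its cluster's model. The clustering features, number of clusters, and lag order are assumptions, and the paper's actual three stages may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

def _lag_windows(s, lags):
    # (lags -> next value) training pairs from one series; assumes len(s) > lags
    X = np.stack([s[i:i + lags] for i in range(len(s) - lags)])
    return X, s[lags:]

def cluster_and_forecast(series, n_clusters=4, lags=8):
    """`series`: list of 1-D numpy arrays, each longer than `lags`. Returns one-step forecasts."""
    feats = np.stack([s[-lags:] for s in series])            # crude clustering features
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
    preds = np.empty(len(series))
    for c in range(n_clusters):
        members = [i for i, l in enumerate(labels) if l == c]
        if not members:
            continue
        pairs = [_lag_windows(series[i], lags) for i in members]
        X = np.concatenate([p[0] for p in pairs])
        y = np.concatenate([p[1] for p in pairs])
        model = LinearRegression().fit(X, y)                  # one AR model per cluster
        for i in members:
            preds[i] = model.predict(series[i][-lags:].reshape(1, -1))[0]
    return preds
```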
arXiv Detail & Related papers (2021-10-26T20:41:19Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
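For contrast with plain with-replacement SGD, here is a bare-bones Random Reshuffling loop (one fresh permutation per epoch); the paper's synchronized-shuffling modification and local/minibatch variants are not reproduced here.

```python
import numpy as np

def sgd_random_reshuffling(theta, data, grad, lr=0.01, epochs=5, rng=None):
    """Shuffle-once-per-epoch SGD: every example is visited exactly once per epoch
    (sampling without replacement), unlike SGD that draws indices with replacement."""
    rng = rng or np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(len(data)):        # fresh permutation each epoch
            theta = theta - lr * grad(theta, data[i])
    return theta
```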
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- GANs with Variational Entropy Regularizers: Applications in Mitigating the Mode-Collapse Issue [95.23775347605923]
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learn a probability distribution from observed samples.
GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution.
We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
arXiv Detail & Related papers (2020-09-24T19:34:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.