Theoretical Guarantees of Learning Ensembling Strategies with
Applications to Time Series Forecasting
- URL: http://arxiv.org/abs/2305.15786v3
- Date: Mon, 28 Aug 2023 20:29:41 GMT
- Title: Theoretical Guarantees of Learning Ensembling Strategies with
Applications to Time Series Forecasting
- Authors: Hilaf Hasson, Danielle C. Maddix, Yuyang Wang, Gaurav Gupta, Youngsuk
Park
- Abstract summary: We show that choosing the best stacked generalization from a (finite or finite-dimensional) family of stacked generalizations based on cross-validated performance does not perform "much worse" than the oracle best.
Inspired by the theoretical analysis, we propose a particular family of stacked generalizations in the context of probabilistic forecasting.
- Score: 14.037314994161378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensembling is among the most popular tools in machine learning (ML) due to
its effectiveness in minimizing variance and thus improving generalization.
Most ensembling methods for black-box base learners fall under the umbrella of
"stacked generalization," namely training an ML algorithm that takes the
inferences from the base learners as input. While stacking has been widely
applied in practice, its theoretical properties are poorly understood. In this
paper, we prove a novel result, showing that choosing the best stacked
generalization from a (finite or finite-dimensional) family of stacked
generalizations based on cross-validated performance does not perform "much
worse" than the oracle best. Our result strengthens and significantly extends
the results in Van der Laan et al. (2007). Inspired by the theoretical
analysis, we further propose a particular family of stacked generalizations in
the context of probabilistic forecasting, each one with a different sensitivity
for how much the ensemble weights are allowed to vary across items, timestamps
in the forecast horizon, and quantiles. Experimental results demonstrate the
performance gain of the proposed method.
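The recipe in the abstract lends itself to a compact illustration: fit black-box base learners, treat their out-of-sample quantile forecasts as inputs to a stacked generalization, and choose among a small family of stacked generalizations by cross-validated performance. The sketch below is a minimal, assumed instantiation rather than the authors' implementation: the two candidate families (one weight vector shared across quantile levels versus one weight vector per quantile level), the synthetic forecasts, the pinball-loss criterion, and the SLSQP solver are all illustrative choices, and the paper's fuller family, which also lets weights vary across items and timestamps in the forecast horizon, is omitted for brevity.

```python
# Minimal sketch (assumptions noted above): select among a family of
# stacked generalizations by cross-validated pinball loss.
import numpy as np
from scipy.optimize import minimize
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
TAUS = (0.1, 0.5, 0.9)  # quantile levels of the probabilistic forecast

def pinball(y, pred, tau):
    """Quantile (pinball) loss at level tau."""
    d = y - pred
    return np.mean(np.maximum(tau * d, (tau - 1) * d))

def fit_simplex(objective, n_models):
    """Minimize `objective` over convex combinations of base forecasts.
    SLSQP on a nonsmooth loss is crude but adequate for a sketch."""
    res = minimize(objective, np.full(n_models, 1.0 / n_models),
                   bounds=[(0.0, 1.0)] * n_models,
                   constraints=({"type": "eq", "fun": lambda w: w.sum() - 1.0},),
                   method="SLSQP")
    return res.x

def shared_weights(P, y, idx):
    """Family 1: a single weight vector reused at every quantile level."""
    m = P[TAUS[0]].shape[1]
    w = fit_simplex(lambda w: np.mean([pinball(y[idx], P[t][idx] @ w, t)
                                       for t in TAUS]), m)
    return {t: w for t in TAUS}

def per_quantile_weights(P, y, idx):
    """Family 2: an independent weight vector for each quantile level."""
    m = P[TAUS[0]].shape[1]
    return {t: fit_simplex(lambda w, t=t: pinball(y[idx], P[t][idx] @ w, t), m)
            for t in TAUS}

def cv_loss(fit_family, P, y, k=5):
    """Average held-out pinball loss of a stacking family over K folds."""
    losses = []
    for tr, va in KFold(n_splits=k, shuffle=True, random_state=0).split(y):
        W = fit_family(P, y, tr)
        losses.append(np.mean([pinball(y[va], P[t][va] @ W[t], t) for t in TAUS]))
    return float(np.mean(losses))

# Synthetic stand-ins for the base learners' quantile forecasts: three
# "models" with different noise levels and biases around the target.
n = 400
y = rng.normal(size=n)
P = {t: np.column_stack([y + rng.normal(0.0, s, n) + b
                         for s, b in ((0.4, 0.0), (0.9, 0.3), (0.2, -0.2))])
     for t in TAUS}

scores = {name: cv_loss(fit, P, y)
          for name, fit in (("shared", shared_weights),
                            ("per-quantile", per_quantile_weights))}
print(scores, "-> selected:", min(scores, key=scores.get))
```

Restricting each candidate to simplex weights keeps every member of the family a convex combination of the base forecasts, one natural way to obtain the finite-dimensional families to which the paper's cross-validation guarantee applies.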
Related papers
- Achieving More with Less: A Tensor-Optimization-Powered Ensemble Method [53.170053108447455]
Ensemble learning is a method that leverages weak learners to produce a strong learner.
We design a smooth and convex objective function that leverages the concept of margin, making the strong learner more discriminative.
We then compare our algorithm with random forests of ten times the size and other classical methods across numerous datasets.
arXiv Detail & Related papers (2024-08-06T03:42:38Z)
- Class-wise Generalization Error: an Information-Theoretic Analysis [22.877440350595222]
We study the class-generalization error, which quantifies the generalization performance of each individual class.
We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior.
arXiv Detail & Related papers (2024-01-05T17:05:14Z)
- A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment.
arXiv Detail & Related papers (2023-10-07T09:15:08Z)
- Fine-grained analysis of non-parametric estimation for pairwise learning [9.676007573960383]
We are concerned with the generalization performance of non-parametric estimation for pairwise learning.
Our results can be used to handle a wide range of pairwise learning problems including ranking, AUC, pairwise regression and metric and similarity learning.
arXiv Detail & Related papers (2023-05-31T08:13:14Z)
- Generalization Analysis for Contrastive Representation Learning [80.89690821916653]
Existing generalization error bounds depend linearly on the number $k$ of negative examples.
We establish novel generalization bounds for contrastive learning which do not depend on $k$, up to logarithmic terms.
arXiv Detail & Related papers (2023-02-24T01:03:56Z)
- Joint Training of Deep Ensembles Fails Due to Learner Collusion [61.557412796012535]
Ensembles of machine learning models have been well established as a powerful method of improving performance over a single model.
Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance.
We show that directly minimizing the joint loss of the ensemble, which appears to be rarely applied in practice, fails due to collusion among the base learners.
arXiv Detail & Related papers (2023-01-26T18:58:07Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound [15.557653926558638]
We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties.
We instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk (a minimal sketch of this construction follows the list below).
The resulting majority vote learning algorithm achieves state-of-the-art accuracy and benefits from (non-vacuous) tight bounds.
arXiv Detail & Related papers (2021-06-23T16:57:23Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Beyond variance reduction: Understanding the true impact of baselines on policy optimization [24.09670734037029]
We show that learning dynamics are governed by the curvature of the loss function and the noise of the gradient estimates.
We present theoretical results showing that, at least for bandit problems, variance reduction alone is not sufficient to explain the impact of baselines on these learning dynamics.
arXiv Detail & Related papers (2020-08-31T17:52:09Z)
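As the sketch promised in the stochastic-majority-vote entry above: for binary labels, if the vote weights are drawn from a Dirichlet distribution, then by the aggregation property of the Dirichlet the total weight of the classifiers that err on a given example is Beta-distributed, so the expected 0-1 risk has a closed form in the regularized incomplete beta function. The function below is a minimal sketch under those assumptions, not that paper's exact algorithm; the demo data and concentration parameters are invented for illustration.

```python
# Expected risk of a majority vote with Dirichlet-distributed weights
# (binary case). Illustrative sketch only; see the hedges above.
import numpy as np
from scipy.special import betainc  # regularized incomplete beta I_x(a, b)

def dirichlet_majority_vote_risk(votes, y, alpha):
    """votes: (n, m) predicted labels in {0, 1}; y: (n,) true labels;
    alpha: (m,) Dirichlet concentration parameters for the vote weights.

    For one example, the aggregate weight of the erring classifiers is
    Beta(a_wrong, a_right)-distributed, so
      P(vote errs) = P(weight of erring classifiers > 1/2)
                   = 1 - I_{1/2}(a_wrong, a_right)."""
    wrong = votes != y[:, None]                                  # which classifiers err
    a_wrong = np.maximum((wrong * alpha).sum(axis=1), 1e-12)     # betainc needs a > 0
    a_right = np.maximum(((~wrong) * alpha).sum(axis=1), 1e-12)
    return float(np.mean(1.0 - betainc(a_wrong, a_right, 0.5)))

# Demo: three weak voters that agree with the truth at different rates.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
votes = np.column_stack([np.where(rng.random(200) < p, y, 1 - y)
                         for p in (0.85, 0.70, 0.60)])
print(dirichlet_majority_vote_risk(votes, y, np.array([2.0, 1.0, 1.0])))
```

Because `betainc` is differentiable in its parameters, this closed form is compatible with the gradient-based minimization of a PAC-Bayes bound that the summary describes.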
This list is automatically generated from the titles and abstracts of the papers on this site.