A General Weighting Theory for Ensemble Learning: Beyond Variance Reduction via Spectral and Geometric Structure
- URL: http://arxiv.org/abs/2512.22286v1
- Date: Thu, 25 Dec 2025 08:51:01 GMT
- Title: A General Weighting Theory for Ensemble Learning: Beyond Variance Reduction via Spectral and Geometric Structure
- Authors: Ernest Fokoué
- Abstract summary: This paper develops a general weighting theory for ensemble learning. We formalize ensembles as linear operators acting on a hypothesis space. We show how non-uniform, structured weights can outperform uniform averaging.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensemble learning is traditionally justified as a variance-reduction strategy, explaining its strong performance for unstable predictors such as decision trees. This explanation, however, does not account for ensembles constructed from intrinsically stable estimators, including smoothing splines, kernel ridge regression, Gaussian process regression, and other regularized reproducing kernel Hilbert space (RKHS) methods whose variance is already tightly controlled by regularization and spectral shrinkage. This paper develops a general weighting theory for ensemble learning that moves beyond classical variance-reduction arguments. We formalize ensembles as linear operators acting on a hypothesis space and endow the space of weighting sequences with geometric and spectral constraints. Within this framework, we derive a refined bias-variance-approximation decomposition showing how non-uniform, structured weights can outperform uniform averaging by reshaping approximation geometry and redistributing spectral complexity, even when variance reduction is negligible. Our main results provide conditions under which structured weighting provably dominates uniform ensembles, and show that optimal weights arise as solutions to constrained quadratic programs. Classical averaging, stacking, and recently proposed Fibonacci-based ensembles appear as special cases of this unified theory, which further accommodates geometric, sub-exponential, and heavy-tailed weighting laws. Overall, the work establishes a principled foundation for structure-driven ensemble learning, explaining why ensembles remain effective for smooth, low-variance base learners and setting the stage for distribution-adaptive and dynamically evolving weighting schemes developed in subsequent work.
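The abstract states that optimal ensemble weights arise as solutions to constrained quadratic programs. The following is a minimal sketch of that idea on synthetic data: weights on the probability simplex are chosen to minimize the squared error of the weighted combination. The toy data, and names such as `preds` and `w_opt`, are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy setup: M base learners' predictions on n points, plus targets y.
# Each base learner is the target plus a small systematic bias and noise.
n, M = 200, 5
y = np.sin(np.linspace(0.0, 3.0, n))
preds = np.stack(
    [y + 0.1 * m + 0.3 * rng.standard_normal(n) for m in range(M)], axis=1
)

def mse(w):
    """Mean squared error of the w-weighted ensemble prediction."""
    r = preds @ w - y
    return r @ r / n

# Optimal weights solve a constrained quadratic program:
#   minimize ||preds @ w - y||^2  subject to  w >= 0, sum(w) = 1.
cons = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
bounds = [(0.0, 1.0)] * M
w0 = np.full(M, 1.0 / M)  # uniform averaging as the feasible starting point
res = minimize(mse, w0, bounds=bounds, constraints=cons)
w_opt = res.x

print("uniform MSE:", mse(w0))
print("optimal MSE:", mse(w_opt))
```

Because uniform averaging is itself a feasible point of the program, the structured solution can never do worse in-sample, which mirrors the paper's claim that uniform weights are a special case of the general theory.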
Related papers
- Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound expressed through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - Rethinking Diffusion Models with Symmetries through Canonicalization with Applications to Molecular Graph Generation [56.361076943802594]
CanonFlow achieves state-of-the-art performance on the challenging GEOM-DRUG dataset, and the advantage remains large in few-step generation.
arXiv Detail & Related papers (2026-02-16T18:58:55Z) - Towards A Unified PAC-Bayesian Framework for Norm-based Generalization Bounds [63.47271262149291]
We propose a unified framework for PAC-Bayesian norm-based generalization. The key to our approach is a sensitivity matrix that quantifies the sensitivity of the network outputs to structured weight perturbations. We derive a family of generalization bounds that recover several existing PAC-Bayesian results as special cases.
arXiv Detail & Related papers (2026-01-13T00:42:22Z) - An Equivariance Toolbox for Learning Dynamics [13.651450618432094]
We develop a general equivariance toolbox that yields coupled first- and second-order constraints on learning dynamics. At the first order, our framework unifies conservation laws and implicit-bias relations as special cases of a single identity. At the second order, it provides structural predictions about curvature.
arXiv Detail & Related papers (2025-12-24T23:42:07Z) - The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss [53.542743390809356]
This paper provides a first-principles analysis of the Expectation of Optimization Bias (EOB). Our analysis reveals a fundamental paradox: the more deterministic and structured the time series, the more severe the bias induced by point-wise loss functions. We present a concrete solution that simultaneously achieves both principles via the DFT or DWT.
arXiv Detail & Related papers (2025-12-21T06:08:22Z) - Spectral Algorithms in Misspecified Regression: Convergence under Covariate Shift [0.2578242050187029]
Spectral algorithms are a class of regularization methods originating from inverse problems. In this paper, we investigate the convergence properties of spectral algorithms under covariate shift. We provide a theoretical analysis of the more challenging misspecified case, in which the target function does not belong to the reproducing kernel Hilbert space (RKHS).
arXiv Detail & Related papers (2025-09-05T13:42:27Z) - A Mean-Field Theory of $Θ$-Expectations [2.1756081703276]
We develop a new class of calculus for such non-linear models. The $Θ$-expectation is shown to be consistent with the axiom of subadditivity.
arXiv Detail & Related papers (2025-07-30T11:08:56Z) - Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems [53.03951222945921]
We analyze smoothed (perturbed) policies, adding controlled random perturbations to the direction used by the linear oracle. Our main contribution is a generalization bound that decomposes the excess risk into perturbation bias, statistical estimation error, and optimization error. We illustrate the scope of the results on applications such as vehicle scheduling, highlighting how smoothing enables both tractable training and controlled generalization.
arXiv Detail & Related papers (2024-07-24T12:00:30Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation (lookahead) as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape, and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Last-Iterate Convergence of Adaptive Riemannian Gradient Descent for Equilibrium Computation [52.73824786627612]
This paper establishes new convergence results for geodesically strongly monotone games. Our key result shows that adaptive Riemannian gradient descent (RGD) attains last-iterate linear convergence in a geometry-agnostic fashion. Overall, this paper presents the first geometry-agnostic last-iterate convergence analysis for games beyond the Euclidean setting.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - PAC-Chernoff Bounds: Understanding Generalization in the Interpolation Regime [6.645111950779666]
This paper introduces a distribution-dependent PAC-Chernoff bound that is perfectly tight for interpolators. We present a unified theoretical framework revealing why certain interpolators exhibit exceptional generalization while others falter.
arXiv Detail & Related papers (2023-06-19T14:07:10Z) - Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms are not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z) - Towards Understanding Generalization via Decomposing Excess Risk Dynamics [13.4379473119565]
We analyze the generalization dynamics to derive algorithm-dependent bounds, e.g., stability.
Inspired by the observation that neural networks show a slow convergence rate when fitting noise, we propose decomposing the excess risk dynamics.
Under the decomposition framework, the new bound accords better with the theoretical and empirical evidence compared to the stability-based bound and uniform convergence bound.
arXiv Detail & Related papers (2021-06-11T03:42:45Z) - On dissipative symplectic integration with applications to gradient-based optimization [77.34726150561087]
We propose a geometric framework in which discretizations can be realized systematically.
We show that a generalization of symplectic integrators to nonconservative, and in particular dissipative, Hamiltonian systems is able to preserve rates of convergence up to a controlled error.
arXiv Detail & Related papers (2020-04-15T00:36:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.