Beyond Mixtures and Products for Ensemble Aggregation: A Likelihood Perspective on Generalized Means
- URL: http://arxiv.org/abs/2603.04204v1
- Date: Wed, 04 Mar 2026 15:48:44 GMT
- Title: Beyond Mixtures and Products for Ensemble Aggregation: A Likelihood Perspective on Generalized Means
- Authors: Raphaël Razafindralambo, Rémy Sun, Frédéric Precioso, Damien Garreau, Pierre-Alexandre Mattei,
- Abstract summary: Density aggregation is a central problem in machine learning, for instance when combining predictions from a Deep Ensemble.<n>We study the normalized generalized mean of order $r in mathbbR cup -infty,+infty$ through the lens of log-likelihood.<n>This provides a unifying aggregation formalism and shows different optimal configurations for different situations.
- Score: 22.019987128734282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Density aggregation is a central problem in machine learning, for instance when combining predictions from a Deep Ensemble. The choice of aggregation remains an open question with two commonly proposed approaches being linear pooling (probability averaging) and geometric pooling (logit averaging). In this work, we address this question by studying the normalized generalized mean of order $r \in \mathbb{R} \cup \{-\infty,+\infty\}$ through the lens of log-likelihood, the standard evaluation criterion in machine learning. This provides a unifying aggregation formalism and shows different optimal configurations for different situations. We show that the regime $r \in [0,1]$ is the only range ensuring systematic improvements relative to individual distributions, thereby providing a principled justification for the reliability and widespread practical use of linear ($r=1$) and geometric ($r=0$) pooling. In contrast, we show that aggregation rules with $r \notin [0,1]$ may fail to provide consistent gains with explicit counterexamples. Finally, we corroborate our theoretical findings with empirical evaluations using Deep Ensembles on image and text classification benchmarks.
Related papers
- Regularized Online RLHF with Generalized Bilinear Preferences [68.44113000390544]
We consider the problem of contextual online RLHF with general preferences.<n>We adopt the Generalized Bilinear Preference Model to capture preferences via low-rank, skew-symmetric matrices.<n>We prove that the dual gap of the greedy policy is bounded by the square of the estimation error.
arXiv Detail & Related papers (2026-02-26T15:27:53Z) - Anisotropic local law for non-separable sample covariance matrices [10.181748307494608]
We establish local laws for sample covariance matrices $K = N-1sum_i=1N g_ig_ig_i*$ where the random vectors $g_1, ldots, g_N in Rn$ are independent with common covariance $$.<n>We discuss several classes of non-separable examples satisfying our assumptions, including conditionally mean-zero distributions, the random features model $g = (Xw)$ arising in machine learning, and Gaussian measures with
arXiv Detail & Related papers (2026-02-20T03:28:51Z) - Optimal Unconstrained Self-Distillation in Ridge Regression: Strict Improvements, Precise Asymptotics, and One-Shot Tuning [61.07540493350384]
Self-distillation (SD) is the process of retraining a student on a mixture of ground-truth and the teacher's own predictions.<n>We show that for any prediction risk, the optimally mixed student improves upon the ridge teacher for every regularization level.<n>We propose a consistent one-shot tuning method to estimate $star$ without grid search, sample splitting, or refitting.
arXiv Detail & Related papers (2026-02-19T17:21:15Z) - Almost Asymptotically Optimal Active Clustering Through Pairwise Observations [59.20614082241528]
We propose a new analysis framework for clustering $M$ items into an unknown number of $K$ distinct groups using noisy and actively collected responses.<n>We establish a fundamental lower bound on the expected number of queries needed to achieve a desired confidence in the accuracy of the clustering.<n>We develop a computationally feasible variant of the Generalized Likelihood Ratio statistic and show that its performance gap to the lower bound can be accurately empirically estimated.
arXiv Detail & Related papers (2026-02-05T14:16:47Z) - Improved Sample Complexity for Full Coverage in Compact and Continuous Spaces [0.0]
We study uniform random sampling on the $d$-dimensional unit hypercube.<n>We derive a sample complexity bound with a logarithmic dependence on the failure probability.<n>Our findings offer a sharper theoretical tool for algorithms that rely on grid-based coverage guarantees.
arXiv Detail & Related papers (2025-11-21T21:06:14Z) - Generalization Performance of Ensemble Clustering: From Theory to Algorithm [57.176040163699554]
This paper focuses on generalization error, excess risk and consistency in ensemble clustering.<n>By assigning varying weights to finite clusterings, we minimize the error between the empirical average clusterings and their expectation.<n>We instantiate our theory to develop a new ensemble clustering algorithm.
arXiv Detail & Related papers (2025-06-01T09:34:52Z) - Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits [49.96531901205305]
We analyze $f$-divergence-regularized offline policy learning.<n>For reverse Kullback-Leibler (KL) divergence, we give the first $tildeO(epsilon-1)$ sample complexity under single-policy concentrability.<n>We extend our analysis to dueling bandits, and we believe these results take a significant step toward a comprehensive understanding of $f$-divergence-regularized policy learning.
arXiv Detail & Related papers (2025-02-09T22:14:45Z) - Near-Optimal Resilient Aggregation Rules for Distributed Learning Using 1-Center and 1-Mean Clustering with Outliers [24.88026399458157]
Byzantine machine learning has garnered considerable attention in light of the unpredictable faults that can occur.
The key to secure machines in distributed learning is resilient aggregation mechanisms.
arXiv Detail & Related papers (2023-12-20T08:36:55Z) - Clustering Mixtures of Bounded Covariance Distributions Under Optimal
Separation [44.25945344950543]
We study the clustering problem for mixtures of bounded covariance distributions.
We give the first poly-time algorithm for this clustering task.
Our algorithm is robust to $Omega(alpha)$-fraction of adversarial outliers.
arXiv Detail & Related papers (2023-12-19T01:01:53Z) - Differentially-Private Bayes Consistency [70.92545332158217]
We construct a Bayes consistent learning rule that satisfies differential privacy (DP)
We prove that any VC class can be privately learned in a semi-supervised setting with a near-optimal sample complexity.
arXiv Detail & Related papers (2022-12-08T11:57:30Z) - Asymptotic Theory of Geometric and Adaptive $k$-Means Clustering [0.0]
We revisit Pollard's classical result on consistency for $k$-means clustering in Euclidean space.<n>We show that all clustering procedures described above are strongly consistent.
arXiv Detail & Related papers (2022-02-27T18:56:41Z) - Spatially relaxed inference on high-dimensional linear models [48.989769153211995]
We study the properties of ensembled clustered inference algorithms which combine spatially constrained clustering, statistical inference, and ensembling to aggregate several clustered inference solutions.
We show that ensembled clustered inference algorithms control the $delta$-FWER under standard assumptions for $delta$ equal to the largest cluster diameter.
arXiv Detail & Related papers (2021-06-04T16:37:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.