On Joint Regularization and Calibration in Deep Ensembles
- URL: http://arxiv.org/abs/2511.04160v2
- Date: Fri, 07 Nov 2025 02:19:40 GMT
- Title: On Joint Regularization and Calibration in Deep Ensembles
- Authors: Laurits Fredsgaard, Mikkel N. Schmidt,
- Abstract summary: Deep ensembles are a powerful tool in machine learning, improving both model performance and uncertainty calibration.<n>This paper investigates the impact of jointly tuning weight decay, temperature scaling, and early stopping on both predictive performance and uncertainty quantification.<n>We highlight the trade-offs between individual and joint optimization in deep ensemble training, with the overlapping holdout strategy offering an attractive practical solution.
- Score: 4.042929081555184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep ensembles are a powerful tool in machine learning, improving both model performance and uncertainty calibration. While ensembles are typically formed by training and tuning models individually, evidence suggests that jointly tuning the ensemble can lead to better performance. This paper investigates the impact of jointly tuning weight decay, temperature scaling, and early stopping on both predictive performance and uncertainty quantification. Additionally, we propose a partially overlapping holdout strategy as a practical compromise between enabling joint evaluation and maximizing the use of data for training. Our results demonstrate that jointly tuning the ensemble generally matches or improves performance, with significant variation in effect size across different tasks and metrics. We highlight the trade-offs between individual and joint optimization in deep ensemble training, with the overlapping holdout strategy offering an attractive practical solution. We believe our findings provide valuable insights and guidance for practitioners looking to optimize deep ensemble models. Code is available at: https://github.com/lauritsf/ensemble-optimality-gap
Related papers
- An Integrated Fusion Framework for Ensemble Learning Leveraging Gradient Boosting and Fuzzy Rule-Based Models [59.13182819190547]
Fuzzy rule-based models excel in interpretability and have seen widespread application across diverse fields.<n>They face challenges such as complex design specifications and scalability issues with large datasets.<n>This paper proposes an Integrated Fusion Framework that merges the strengths of both paradigms to enhance model performance and interpretability.
arXiv Detail & Related papers (2025-11-11T10:28:23Z) - How does the optimizer implicitly bias the model merging loss landscape? [66.96572894292895]
We show that a single quantity -- the effective noise scale -- unifies the impact of inference and data choices on model merging.<n>Across datasets, the effectiveness of merging success is a non-monotonic function of effective noise, with a distinct optimum.
arXiv Detail & Related papers (2025-10-06T10:56:41Z) - On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows [71.92083784393418]
Agentic AI (systems that autonomously plan and act) are becoming widespread, yet their task success rate on complex tasks remains low.<n>Inference-time alignment relies on three components: sampling, evaluation, and feedback.<n>We introduce Iterative Agent Decoding (IAD), a procedure that repeatedly inserts feedback extracted from different forms of critiques.
arXiv Detail & Related papers (2025-04-02T17:40:47Z) - On Uniform, Bayesian, and PAC-Bayesian Deep Ensembles [7.883369697332076]
This study reviews the argument that neither the sampling nor the weighting in Bayes ensembles are particularly well suited for increasing generalization performance.<n>We show that a weighted average of models, where the weights are optimized by minimizing a second-order PAC-Bayesian generalization bound, can improve generalization.
arXiv Detail & Related papers (2024-06-08T13:19:18Z) - Joint Training of Deep Ensembles Fails Due to Learner Collusion [61.557412796012535]
Ensembles of machine learning models have been well established as a powerful method of improving performance over a single model.
Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance.
We show that directly minimizing the loss of the ensemble appears to rarely be applied in practice.
arXiv Detail & Related papers (2023-01-26T18:58:07Z) - Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC)
DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
arXiv Detail & Related papers (2022-12-14T07:35:20Z) - Improving Predictive Performance and Calibration by Weight Fusion in
Semantic Segmentation [18.47581580698701]
Averaging predictions of a deep ensemble of networks is a popular and effective method to improve predictive performance andcalibration.
We show that a simple weight fusion (WF)strategy can lead to a significantly improved predictive performance andcalibration.
arXiv Detail & Related papers (2022-07-22T17:24:13Z) - Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework by a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z) - Unleashing the Power of Contrastive Self-Supervised Visual Models via
Contrast-Regularized Fine-Tuning [94.35586521144117]
We investigate whether applying contrastive learning to fine-tuning would bring further benefits.
We propose Contrast-regularized tuning (Core-tuning), a novel approach for fine-tuning contrastive self-supervised visual models.
arXiv Detail & Related papers (2021-02-12T16:31:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.