Regularized Neural Ensemblers
- URL: http://arxiv.org/abs/2410.04520v2
- Date: Mon, 23 Jun 2025 16:40:18 GMT
- Title: Regularized Neural Ensemblers
- Authors: Sebastian Pineda Arango, Maciej Janowski, Lennart Purucker, Arber Zela, Frank Hutter, Josif Grabocka
- Abstract summary: In this study, we explore employing regularized neural networks as ensemble methods. Motivated by the risk of learning low-diversity ensembles, we propose regularizing the ensembling model by randomly dropping base model predictions. We demonstrate this approach provides lower bounds for the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
- Score: 55.15643209328513
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensemble methods are known for enhancing the accuracy and robustness of machine learning models by combining multiple base learners. However, standard approaches like greedy or random ensembling often fall short, as they assume a constant weight across samples for the ensemble members. This can limit expressiveness and hinder performance when aggregating the ensemble predictions. In this study, we explore employing regularized neural networks as ensemble methods, emphasizing the significance of dynamic ensembling to leverage diverse model predictions adaptively. Motivated by the risk of learning low-diversity ensembles, we propose regularizing the ensembling model by randomly dropping base model predictions during training. We demonstrate this approach provides lower bounds for the diversity within the ensemble, reducing overfitting and improving generalization capabilities. Our experiments showcase that the regularized neural ensemblers yield competitive results compared to strong baselines across several modalities such as computer vision, natural language processing, and tabular data.
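A minimal sketch of the idea described in the abstract: a neural ensembler predicts per-sample weights over the base models' predictions, and base models are randomly dropped during training so the ensembler cannot collapse onto a few learners. The layer sizes, masking scheme, softmax renormalization, and drop rate `drop_p` below are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch only: per-sample neural ensembler with random
# base-model dropout as a diversity regularizer (assumed details).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralEnsembler(nn.Module):
    def __init__(self, n_models: int, n_classes: int, hidden: int = 64, drop_p: float = 0.5):
        super().__init__()
        self.drop_p = drop_p
        # The ensembler sees all stacked base predictions and outputs one weight logit per base model.
        self.net = nn.Sequential(
            nn.Linear(n_models * n_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_models),
        )

    def forward(self, base_preds: torch.Tensor) -> torch.Tensor:
        # base_preds: (batch, n_models, n_classes) class probabilities from the base learners.
        b, m, c = base_preds.shape
        logits = self.net(base_preds.reshape(b, m * c))  # per-sample weight logits, shape (batch, n_models)
        if self.training:
            # Randomly drop base models for each sample so no single learner can dominate.
            keep = torch.rand(b, m, device=base_preds.device) > self.drop_p
            keep[keep.sum(dim=1) == 0, 0] = True         # ensure at least one model is kept per sample
            logits = logits.masked_fill(~keep, float("-inf"))
        weights = F.softmax(logits, dim=1)                # per-sample weights, sum to 1 over models
        return (weights.unsqueeze(-1) * base_preds).sum(dim=1)  # aggregated prediction, shape (batch, n_classes)
```

Masking the logits with -inf before the softmax gives dropped models exactly zero weight on that sample; this is one simple way to realize the kind of diversity lower bound the regularization targets.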
Related papers
- Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models [68.57424628540907]
Large language models (LLMs) often develop learned mechanisms specialized to specific datasets. We introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms. Our method employs Integrated Gradients to quantify each neuron's influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance.
arXiv Detail & Related papers (2025-07-12T08:10:10Z)
- A Model-Based Approach to Imitation Learning through Multi-Step Predictions [8.888213496593556]
We present a novel model-based imitation learning framework inspired by model predictive control. Our method outperforms traditional behavior cloning on numerical benchmarks. We provide theoretical guarantees on the sample complexity and error bounds of our method, offering insights into its convergence properties.
arXiv Detail & Related papers (2025-04-18T02:19:30Z)
- U-aggregation: Unsupervised Aggregation of Multiple Learning Algorithms [4.871473117968554]
We propose an unsupervised model aggregation method, U-aggregation, for enhanced and robust performance in new populations.
Unlike existing supervised model aggregation or super learner approaches, U-aggregation assumes no observed labels or outcomes in the target population.
We demonstrate its potential real-world application by using U-aggregation to enhance genetic risk prediction of complex traits.
arXiv Detail & Related papers (2025-01-30T01:42:51Z)
- Regularization for Adversarial Robust Learning [18.46110328123008]
We develop a novel approach to adversarial training that integrates $\phi$-divergence regularization into the distributionally robust risk function.
This regularization yields a notable computational improvement over the original formulation.
We validate our proposed method in supervised learning, reinforcement learning, and contextual learning and showcase its state-of-the-art performance against various adversarial attacks.
arXiv Detail & Related papers (2024-08-19T03:15:41Z)
- Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later [59.88557193062348]
We revisit the classic Neighborhood Components Analysis (NCA), designed to learn a linear projection that captures semantic similarities between instances.
We find that minor modifications, such as adjustments to the learning objectives and the integration of deep learning architectures, significantly enhance NCA's performance.
We also introduce a neighbor sampling strategy that improves both the efficiency and predictive accuracy of our proposed ModernNCA.
arXiv Detail & Related papers (2024-07-03T16:38:57Z)
- Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models [83.02797560769285]
Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data.
Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts.
We propose Task Groupings Regularization, a novel approach that benefits from model heterogeneity by grouping and aligning conflicting tasks.
arXiv Detail & Related papers (2024-05-26T13:11:55Z)
- Aggregated f-average Neural Network for Interpretable Ensembling [25.818919790407016]
We introduce an aggregated f-average (AFA) shallow neural network which models and combines different types of averages to perform an optimal aggregation of the weak learners' predictions.
We emphasise its interpretable architecture and simple training strategy, and illustrate its good performance on the problem of few-shot class incremental learning.
arXiv Detail & Related papers (2023-10-09T09:43:08Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or complex mixtures of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Joint Training of Deep Ensembles Fails Due to Learner Collusion [61.557412796012535]
Ensembles of machine learning models have been well established as a powerful method of improving performance over a single model.
Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance.
Surprisingly, directly minimizing the loss of the ensemble is rarely applied in practice.
arXiv Detail & Related papers (2023-01-26T18:58:07Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes to factorize the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Sequential Bayesian Neural Subnetwork Ensembles [4.6354120722975125]
We propose an approach for sequential ensembling of dynamic Bayesian neural subnetworks that consistently maintains reduced model complexity throughout the training process.
Our proposed approach outperforms traditional dense and sparse deterministic and Bayesian ensemble models in terms of prediction accuracy, uncertainty estimation, out-of-distribution detection, and adversarial robustness.
arXiv Detail & Related papers (2022-06-01T22:57:52Z)
- Orthogonal Ensemble Networks for Biomedical Image Segmentation [10.011414604407681]
We introduce Orthogonal Ensemble Networks (OEN), a novel framework to explicitly enforce model diversity.
We benchmark the proposed framework in two challenging brain lesion segmentation tasks.
The experimental results show that our approach produces more robust and well-calibrated ensemble models.
arXiv Detail & Related papers (2021-05-22T23:44:55Z)
- The Role of Isomorphism Classes in Multi-Relational Datasets [6.419762264544509]
We show that isomorphism leakage overestimates performance in multi-relational inference.
We propose isomorphism-aware synthetic benchmarks for model evaluation.
We also demonstrate that isomorphism classes can be utilised through a simple prioritisation scheme.
arXiv Detail & Related papers (2020-09-30T12:15:24Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.