Related papers: HCE: Improving Performance and Efficiency with Heterogeneously Compressed Neural Network Ensemble

HCE: Improving Performance and Efficiency with Heterogeneously Compressed Neural Network Ensemble

URL: http://arxiv.org/abs/2301.07794v1
Date: Wed, 18 Jan 2023 21:47:05 GMT
Title: HCE: Improving Performance and Efficiency with Heterogeneously Compressed Neural Network Ensemble
Authors: Jingchi Zhang, Huanrui Yang and Hai Li
Abstract summary: Recent ensemble training method explores different training algorithms or settings on multiple sub-models with the same model architecture. We propose Heterogeneously Compressed Ensemble (HCE), where we build an efficient ensemble with the pruned and quantized variants from a pretrained DNN model.
Score: 22.065904428696353
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Ensemble learning has gain attention in resent deep learning research as a way to further boost the accuracy and generalizability of deep neural network (DNN) models. Recent ensemble training method explores different training algorithms or settings on multiple sub-models with the same model architecture, which lead to significant burden on memory and computation cost of the ensemble model. Meanwhile, the heurtsically induced diversity may not lead to significant performance gain. We propose a new prespective on exploring the intrinsic diversity within a model architecture to build efficient DNN ensemble. We make an intriguing observation that pruning and quantization, while both leading to efficient model architecture at the cost of small accuracy drop, leads to distinct behavior in the decision boundary. To this end, we propose Heterogeneously Compressed Ensemble (HCE), where we build an efficient ensemble with the pruned and quantized variants from a pretrained DNN model. An diversity-aware training objective is proposed to further boost the performance of the HCE ensemble. Experiemnt result shows that HCE achieves significant improvement in the efficiency-accuracy tradeoff comparing to both traditional DNN ensemble training methods and previous model compression methods.

Related papers

Enhancing Time Series Classification with Diversity-Driven Neural Network Ensembles [0.776514389034479]
We introduce a diversity-driven ensemble learning framework that explicitly encourages feature diversity among neural network ensemble members.<n>We evaluate our framework on 128 datasets from the UCR archive and show that it achieves SOTA performance with fewer models.
arXiv Detail & Related papers (2026-02-07T15:05:04Z)
Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts [113.0656076371565]
We propose a novel router-aware approach to optimize importance sampling weights in off-policy reinforcement learning (RL)<n>Specifically, we design a rescaling strategy guided by router logits, which effectively reduces gradient variance and mitigates training divergence.<n> Experimental results demonstrate that our method significantly improves both the convergence stability and the final performance of MoE models.
arXiv Detail & Related papers (2025-10-27T05:47:48Z)
Enhancing Graph Neural Networks: A Mutual Learning Approach [3.3557736371930567]
This study explores the potential of collaborative learning among graph neural networks (GNNs)<n>In the absence of a pre-trained teacher model, we show that relatively simple and shallow GNN architectures can synergetically learn efficient models.<n>We propose a collaborative learning framework where ensembles of student GNNs mutually teach each other throughout the training process.
arXiv Detail & Related papers (2025-10-22T04:07:48Z)
HAD: Hybrid Architecture Distillation Outperforms Teacher in Genomic Sequence Modeling [52.58723853697152]
We propose a Hybrid Architecture Distillation (HAD) approach for DNA sequence modeling.<n>We employ the NTv2-500M as the teacher model and devise a grouping masking strategy.<n>Compared to models with similar parameters, our model achieved excellent performance.
arXiv Detail & Related papers (2025-05-27T07:57:35Z)
Self Distillation via Iterative Constructive Perturbations [0.2748831616311481]
We propose a novel framework that uses a cyclic optimization strategy to concurrently optimize the model and its input data for better training.<n>By alternately altering the model's parameters to the data and the data to the model, our method effectively addresses the gap between fitting and generalization.
arXiv Detail & Related papers (2025-05-20T13:15:27Z)
A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models. Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning. We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods. Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions. We demonstrate this approach lower bounds the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
arXiv Detail & Related papers (2024-10-06T15:25:39Z)
Deep Companion Learning: Enhancing Generalization Through Historical Consistency [35.5237083057451]
We propose a novel training method for Deep Neural Networks (DNNs) that enhances generalization by penalizing inconsistent model predictions. We train a deep-companion model (DCM) by using previous versions of the model to provide forecasts on new inputs. This companion model deciphers a meaningful latent semantic structure within the data, thereby providing targeted supervision.
arXiv Detail & Related papers (2024-07-26T15:31:13Z)
Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later [59.88557193062348]
We revisit the classic Neighborhood Component Analysis (NCA), designed to learn a linear projection that captures semantic similarities between instances. We find that minor modifications, such as adjustments to the learning objectives and the integration of deep learning architectures, significantly enhance NCA's performance. We also introduce a neighbor sampling strategy that improves both the efficiency and predictive accuracy of our proposed ModernNCA.
arXiv Detail & Related papers (2024-07-03T16:38:57Z)
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost. We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion. By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion [56.9358325168226]
We propose a Bagging deep learning training algorithm based on Efficient Neural network Diffusion (BEND) Our approach is simple but effective, first using multiple trained model weights and biases as inputs to train autoencoder and latent diffusion model. Our proposed BEND algorithm can consistently outperform the mean and median accuracies of both the original trained model and the diffused model.
arXiv Detail & Related papers (2024-03-23T08:40:38Z)
Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning [10.784911682565879]
Ensembling a neural network is a widely recognized approach to enhance model performance, estimate uncertainty, and improve robustness in deep supervised learning. We present a novel self-supervised training regime that leverages an ensemble of independent sub-networks. Our method efficiently builds a sub-model ensemble with high diversity, leading to well-calibrated estimates of model uncertainty.
arXiv Detail & Related papers (2023-08-28T16:58:44Z)
Joint Training of Deep Ensembles Fails Due to Learner Collusion [61.557412796012535]
Ensembles of machine learning models have been well established as a powerful method of improving performance over a single model. Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance. We show that directly minimizing the loss of the ensemble appears to rarely be applied in practice.
arXiv Detail & Related papers (2023-01-26T18:58:07Z)
Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions. We propose deep negative correlation classification (DNCC) DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
arXiv Detail & Related papers (2022-12-14T07:35:20Z)
Rethinking Pareto Frontier for Performance Evaluation of Deep Neural Networks [2.167843405313757]
We re-define the efficiency measure using a multi-objective optimization. We combine competing variables with nature simultaneously in a single relative efficiency measure. This allows to rank deep models that run efficiently on different computing hardware, and combines inference efficiency with training efficiency objectively.
arXiv Detail & Related papers (2022-02-18T15:58:17Z)
Collegial Ensembles [11.64359837358763]
We show that collegial ensembles can be efficiently implemented in practical architectures using group convolutions and block diagonal layers. We also show how our framework can be used to analytically derive optimal group convolution modules without having to train a single model.
arXiv Detail & Related papers (2020-06-13T16:40:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.