Exploring Synergistic Ensemble Learning: Uniting CNNs, MLP-Mixers, and Vision Transformers to Enhance Image Classification
- URL: http://arxiv.org/abs/2504.09076v1
- Date: Sat, 12 Apr 2025 04:32:52 GMT
- Title: Exploring Synergistic Ensemble Learning: Uniting CNNs, MLP-Mixers, and Vision Transformers to Enhance Image Classification
- Authors: Mk Bashar, Ocean Monjur, Samia Islam, Mohammad Galib Shams, Niamul Quader
- Abstract summary: We build upon and improve previous work exploring the complementarity between different architectures. We preserve the integrity of each architecture and combine them using ensemble techniques. A direct outcome of this work is the creation of an ensemble of classification networks that surpasses the accuracy of the previous state-of-the-art single classification network on ImageNet.
- Score: 2.907712261410302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Convolutional Neural Networks (CNNs), MLP-mixers, and Vision Transformers have risen to prominence as leading neural architectures in image classification. Prior research has underscored the distinct advantages of each architecture, and there is growing evidence that combining modules from different architectures can boost performance. In this study, we build upon and improve previous work exploring the complementarity between different architectures. Instead of heuristically merging modules from various architectures through trial and error, we preserve the integrity of each architecture and combine them using ensemble techniques. By maintaining the distinctiveness of each architecture, we aim to explore their inherent complementarity more deeply and with implicit isolation. This approach provides a more systematic understanding of their individual strengths. In addition to uncovering insights into architectural complementarity, we showcase the effectiveness of even basic ensemble methods that combine models from diverse architectures. These methods outperform ensembles comprised of similar architectures. Our straightforward ensemble framework serves as a foundational strategy for blending complementary architectures, offering a solid starting point for further investigations into the unique strengths and synergies among different architectures and their ensembles in image classification. A direct outcome of this work is the creation of an ensemble of classification networks that surpasses the accuracy of the previous state-of-the-art single classification network on ImageNet, setting a new benchmark, all while requiring less overall latency.
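The "basic ensemble methods" the abstract refers to can be illustrated with a minimal soft-voting sketch: average the softmax probabilities produced by several architecturally distinct models. The three logit vectors below are hypothetical placeholders standing in for a CNN, an MLP-Mixer, and a ViT; the averaging itself is generic and does not reproduce the paper's exact configuration.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_vote(logits_per_model: list) -> np.ndarray:
    """Soft voting: average the per-model softmax probabilities."""
    probs = np.stack([softmax(l) for l in logits_per_model])
    return probs.mean(axis=0)

# Hypothetical logits from three architecturally distinct models
# for one image over 4 classes.
cnn_logits   = np.array([2.0, 1.0, 0.1, -1.0])
mixer_logits = np.array([1.5, 1.8, 0.0, -0.5])
vit_logits   = np.array([1.0, 2.5, 0.2, -0.8])

ensemble_probs = soft_vote([cnn_logits, mixer_logits, vit_logits])
prediction = int(ensemble_probs.argmax())
```

Note that the CNN alone would predict class 0 here, while the averaged distribution follows the two transformer-style models toward class 1; disagreements like this are exactly where heterogeneous ensembles can outvote a single architecture's error.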
Related papers
- EM-DARTS: Hierarchical Differentiable Architecture Search for Eye Movement Recognition [20.209756662832365]
Differentiable Neural Architecture Search (DARTS) automates the manual process of architecture design with high search efficiency. We propose EM-DARTS, a hierarchical differentiable architecture search algorithm to automatically design the DL architecture for eye movement recognition. We show that EM-DARTS is capable of producing an optimal architecture that leads to state-of-the-art recognition performance.
arXiv Detail & Related papers (2024-09-22T13:11:08Z) - Enhancing Representations through Heterogeneous Self-Supervised Learning [61.40674648939691]
We propose Heterogeneous Self-Supervised Learning (HSSL), which enforces a base model to learn from an auxiliary head whose architecture is heterogeneous from the base model.
The HSSL endows the base model with new characteristics through representation learning, without structural changes.
It is compatible with various self-supervised methods and achieves superior performance on a range of downstream tasks.
arXiv Detail & Related papers (2023-10-08T10:44:05Z) - Ultra Sharp: Study of Single Image Super Resolution using Residual Dense Network [0.15229257192293202]
Single Image Super Resolution (SISR) is a long-standing, ill-posed problem in computer vision.
Traditional super-resolution approaches involve reconstruction- and learning-based methods.
This study examines the Residual Dense Network architecture proposed by Zhang et al.
arXiv Detail & Related papers (2023-04-21T10:32:24Z) - FedHeN: Federated Learning in Heterogeneous Networks [52.29110497518558]
We propose a novel training recipe for federated learning with heterogeneous networks.
We introduce a side training objective for the higher-complexity devices so that different architectures can be trained jointly in a federated setting.
arXiv Detail & Related papers (2022-07-07T01:08:35Z) - Learning Interpretable Models Through Multi-Objective Neural Architecture Search [0.9990687944474739]
We propose a framework to optimize for both task performance and "introspectability," a surrogate metric for aspects of interpretability.
We demonstrate that jointly optimizing for task error and introspectability leads to more disentangled and debuggable architectures that perform within error.
arXiv Detail & Related papers (2021-12-16T05:50:55Z) - A Multisensory Learning Architecture for Rotation-invariant Object Recognition [0.0]
This study presents a multisensory machine learning architecture for object recognition by employing a novel dataset that was constructed with the iCub robot.
The proposed architecture combines a convolutional neural network, which forms representations (i.e., features) for grayscale color images, with a multi-layer perceptron that processes depth data.
arXiv Detail & Related papers (2020-09-14T09:39:48Z) - NAS-DIP: Learning Deep Image Prior with Neural Architecture Search [65.79109790446257]
Recent work has shown that the structure of deep convolutional neural networks can be used as a structured image prior.
We propose to search for neural architectures that capture stronger image priors.
We search for an improved network by leveraging an existing neural architecture search algorithm.
arXiv Detail & Related papers (2020-08-26T17:59:36Z) - Neural Ensemble Search for Uncertainty Estimation and Dataset Shift [67.57720300323928]
Ensembles of neural networks achieve superior performance compared to stand-alone networks in terms of accuracy, uncertainty calibration and robustness to dataset shift.
We propose two methods for automatically constructing ensembles with varying architectures.
We show that the resulting ensembles outperform deep ensembles not only in terms of accuracy but also uncertainty calibration and robustness to dataset shift.
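One intuition behind the uncertainty-calibration claim above can be sketched with predictive entropy: when ensemble members disagree, their averaged distribution is flatter than any single member's, so the ensemble reports higher uncertainty. The per-model probability vectors below are hypothetical numbers chosen only to illustrate that effect, not results from the paper.

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy (in nats) of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

# Hypothetical per-model predictive distributions for one input
# over two classes. The members disagree with each other.
member_probs = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
])
ensemble_mean = member_probs.mean(axis=0)

member_entropies = [entropy(p) for p in member_probs]
ensemble_entropy = entropy(ensemble_mean)
```

By concavity of entropy (Jensen's inequality), the averaged distribution is always at least as uncertain as the average member; with disagreeing members like these it exceeds even the most uncertain individual model.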
arXiv Detail & Related papers (2020-06-15T17:38:15Z) - Does Unsupervised Architecture Representation Learning Help Neural Architecture Search? [22.63641173256389]
Existing Neural Architecture Search (NAS) methods either encode neural architectures using discrete encodings that do not scale well, or adopt supervised learning-based methods to jointly learn architecture representations and optimize architecture search on such representations which incurs search bias.
We observe that the structural properties of neural architectures are hard to preserve in the latent space if architecture representation learning and search are coupled, resulting in less effective search performance.
arXiv Detail & Related papers (2020-06-12T04:15:34Z) - A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z) - Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
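The trade-off described above, where deepening a stage improves accuracy but inflates compute, can be made concrete with a back-of-the-envelope FLOP count. The stage configuration below is a hypothetical three-stage network, and the formula is the standard multiply-add estimate for 3x3 convolutions at fixed width, not a figure from the paper.

```python
def stage_flops(layers: int, channels: int, resolution: int, kernel: int = 3) -> int:
    """Approximate multiply-adds for `layers` KxK convs at a fixed
    channel width and feature-map resolution (bias terms ignored)."""
    per_layer = (kernel * kernel * channels * channels) * (resolution ** 2)
    return layers * per_layer

# Hypothetical 3-stage network on a 32x32 input: resolution halves
# and channel width doubles between stages, as in ResNet-style designs.
stages = [  # (layers, channels, feature-map resolution)
    (3, 64, 32),
    (4, 128, 16),
    (6, 256, 8),
]
total = sum(stage_flops(l, c, r) for l, c, r in stages)
```

Two things fall out of this estimate: cost grows linearly in the number of layers per stage, and halving the resolution while doubling the width leaves per-layer cost unchanged, which is why deepening late stages is a common way to add capacity.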
arXiv Detail & Related papers (2020-04-23T14:16:39Z) - Residual Attention Net for Superior Cross-Domain Time Sequence Modeling [0.0]
This paper serves as a proof-of-concept for a new architecture, with RAN aiming to provide the model with a higher-level understanding of sequence patterns.
We achieved 35 state-of-the-art results, with a further 10 results matching the current state of the art, without additional model fine-tuning.
arXiv Detail & Related papers (2020-01-13T06:14:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.