Adaptive Integrated Layered Attention (AILA)
- URL: http://arxiv.org/abs/2503.22742v1
- Date: Wed, 26 Mar 2025 19:32:31 GMT
- Title: Adaptive Integrated Layered Attention (AILA)
- Authors: William Claster, Suhas KM, Dhairya Gundechia
- Abstract summary: We propose Adaptive Integrated Layered Attention (AILA), a neural network architecture that combines dense skip connections with different mechanisms for adaptive feature reuse across network layers. We evaluate AILA on three challenging tasks: price forecasting for various commodities and indices, image recognition using the CIFAR-10 dataset, and sentiment analysis on the IMDB movie review dataset. Results confirm that AILA's adaptive inter-layer connections yield robust gains by flexibly reusing pertinent features at multiple network depths.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose Adaptive Integrated Layered Attention (AILA), a neural network architecture that combines dense skip connections with different mechanisms for adaptive feature reuse across network layers. We evaluate AILA on three challenging tasks: price forecasting for various commodities and indices (S&P 500, Gold, US Dollar Futures, Coffee, Wheat), image recognition using the CIFAR-10 dataset, and sentiment analysis on the IMDB movie review dataset. In all cases, AILA matches strong deep learning baselines (LSTMs, Transformers, and ResNets) at a fraction of the training and inference time. Notably, we implement and test two versions of the model: AILA-Architecture 1, which uses simple linear layers as the connection mechanism between layers, and AILA-Architecture 2, which implements an attention mechanism to selectively focus on outputs from previous layers. Both architectures are applied in a single-task learning setting, with each model trained separately for individual tasks. Results confirm that AILA's adaptive inter-layer connections yield robust gains by flexibly reusing pertinent features at multiple network depths. The AILA approach thus presents an extension to existing architectures, improving long-range sequence modeling, image recognition with optimised computational speed, and SOTA classification performance in practice.
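The abstract describes AILA-Architecture 2 as attending over the outputs of all previous layers to decide which earlier features to reuse. A minimal NumPy sketch of that attention-over-previous-layers idea is given below; the class name `AILALayer`, the dot-product scoring function, and the tanh feed-forward step are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class AILALayer:
    """One layer that attends over all previous layers' outputs
    (hypothetical reading of AILA-Architecture 2)."""
    def __init__(self, dim, rng):
        self.W = rng.standard_normal((dim, dim)) / np.sqrt(dim)  # feed-forward weight
        self.q = rng.standard_normal(dim) / np.sqrt(dim)         # attention query vector

    def forward(self, history):
        stack = np.stack(history)         # (n_prev, dim): all earlier outputs
        scores = softmax(stack @ self.q)  # one attention weight per earlier output
        context = scores @ stack          # adaptive, weighted reuse of prior features
        return np.tanh(context @ self.W)  # transform the aggregated features

rng = np.random.default_rng(0)
dim, n_layers = 8, 4
layers = [AILALayer(dim, rng) for _ in range(n_layers)]

history = [rng.standard_normal(dim)]  # the input embedding acts as layer 0
for layer in layers:
    history.append(layer.forward(history))

output = history[-1]
```

AILA-Architecture 1 would replace the softmax scoring with a fixed linear combination of `history`, trading adaptivity for speed, which is consistent with the abstract's framing of the two variants.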
Related papers
- IncepFormerNet: A multi-scale multi-head attention network for SSVEP classification [12.935583315234553]
This study proposes a new model called IncepFormerNet, which is a hybrid of the Inception and Transformer architectures. IncepFormerNet adeptly extracts multi-scale temporal information from time series data using parallel convolution kernels of varying sizes. It takes advantage of filter bank techniques to extract features based on the spectral characteristics of SSVEP data.
arXiv Detail & Related papers (2025-02-04T13:04:03Z) - EM-DARTS: Hierarchical Differentiable Architecture Search for Eye Movement Recognition [20.209756662832365]
Differentiable Neural Architecture Search (DARTS) automates the manual process of architecture design with high search efficiency. We propose EM-DARTS, a hierarchical differentiable architecture search algorithm to automatically design the DL architecture for eye movement recognition. We show that EM-DARTS is capable of producing an optimal architecture that leads to state-of-the-art recognition performance.
arXiv Detail & Related papers (2024-09-22T13:11:08Z) - Relax DARTS: Relaxing the Constraints of Differentiable Architecture Search for Eye Movement Recognition [9.905155497581815]
We introduce automated network search (NAS) algorithms to the field of eye movement recognition.
Relax DARTS is an improvement of the Differentiable Architecture Search (DARTS) to realize more efficient network search and training.
Relax DARTS exhibits adaptability to other multi-feature temporal classification tasks.
arXiv Detail & Related papers (2024-09-18T02:37:04Z) - AdaEnsemble: Learning Adaptively Sparse Structured Ensemble Network for Click-Through Rate Prediction [0.0]
We propose AdaEnsemble: a Sparsely-Gated Mixture-of-Experts architecture that can leverage the strengths of heterogeneous feature interaction experts.
AdaEnsemble can adaptively choose the feature interaction depth and find the corresponding SparseMoE stacking layer to exit and compute prediction from.
We implement the proposed AdaEnsemble and evaluate its performance on real-world datasets.
arXiv Detail & Related papers (2023-01-06T12:08:15Z) - Neural Attentive Circuits [93.95502541529115]
We introduce a general purpose, yet modular neural architecture called Neural Attentive Circuits (NACs)
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z) - Surrogate-assisted Multi-objective Neural Architecture Search for Real-time Semantic Segmentation [11.866947846619064]
Neural architecture search (NAS) has emerged as a promising avenue toward automating the design of architectures.
We propose a surrogate-assisted multi-objective method to address the challenges of applying NAS to semantic segmentation.
Our method can identify architectures significantly outperforming existing state-of-the-art architectures designed both manually by human experts and automatically by other NAS methods.
arXiv Detail & Related papers (2022-08-14T10:18:51Z) - Neural Networks with A La Carte Selection of Activation Functions [0.0]
Activation functions (AFs) are pivotal to the success (or failure) of a neural network.
We combine a slew of known AFs into successful architectures, proposing three methods to do so beneficially.
We show that all methods often produce significantly better results for 25 classification problems when compared with a standard network composed of ReLU hidden units and a softmax output unit.
arXiv Detail & Related papers (2022-06-24T09:09:39Z) - Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization [50.50023451369742]
Pruning-as-Search (PaS) is an end-to-end channel pruning method to search out desired sub-network automatically and efficiently.
Our proposed architecture outperforms prior arts by around 1.0% top-1 accuracy on the ImageNet-1000 classification task.
arXiv Detail & Related papers (2022-06-02T17:58:54Z) - Elastic Architecture Search for Diverse Tasks with Different Resources [87.23061200971912]
We study a new challenging problem of efficient deployment for diverse tasks with different resources, where the resource constraint and task of interest corresponding to a group of classes are dynamically specified at testing time.
Previous NAS approaches seek to design architectures for all classes simultaneously, which may not be optimal for some individual tasks.
We present a novel and general framework, called Elastic Architecture Search (EAS), permitting instant specializations at runtime for diverse tasks with various resource constraints.
arXiv Detail & Related papers (2021-08-03T00:54:27Z) - Learning to Generate Content-Aware Dynamic Detectors [62.74209921174237]
We introduce a new perspective on designing efficient detectors: automatically generating a sample-adaptive model architecture.
We introduce a coarse-to-fine strategy tailored for object detection to guide the learning of dynamic routing.
Experiments on the MS-COCO dataset demonstrate that CADDet achieves 1.8 points higher mAP with 10% fewer FLOPs compared with vanilla routing.
arXiv Detail & Related papers (2020-12-08T08:05:20Z) - Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn.
We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology in regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z) - AutoPose: Searching Multi-Scale Branch Aggregation for Pose Estimation [96.29533512606078]
We present AutoPose, a novel neural architecture search (NAS) framework.
It is capable of automatically discovering multiple parallel branches of cross-scale connections towards accurate and high-resolution 2D human pose estimation.
arXiv Detail & Related papers (2020-08-16T22:27:43Z)