Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on
Resource-constrained Devices
- URL: http://arxiv.org/abs/2309.06612v2
- Date: Thu, 28 Sep 2023 15:44:54 GMT
- Title: Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on
Resource-constrained Devices
- Authors: Mohamed Imed Eddine Ghebriout, Halima Bouzidi, Smail Niar, Hamza
Ouarnoughi
- Abstract summary: We propose a framework for the joint optimization of unimodal backbones and multimodal fusion networks with hardware awareness on resource-constrained devices.
Harmonic-NAS achieves up to 10.9% accuracy improvement, 1.91x latency reduction, and 2.14x energy efficiency gain over state-of-the-art approaches.
- Score: 0.4915744683251151
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent surge of interest surrounding Multimodal Neural Networks (MM-NN)
is attributed to their ability to effectively process and integrate multiscale
information from diverse data sources. MM-NNs extract and fuse features from
multiple modalities using adequate unimodal backbones and specific fusion
networks. Although this helps strengthen the multimodal information
representation, designing such networks is labor-intensive. It requires tuning
the architectural parameters of the unimodal backbones, choosing the fusing
point, and selecting the operations for fusion. Furthermore, multimodal AI
is emerging as a cutting-edge option in Internet of Things (IoT) systems where
inference latency and energy consumption are critical metrics in addition to
accuracy. In this paper, we propose Harmonic-NAS, a framework for the joint
optimization of unimodal backbones and multimodal fusion networks with hardware
awareness on resource-constrained devices. Harmonic-NAS involves a two-tier
optimization approach for the unimodal backbone architectures and fusion
strategy and operators. By incorporating the hardware dimension into the
optimization, evaluation results on various devices and multimodal datasets
have demonstrated the superiority of Harmonic-NAS over state-of-the-art
approaches achieving up to 10.9% accuracy improvement, 1.91x latency reduction,
and 2.14x energy efficiency gain.
Related papers
- Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation [7.797154022794006]
Recent endeavors regard RGB modality as the center and the others as the auxiliary, yielding an asymmetric architecture with two branches.
We propose a novel method, named MAGIC, that can be flexibly paired with various backbones, ranging from compact to high-performance models.
Our method achieves state-of-the-art performance while reducing the model parameters by 60%.
arXiv Detail & Related papers (2024-07-16T03:19:59Z)
- CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion [58.15403987979496]
CREMA is a generalizable, highly efficient, and modular modality-fusion framework for video reasoning.
We propose a novel progressive multimodal fusion design supported by a lightweight fusion module and modality-sequential training strategy.
We validate our method on 7 video-language reasoning tasks assisted by diverse modalities, including VideoQA and Video-Audio/3D/Touch/Thermal QA.
arXiv Detail & Related papers (2024-02-08T18:27:22Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- Federated Learning for Energy-limited Wireless Networks: A Partial Model Aggregation Approach [79.59560136273917]
Limited communication resources (bandwidth and energy) and data heterogeneity across devices are the main bottlenecks for federated learning (FL).
We first devise a novel FL framework with partial model aggregation (PMA).
The proposed PMA-FL improves accuracy by 2.72% and 11.6% on two typical heterogeneous datasets.
arXiv Detail & Related papers (2022-04-20T19:09:52Z)
- Dynamic Multimodal Fusion [8.530680502975095]
Dynamic multimodal fusion (DynMM) is a new approach that adaptively fuses multimodal data and generates data-dependent forward paths during inference.
Results on various multimodal tasks demonstrate the efficiency and wide applicability of our approach.
arXiv Detail & Related papers (2022-03-31T21:35:13Z)
- Multi-modal land cover mapping of remote sensing images using pyramid attention and gated fusion networks [20.66034058363032]
We propose a new multi-modality network for land cover mapping of multi-modal remote sensing data based on a novel pyramid attention fusion (PAF) module and a gated fusion unit (GFU).
The PAF module is designed to efficiently obtain rich fine-grained contextual representations from each modality with a built-in cross-level and cross-view attention fusion mechanism.
The GFU module utilizes a novel gating mechanism for early merging of features, thereby diminishing hidden redundancies and noise.
arXiv Detail & Related papers (2021-11-06T10:01:01Z)
- Computational Intelligence and Deep Learning for Next-Generation Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain of up to 32.7% in industrial IoT networks with severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
- BM-NAS: Bilevel Multimodal Neural Architecture Search [30.472605201814428]
This paper proposes the Bilevel Multimodal Neural Architecture Search (BM-NAS) framework.
It makes the architecture of multimodal fusion models fully searchable via a bilevel searching scheme.
BM-NAS achieves competitive performance with much less search time and fewer model parameters.
arXiv Detail & Related papers (2021-04-19T15:09:49Z)
- Deep-HOSeq: Deep Higher Order Sequence Fusion for Multimodal Sentiment Analysis [12.386788662621338]
Multimodal sentiment analysis utilizes multiple heterogeneous modalities for sentiment classification.
Recent multimodal fusion schemes customize LSTMs to discover intra-modal dynamics.
We propose a common network to discover both intra-modal and inter-modal dynamics.
arXiv Detail & Related papers (2020-10-16T08:02:11Z)
- Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles [52.79089414630366]
We develop a novel deep cooperative NOMA scheme, drawing upon the recent advances in deep learning (DL).
We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.
arXiv Detail & Related papers (2020-07-27T12:38:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.