Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on
Resource-constrained Devices
- URL: http://arxiv.org/abs/2309.06612v2
- Date: Thu, 28 Sep 2023 15:44:54 GMT
- Title: Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on
Resource-constrained Devices
- Authors: Mohamed Imed Eddine Ghebriout, Halima Bouzidi, Smail Niar, Hamza
Ouarnoughi
- Abstract summary: We propose a framework for the joint optimization of unimodal backbones and multimodal fusion networks with hardware awareness on resource-constrained devices.
Harmonic-NAS achieves up to a 10.9% accuracy improvement, a 1.91x latency reduction, and a 2.14x energy efficiency gain.
- Score: 0.4915744683251151
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent surge of interest surrounding Multimodal Neural Networks (MM-NN)
is attributed to their ability to effectively process and integrate multiscale
information from diverse data sources. MM-NNs extract and fuse features from
multiple modalities using adequate unimodal backbones and specific fusion
networks. Although this helps strengthen the multimodal information
representation, designing such networks is labor-intensive. It requires tuning
the architectural parameters of the unimodal backbones, choosing the fusion
point, and selecting the operations for fusion. Furthermore, multimodal AI
is emerging as a cutting-edge option in Internet of Things (IoT) systems where
inference latency and energy consumption are critical metrics in addition to
accuracy. In this paper, we propose Harmonic-NAS, a framework for the joint
optimization of unimodal backbones and multimodal fusion networks with hardware
awareness on resource-constrained devices. Harmonic-NAS involves a two-tier
optimization approach: the first tier searches the unimodal backbone
architectures, and the second tier selects the fusion strategy and operators.
By incorporating the hardware dimension into the optimization, Harmonic-NAS
outperforms state-of-the-art approaches: evaluation results on various devices
and multimodal datasets demonstrate up to a 10.9% accuracy improvement, a
1.91x latency reduction, and a 2.14x energy efficiency gain.
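To make the two-tier idea concrete, below is a minimal sketch of such a hardware-aware search loop, assuming an outer tier that samples unimodal backbone configurations, an inner tier that picks the fusion point and operator, and a fitness that trades accuracy against on-device latency and energy. The search spaces, weights, and random proxy evaluator are illustrative assumptions, not the paper's actual implementation.

```python
import random

# Hypothetical two-tier, hardware-aware search loop in the spirit of
# Harmonic-NAS. All names, weights, and the proxy evaluator are assumptions.

FUSION_OPS = ["sum", "concat", "attention"]

def sample_backbone():
    # Tier 1: architectural parameters of one unimodal backbone.
    return {"depth": random.choice([2, 3, 4]),
            "width": random.choice([32, 64, 128])}

def sample_fusion(num_stages):
    # Tier 2: where to fuse and which fusion operator to use.
    return {"stage": random.randrange(num_stages),
            "op": random.choice(FUSION_OPS)}

def fitness(acc, latency_ms, energy_mj, budget_ms=50.0):
    # Hardware-aware objective: reject candidates over the latency budget,
    # otherwise trade accuracy against device latency and energy.
    if latency_ms > budget_ms:
        return float("-inf")
    return acc - 0.01 * latency_ms - 0.005 * energy_mj

def evaluate(candidate):
    # Placeholder proxy: a real search would train the candidate and
    # profile latency/energy on the target edge device.
    acc = random.uniform(0.6, 0.9)
    lat = random.uniform(10.0, 80.0)
    eng = random.uniform(5.0, 40.0)
    return fitness(acc, lat, eng)

best, best_fit = None, float("-inf")
for _ in range(200):  # toy search budget
    backbones = {m: sample_backbone() for m in ("image", "audio")}
    fusion = sample_fusion(num_stages=min(b["depth"] for b in backbones.values()))
    score = evaluate((backbones, fusion))
    if score > best_fit:
        best, best_fit = (backbones, fusion), score

print(best_fit, best)
```

In an actual run, evaluate() would presumably be replaced by supernet-based accuracy estimation and on-device latency/energy profiling rather than random draws.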
Related papers
- Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
In neuromorphic computing, spiking neural networks (SNNs) perform inference tasks, offering significant efficiency gains for workloads involving sequential data.
Recent advances in hardware and software have demonstrated that embedding a few bits of payload in each spike exchanged between the spiking neurons can further enhance inference accuracy.
This paper investigates a wireless neuromorphic split computing architecture employing multi-level SNNs.
arXiv Detail & Related papers (2024-11-07T14:08:35Z)
- Resource-Efficient Sensor Fusion via System-Wide Dynamic Gated Neural Networks [16.0018681576301]
We propose a novel algorithmic strategy called Quantile-constrained Inference (QIC), which makes joint, high-quality, and swift decisions on all aspects of the system.
Our results confirm that QIC matches the optimum and outperforms its alternatives by over 80%.
arXiv Detail & Related papers (2024-10-22T06:12:04Z)
- Resource Efficient Asynchronous Federated Learning for Digital Twin Empowered IoT Network [29.895766751146155]
A digital twin (DT) can provide real-time status and dynamic topology mapping for Internet of Things (IoT) devices.
We develop a dynamic resource scheduling algorithm tailored for an asynchronous federated learning (FL)-based, lightweight DT-empowered IoT network.
Specifically, our approach aims to minimize a multi-objective function that encompasses both energy consumption and latency.
arXiv Detail & Related papers (2024-08-26T14:28:51Z)
- CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion [58.15403987979496]
CREMA is a generalizable, highly efficient, and modular modality-fusion framework for video reasoning.
We propose a novel progressive multimodal fusion design supported by a lightweight fusion module and modality-sequential training strategy.
We validate our method on 7 video-language reasoning tasks assisted by diverse modalities, including VideoQA and Video-Audio/3D/Touch/Thermal QA.
arXiv Detail & Related papers (2024-02-08T18:27:22Z)
- Federated Learning for Energy-limited Wireless Networks: A Partial Model Aggregation Approach [79.59560136273917]
Limited communication resources (bandwidth and energy) and data heterogeneity across devices are the main bottlenecks for federated learning (FL).
We first devise a novel FL framework with partial model aggregation (PMA).
The proposed PMA-FL improves accuracy by 2.72% and 11.6% on two typical heterogeneous datasets.
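As a rough illustration of the partial-aggregation idea, the sketch below averages only a shared subset of each client's parameters while private layers stay on-device. The layer names and the shared/private split are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

# Hedged sketch of partial model aggregation (PMA): the server averages only
# a shared subset of parameters; the remaining layers stay local per device.

def pma_aggregate(client_models, shared_keys):
    # Average only the parameters named in shared_keys across clients.
    return {k: np.mean([m[k] for m in client_models], axis=0)
            for k in shared_keys}

# Four toy clients: a shared feature extractor and a private head each.
clients = [{"features": np.full((3, 3), float(i)), "head": np.full(2, float(i))}
           for i in range(4)]

shared = pma_aggregate(clients, shared_keys=["features"])
for m in clients:                 # broadcast the shared part; heads stay local
    m["features"] = shared["features"]
```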
arXiv Detail & Related papers (2022-04-20T19:09:52Z)
- Dynamic Multimodal Fusion [8.530680502975095]
Dynamic multimodal fusion (DynMM) is a new approach that adaptively fuses multimodal data and generates data-dependent forward paths during inference.
Results on various multimodal tasks demonstrate the efficiency and wide applicability of our approach.
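A minimal sketch of data-dependent fusion in this spirit follows, assuming a tiny gate that routes each input through either a cheap or a more expressive fusion path; the gate, the projection, and both paths are illustrative assumptions rather than DynMM's actual architecture.

```python
import numpy as np

# Minimal sketch of data-dependent fusion in the spirit of DynMM. The gating
# weights, projection, and both fusion paths are illustrative assumptions.

rng = np.random.default_rng(0)
W_gate = rng.normal(size=(8, 2))   # hypothetical gate over joint features
W_proj = rng.normal(size=(8, 4))   # projection for the expressive path

def dynamic_fuse(x_img, x_txt):
    z = np.concatenate([x_img, x_txt])   # joint features (dim 8)
    path = int(np.argmax(z @ W_gate))    # data-dependent path choice
    if path == 0:
        return x_img + x_txt             # cheap path: elementwise sum
    return np.tanh(z @ W_proj)           # expressive path: nonlinear mix

out = dynamic_fuse(rng.normal(size=4), rng.normal(size=4))
print(out)
```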
arXiv Detail & Related papers (2022-03-31T21:35:13Z)
- Computational Intelligence and Deep Learning for Next-Generation Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain of up to 32.7% in industrial IoT networks with severely limited resources.
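As a hedged illustration of the multi-exit idea, the sketch below attaches a classifier after each block and exits the forward pass once a confidence threshold is cleared; the depths, shapes, and threshold are assumptions, not ME-FEEL's actual configuration.

```python
import numpy as np

# Illustrative multi-exit inference: exit early once an intermediate
# classifier is confident enough, saving compute on constrained devices.

rng = np.random.default_rng(1)
blocks = [rng.normal(size=(4, 4)) for _ in range(3)]  # toy feature blocks
exits = [rng.normal(size=(4, 2)) for _ in range(3)]   # one classifier per exit

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def predict(x, tau=0.9):
    p = None
    for blk, head in zip(blocks, exits):
        x = np.tanh(x @ blk)
        p = softmax(x @ head)
        if p.max() >= tau:             # confident enough: take this exit
            return int(p.argmax())
    return int(p.argmax())             # fall through to the final exit

print(predict(rng.normal(size=4)))
```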
arXiv Detail & Related papers (2021-10-28T08:14:57Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
- BM-NAS: Bilevel Multimodal Neural Architecture Search [30.472605201814428]
This paper proposes the Bilevel Multimodal Neural Architecture Search (BM-NAS) framework.
It makes the architecture of multimodal fusion models fully searchable via a bilevel searching scheme.
BM-NAS achieves competitive performance with much less search time and fewer model parameters.
arXiv Detail & Related papers (2021-04-19T15:09:49Z)
- Deep-HOSeq: Deep Higher Order Sequence Fusion for Multimodal Sentiment Analysis [12.386788662621338]
Multimodal sentiment analysis utilizes multiple heterogeneous modalities for sentiment classification.
Recent multimodal fusion schemes customize LSTMs to discover intra-modal dynamics.
We propose a common network to discover both intra-modal and inter-modal dynamics.
arXiv Detail & Related papers (2020-10-16T08:02:11Z)
- Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles [52.79089414630366]
We develop a novel deep cooperative NOMA scheme, drawing upon recent advances in deep learning (DL).
We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.
arXiv Detail & Related papers (2020-07-27T12:38:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.