TinyM$^2$Net-V3: Memory-Aware Compressed Multimodal Deep Neural Networks for Sustainable Edge Deployment
- URL: http://arxiv.org/abs/2405.12353v1
- Date: Mon, 20 May 2024 20:03:51 GMT
- Title: TinyM$^2$Net-V3: Memory-Aware Compressed Multimodal Deep Neural Networks for Sustainable Edge Deployment
- Authors: Hasib-Al Rashid, Tinoosh Mohsenin,
- Abstract summary: This work introduces TinyM$2$Net-V3, a system that processes different modalities of complementary data, designs deep neural network (DNN) models, and employs model compression techniques.
Our tiny machine learning models, deployed on resource limited hardware, demonstrated low latencies within milliseconds and very high power efficiency.
- Score: 0.5893124686141782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advancement of sophisticated artificial intelligence (AI) algorithms has led to a notable increase in energy usage and carbon dioxide emissions, intensifying concerns about climate change. This growing problem has brought the environmental sustainability of AI technologies to the forefront, especially as they expand across various sectors. In response to these challenges, there is an urgent need for the development of sustainable AI solutions. These solutions must focus on energy-efficient embedded systems that are capable of handling diverse data types even in environments with limited resources, thereby ensuring both technological progress and environmental responsibility. Integrating complementary multimodal data into tiny machine learning models for edge devices is challenging due to increased complexity, latency, and power consumption. This work introduces TinyM$^2$Net-V3, a system that processes different modalities of complementary data, designs deep neural network (DNN) models, and employs model compression techniques including knowledge distillation and low bit-width quantization with memory-aware considerations to fit models within lower memory hierarchy levels, reducing latency and enhancing energy efficiency on resource-constrained devices. We evaluated TinyM$^2$Net-V3 in two multimodal case studies: COVID-19 detection using cough, speech, and breathing audios, and pose classification from depth and thermal images. With tiny inference models (6 KB and 58 KB), we achieved 92.95% and 90.7% accuracies, respectively. Our tiny machine learning models, deployed on resource limited hardware, demonstrated low latencies within milliseconds and very high power efficiency.
Related papers
- Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs [96.68469559192846]
We present two differently sized MoE large language models (LLMs)
Ling-Lite contains 16.8 billion parameters with 2.75 billion activated parameters, while Ling-Plus boasts 290 billion parameters with 28.8 billion activated parameters.
We propose innovative methods for (1) optimization of model architecture and training processes, (2) refinement of training anomaly handling, and (3) enhancement of model evaluation efficiency.
arXiv Detail & Related papers (2025-03-07T04:43:39Z) - Intelligent Mobile AI-Generated Content Services via Interactive Prompt Engineering and Dynamic Service Provisioning [55.641299901038316]
AI-generated content can organize collaborative Mobile AIGC Service Providers (MASPs) at network edges to provide ubiquitous and customized content for resource-constrained users.<n>Such a paradigm faces two significant challenges: 1) raw prompts often lead to poor generation quality due to users' lack of experience with specific AIGC models, and 2) static service provisioning fails to efficiently utilize computational and communication resources.<n>We develop an interactive prompt engineering mechanism that leverages a Large Language Model (LLM) to generate customized prompt corpora and employs Inverse Reinforcement Learning (IRL) for policy imitation.
arXiv Detail & Related papers (2025-02-17T03:05:20Z) - On Accelerating Edge AI: Optimizing Resource-Constrained Environments [1.7355861031903428]
Resource-constrained edge deployments demand AI solutions that balance high performance with stringent compute, memory, and energy limitations.
We present a comprehensive overview of the primary strategies for accelerating deep learning models under such constraints.
arXiv Detail & Related papers (2025-01-25T01:37:03Z) - Two-Timescale Model Caching and Resource Allocation for Edge-Enabled AI-Generated Content Services [55.0337199834612]
Generative AI (GenAI) has emerged as a transformative technology, enabling customized and personalized AI-generated content (AIGC) services.
These services require executing GenAI models with billions of parameters, posing significant obstacles to resource-limited wireless edge.
We introduce the formulation of joint model caching and resource allocation for AIGC services to balance a trade-off between AIGC quality and latency metrics.
arXiv Detail & Related papers (2024-11-03T07:01:13Z) - Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches [64.42735183056062]
Large language models (LLMs) have transitioned from specialized models to versatile foundation models.
LLMs exhibit impressive zero-shot ability, however, require fine-tuning on local datasets and significant resources for deployment.
arXiv Detail & Related papers (2024-08-20T09:42:17Z) - Random resistive memory-based deep extreme point learning machine for
unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM)
Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z) - Power Hungry Processing: Watts Driving the Cost of AI Deployment? [74.19749699665216]
generative, multi-purpose AI systems promise a unified approach to building machine learning (ML) models into technology.
This ambition of generality'' comes at a steep cost to the environment, given the amount of energy these systems require and the amount of carbon that they emit.
We measure deployment cost as the amount of energy and carbon required to perform 1,000 inferences on representative benchmark dataset using these models.
We conclude with a discussion around the current trend of deploying multi-purpose generative ML systems, and caution that their utility should be more intentionally weighed against increased costs in terms of energy and emissions
arXiv Detail & Related papers (2023-11-28T15:09:36Z) - Dynamic Early Exiting Predictive Coding Neural Networks [3.542013483233133]
With the urge for smaller and more accurate devices, Deep Learning models became too heavy to deploy.
We propose a shallow bidirectional network based on predictive coding theory and dynamic early exiting for halting further computations.
We achieve comparable accuracy to VGG-16 in image classification on CIFAR-10 with fewer parameters and less computational complexity.
arXiv Detail & Related papers (2023-09-05T08:00:01Z) - EVE: Environmental Adaptive Neural Network Models for Low-power Energy
Harvesting System [8.16411986220709]
Energy harvesting technology that harvests energy from ambient environment is a promising alternative to batteries for powering those devices.
This paper proposes EVE, an automated machine learning framework to search for desired multi-models with shared weights for energy harvesting IoT devices.
Experimental results show that the neural networks models generated by EVE is on average 2.5X faster than the baseline models without pruning and shared weights.
arXiv Detail & Related papers (2022-07-14T20:53:46Z) - Energy-efficient Deployment of Deep Learning Applications on Cortex-M
based Microcontrollers using Deep Compression [1.4050836886292872]
This paper investigates the efficient deployment of deep learning models on resource-constrained microcontrollers.
We present a methodology for the systematic exploration of different DNN pruning, quantization, and deployment strategies.
We show that we can compress them to below 10% of their original parameter count before their predictive quality decreases.
arXiv Detail & Related papers (2022-05-20T10:55:42Z) - Pervasive Machine Learning for Smart Radio Environments Enabled by
Reconfigurable Intelligent Surfaces [56.35676570414731]
The emerging technology of Reconfigurable Intelligent Surfaces (RISs) is provisioned as an enabler of smart wireless environments.
RISs offer a highly scalable, low-cost, hardware-efficient, and almost energy-neutral solution for dynamic control of the propagation of electromagnetic signals over the wireless medium.
One of the major challenges with the envisioned dense deployment of RISs in such reconfigurable radio environments is the efficient configuration of multiple metasurfaces.
arXiv Detail & Related papers (2022-05-08T06:21:33Z) - YONO: Modeling Multiple Heterogeneous Neural Networks on
Microcontrollers [10.420617367363047]
YONO is a product quantization (PQ) based approach that compresses multiple heterogeneous models and enables in-memory model execution and switching.
YONO shows remarkable performance as it can compress multiple heterogeneous models with negligible or no loss of accuracy up to 12.37$times$.
arXiv Detail & Related papers (2022-03-08T01:24:36Z) - Prune2Edge: A Multi-Phase Pruning Pipelines to Deep Ensemble Learning in
IIoT [0.0]
We propose a novel edge-based multi-phase pruning pipelines to ensemble learning on IIoT devices.
In the first phase, we generate a diverse ensemble of pruned models, then we apply integer quantisation, next we prune the generated ensemble using a clustering-based technique.
Our proposed approach was able to outperform the predictability levels of a baseline model.
arXiv Detail & Related papers (2020-04-09T17:44:34Z) - Risk-Aware Energy Scheduling for Edge Computing with Microgrid: A
Multi-Agent Deep Reinforcement Learning Approach [82.6692222294594]
We study a risk-aware energy scheduling problem for a microgrid-powered MEC network.
We derive the solution by applying a multi-agent deep reinforcement learning (MADRL)-based advantage actor-critic (A3C) algorithm with shared neural networks.
arXiv Detail & Related papers (2020-02-21T02:14:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.