Cost-Driven Hardware-Software Co-Optimization of Machine Learning
Pipelines
- URL: http://arxiv.org/abs/2310.07940v2
- Date: Thu, 19 Oct 2023 16:46:23 GMT
- Title: Cost-Driven Hardware-Software Co-Optimization of Machine Learning
Pipelines
- Authors: Ravit Sharma, Wojciech Romaszkan, Feiqian Zhu, Puneet Gupta, Ankur
Mehta
- Abstract summary: Deep neural networks are increasingly being used to embed intelligence in smart devices.
Their storage and processing requirements make them prohibitive for cheap, off-the-shelf platforms.
We holistically explore how quantization, model scaling, and multi-modality interact with system components such as memory, sensors, and processors.
- Score: 5.3477186309338505
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Researchers have long touted a vision of the future enabled by a
proliferation of internet-of-things devices, including smart sensors, homes,
and cities. Increasingly, embedding intelligence in such devices involves the
use of deep neural networks. However, their storage and processing requirements
make them prohibitive for cheap, off-the-shelf platforms. Overcoming those
requirements is necessary for enabling widely-applicable smart devices. While
many ways of making models smaller and more efficient have been developed,
there is a lack of understanding of which ones are best suited for particular
scenarios. More importantly for edge platforms, those choices cannot be
analyzed in isolation from cost and user experience. In this work, we
holistically explore how quantization, model scaling, and multi-modality
interact with system components such as memory, sensors, and processors. We
perform this hardware/software co-design from the cost, latency, and
user-experience perspective, and develop a set of guidelines for optimal system
design and model deployment for the most cost-constrained platforms. We
demonstrate our approach using an end-to-end, on-device, biometric user
authentication system using a $20 ESP-EYE board.
Related papers
- iOS as Acceleration [51.56484100374058]
We present a proof-of-concept system demonstrating a novel approach to harness an iOS device via distributed pipeline parallelism.
The findings of this paper highlight the potential for the improving commonplace mobile devices to provide greater contributions to machine learning.
arXiv Detail & Related papers (2025-12-19T13:30:44Z)
- Detecting Anomalies in Machine Learning Infrastructure via Hardware Telemetry [6.238074548326156]
Workload knowledge is unnecessary for system-level optimization.
We propose Reveal, which takes a hardware-centric approach, relying only on hardware signals.
We successfully identified both network and system configuration issues, accelerating the DeepSeek model by 5.97%.
arXiv Detail & Related papers (2025-10-29T22:39:09Z)
- Rethinking Inference Placement for Deep Learning across Edge and Cloud Platforms: A Multi-Objective Optimization Perspective and Future Directions [0.15039745292757667]
Edge intelligent applications like VR/AR and language model based chatbots have become widespread with the rapid expansion of IoT and mobile devices.
Constrained edge devices often cannot serve the increasingly large and complex deep learning (DL) models.
To mitigate these challenges, researchers have proposed optimizing and offloading partitions of DL models among user devices, edge servers, and the cloud.
arXiv Detail & Related papers (2025-10-27T01:26:52Z)
- Darkit: A User-Friendly Software Toolkit for Spiking Large Language Model [50.37090759139591]
Large language models (LLMs) have been widely applied in various practical applications, typically comprising billions of parameters.
The human brain, employing bio-plausible spiking mechanisms, can accomplish the same tasks while significantly reducing energy consumption.
We are releasing a software toolkit named DarwinKit (Darkit) to accelerate the adoption of brain-inspired large language models.
arXiv Detail & Related papers (2024-12-20T07:50:08Z)
- Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems [61.335229621081346]
Federated Learning (FL) has become a viable technique for realizing privacy-enhancing distributed deep learning on the network edge.
In this paper, we propose FLEdge, which complements existing FL benchmarks by enabling a systematic evaluation of client capabilities.
arXiv Detail & Related papers (2023-06-08T13:11:20Z)
- The Future of Consumer Edge-AI Computing [58.445652425379855]
Deep Learning has rapidly infiltrated the consumer end, mainly thanks to hardware acceleration across devices.
As we look towards the future, it is evident that isolated hardware will be insufficient.
We introduce a novel paradigm centered around EdgeAI-Hub devices, designed to reorganise and optimise compute resources and data access at the consumer edge.
arXiv Detail & Related papers (2022-10-19T12:41:47Z)
- Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications [46.97774949613859]
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI).
However, their superior performance comes at the considerable cost of computational complexity.
This paper provides an overview of efficient deep learning methods, systems and applications.
arXiv Detail & Related papers (2022-04-25T16:52:48Z)
- Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware [4.6095200019189475]
We present an end-to-end multi-component model optimization sequence and open-source its implementation.
Our optimization components can produce models that are: (i) 12.06x compressed; (ii) 0.13% to 0.27% more accurate; (iii) orders of magnitude faster, with unit inference at 0.06 ms.
arXiv Detail & Related papers (2022-04-20T13:30:04Z)
- Smart at what cost? Characterising Mobile Deep Neural Networks in the wild [16.684419342012674]
This paper is the first holistic study of Deep Neural Network (DNN) usage in the wild.
We analyse over 16k of the most popular apps in the Google Play Store.
We measure the models' energy footprint, as a core cost dimension of any mobile deployment.
arXiv Detail & Related papers (2021-09-28T18:09:29Z)
- How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures [7.085772863979686]
Deep neural networks (DNNs) have led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition.
Deploying such AI models across commodity devices faces significant challenges.
We present techniques for achieving real-time performance following a cross-stack approach.
arXiv Detail & Related papers (2021-06-21T11:23:12Z)
- SensiX: A Platform for Collaborative Machine Learning on the Edge [69.1412199244903]
We present SensiX, a personal edge platform that stays between sensor data and sensing models.
We demonstrate its efficacy in developing motion and audio-based multi-device sensing systems.
Our evaluation shows that SensiX offers a 7-13% increase in overall accuracy and up to 30% increase across different environment dynamics at the expense of 3mW power overhead.
arXiv Detail & Related papers (2020-12-04T23:06:56Z)
- Learned Hardware/Software Co-Design of Neural Accelerators [20.929918108940093]
Deep learning software stacks and hardware accelerators are diverse and vast.
Prior work considers software optimizations separately from hardware architectures, effectively reducing the search space.
This paper casts the problem as hardware/software co-design, with the goal of automatically identifying desirable points in the joint design space.
arXiv Detail & Related papers (2020-10-05T15:12:52Z)
- Making DensePose fast and light [78.49552144907513]
Existing neural network models capable of solving this task are heavily parameterized.
To enable Dense Pose inference on the end device with current models, one needs to support an expensive server-side infrastructure and have a stable internet connection.
In this work, we target the problem of redesigning the DensePose R-CNN model's architecture so that the final network retains most of its accuracy but becomes more light-weight and fast.
arXiv Detail & Related papers (2020-06-26T19:42:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.