Mobile Foundation Model as Firmware
- URL: http://arxiv.org/abs/2308.14363v3
- Date: Tue, 12 Mar 2024 02:17:03 GMT
- Title: Mobile Foundation Model as Firmware
- Authors: Jinliang Yuan, Chen Yang, Dongqi Cai, Shihe Wang, Xin Yuan, Zeling
Zhang, Xiang Li, Dingge Zhang, Hanzi Mei, Xianqing Jia, Shangguang Wang,
Mengwei Xu
- Abstract summary: sys is a collaborative management approach between the mobile OS and hardware.
It amalgamates a curated selection of publicly available Large Language Models (LLMs) and facilitates dynamic data flow.
It attains accuracy parity in 85% of tasks, demonstrates improved scalability in terms of storage and memory, and offers satisfactory inference speed.
- Score: 13.225478051091763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In today's landscape, smartphones have evolved into hubs for hosting a
multitude of deep learning models aimed at local execution. A key realization
driving this work is the notable fragmentation among these models,
characterized by varied architectures, operators, and implementations. This
fragmentation imposes a significant burden on the comprehensive optimization of
hardware, system settings, and algorithms.
Buoyed by the recent strides in large foundation models, this work introduces
a pioneering paradigm for mobile AI: a collaborative management approach
between the mobile OS and hardware, overseeing a foundational model capable of
serving a broad spectrum of mobile AI tasks, if not all. This foundational
model resides within the NPU and remains impervious to app or OS revisions,
akin to firmware. Concurrently, each app contributes a concise, offline
fine-tuned "adapter" tailored to distinct downstream tasks. From this concept
emerges a concrete instantiation known as \sys. It amalgamates a curated
selection of publicly available Large Language Models (LLMs) and facilitates
dynamic data flow. This concept's viability is substantiated through the
creation of an exhaustive benchmark encompassing 38 mobile AI tasks spanning 50
datasets, including domains such as Computer Vision (CV), Natural Language
Processing (NLP), audio, sensing, and multimodal inputs. Spanning this
benchmark, \sys unveils its impressive performance. It attains accuracy parity
in 85\% of tasks, demonstrates improved scalability in terms of storage and
memory, and offers satisfactory inference speed on Commercial Off-The-Shelf
(COTS) mobile devices fortified with NPU support. This stands in stark contrast
to task-specific models tailored for individual applications.
Related papers
- Optimized Unet with Attention Mechanism for Multi-Scale Semantic Segmentation [8.443350618722564]
This paper proposes an improved Unet model combined with an attention mechanism.
It introduces channel attention and spatial attention modules, enhances the model's ability to focus on important features.
The improved model performs well in terms of mIoU and pixel accuracy (PA), reaching 76.5% and 95.3% respectively.
arXiv Detail & Related papers (2025-02-06T06:51:23Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications [73.80247057590519]
Vision Transformers (ViTs) mark a revolutionary advance in neural networks with their token mixer's powerful global context capability.
We introduce CAS-ViT: Convolutional Additive Self-attention Vision Transformers, to achieve a balance between efficiency and performance in mobile applications.
Our model achieves 83.0%/84.1% top-1 with only 12M/21M parameters on ImageNet-1K.
arXiv Detail & Related papers (2024-08-07T11:33:46Z) - MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases [81.70591346986582]
We introduce MobileAIBench, a benchmarking framework for evaluating Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices.
MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices.
arXiv Detail & Related papers (2024-06-12T22:58:12Z) - Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
arXiv Detail & Related papers (2024-03-18T08:00:23Z) - Orchestration of Emulator Assisted Mobile Edge Tuning for AI Foundation
Models: A Multi-Agent Deep Reinforcement Learning Approach [10.47302625959368]
We present a groundbreaking paradigm integrating Mobile Edge Computing with foundation models, specifically designed to enhance local task performance on user equipment (UE)
Central to our approach is the innovative Emulator-Adapter architecture, segmenting the foundation model into two cohesive modules.
We introduce an advanced resource allocation mechanism that is fine-tuned to the needs of the Emulator-Adapter structure in decentralized settings.
arXiv Detail & Related papers (2023-10-26T15:47:51Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z) - SeaFormer++: Squeeze-enhanced Axial Transformer for Mobile Visual Recognition [29.522565659389183]
We introduce a new method squeeze-enhanced Axial Transformer (SeaFormer) for mobile visual recognition.
We beat both the mobilefriendly rivals and Transformer-based counterparts with better performance and lower latency without bells and whistles.
arXiv Detail & Related papers (2023-01-30T18:34:16Z) - Smart at what cost? Characterising Mobile Deep Neural Networks in the
wild [16.684419342012674]
This paper is the first holistic study of Deep Neural Network (DNN) usage in the wild.
We analyse over 16k of the most popular apps in the Google Play Store.
We measure the models' energy footprint, as a core cost dimension of any mobile deployment.
arXiv Detail & Related papers (2021-09-28T18:09:29Z) - Real-time Monocular Depth Estimation with Sparse Supervision on Mobile [2.5425323889482336]
In recent years, with the increasing availability of mobile devices, accurate and mobile-friendly depth models have gained importance.
We show, with key design choices and studies, even existing architecture can reach highly competitive performance.
A version of our model achieves 0.1208 W on DIW with 1M parameters and reaches 44 FPS on a mobile GPU.
arXiv Detail & Related papers (2021-05-25T16:33:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.