Mobile Foundation Model as Firmware
- URL: http://arxiv.org/abs/2308.14363v3
- Date: Tue, 12 Mar 2024 02:17:03 GMT
- Title: Mobile Foundation Model as Firmware
- Authors: Jinliang Yuan, Chen Yang, Dongqi Cai, Shihe Wang, Xin Yuan, Zeling Zhang, Xiang Li, Dingge Zhang, Hanzi Mei, Xianqing Jia, Shangguang Wang, Mengwei Xu
- Abstract summary: sys instantiates a paradigm in which the mobile OS and hardware jointly manage a single foundation model that serves a broad range of mobile AI tasks.
It amalgamates a curated selection of publicly available Large Language Models (LLMs) and facilitates dynamic data flow.
It attains accuracy parity in 85% of tasks, demonstrates improved scalability in terms of storage and memory, and offers satisfactory inference speed.
- Score: 13.225478051091763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In today's landscape, smartphones have evolved into hubs for hosting a
multitude of deep learning models aimed at local execution. A key realization
driving this work is the notable fragmentation among these models,
characterized by varied architectures, operators, and implementations. This
fragmentation imposes a significant burden on the comprehensive optimization of
hardware, system settings, and algorithms.
Buoyed by the recent strides in large foundation models, this work introduces
a pioneering paradigm for mobile AI: a collaborative management approach
between the mobile OS and hardware, overseeing a foundational model capable of
serving a broad spectrum of mobile AI tasks, if not all. This foundational
model resides within the NPU and remains impervious to app or OS revisions,
akin to firmware. Concurrently, each app contributes a concise, offline
fine-tuned "adapter" tailored to distinct downstream tasks. From this concept
emerges a concrete instantiation known as \sys. It amalgamates a curated
selection of publicly available Large Language Models (LLMs) and facilitates
dynamic data flow. This concept's viability is substantiated through the
creation of an exhaustive benchmark encompassing 38 mobile AI tasks spanning 50
datasets, including domains such as Computer Vision (CV), Natural Language
Processing (NLP), audio, sensing, and multimodal inputs. Spanning this
benchmark, \sys unveils its impressive performance. It attains accuracy parity
in 85% of tasks, demonstrates improved scalability in terms of storage and
memory, and offers satisfactory inference speed on Commercial Off-The-Shelf
(COTS) mobile devices fortified with NPU support. This stands in stark contrast
to task-specific models tailored for individual applications.
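The division of labor described in the abstract (one shared, unchanging foundation model plus a small per-app adapter) can be illustrated with a minimal sketch. Everything named here (`frozen_backbone`, `Adapter`, `FirmwareModelService`, the app identifiers, and the toy arithmetic) is a hypothetical stand-in chosen for illustration and is not taken from the paper's implementation.

```python
from typing import Dict, List

Vector = List[float]


def frozen_backbone(x: Vector) -> Vector:
    """Toy stand-in for the shared foundation model.

    Its "weights" (the constants 2.0 and 1.0) are fixed and are never
    changed by any app, mirroring the firmware-like role in the abstract.
    """
    return [2.0 * v + 1.0 for v in x]


class Adapter:
    """Hypothetical per-app adapter: a tiny task-specific transform
    applied on top of the shared backbone output."""

    def __init__(self, scale: float, bias: float) -> None:
        self.scale = scale
        self.bias = bias

    def __call__(self, hidden: Vector) -> Vector:
        return [self.scale * h + self.bias for h in hidden]


class FirmwareModelService:
    """Hypothetical system service that owns the single backbone and
    routes each request through the adapter registered by the calling app."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Adapter] = {}

    def register(self, app_id: str, adapter: Adapter) -> None:
        # Apps only ship adapters; they cannot replace the backbone.
        self._adapters[app_id] = adapter

    def infer(self, app_id: str, x: Vector) -> Vector:
        hidden = frozen_backbone(x)           # shared computation
        return self._adapters[app_id](hidden)  # app-specific computation


service = FirmwareModelService()
service.register("camera_app", Adapter(scale=0.5, bias=0.0))
service.register("keyboard_app", Adapter(scale=1.0, bias=-1.0))

camera_out = service.infer("camera_app", [1.0, 2.0])
keyboard_out = service.infer("keyboard_app", [1.0, 2.0])
```

In this sketch, adding a new task costs only one more small `Adapter` entry, while the backbone is stored once. That is the intuition behind the storage and memory scalability claimed in the abstract, not a measurement of it.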
Related papers
- MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases [81.70591346986582]
We introduce MobileAIBench, a benchmarking framework for evaluating Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices.
MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices.
arXiv Detail & Related papers (2024-06-12T22:58:12Z)
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
- General Object Foundation Model for Images and Videos at Scale [99.2806103051613]
We present GLEE, an object-level foundation model for locating and identifying objects in images and videos.
GLEE accomplishes detection, segmentation, tracking, grounding, and identification of arbitrary objects in the open world scenario.
We employ an image encoder, text encoder, and visual prompter to handle multi-modal inputs, enabling the model to solve various object-centric downstream tasks simultaneously.
arXiv Detail & Related papers (2023-12-14T17:26:00Z)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- Orchestration of Emulator Assisted Mobile Edge Tuning for AI Foundation Models: A Multi-Agent Deep Reinforcement Learning Approach [10.47302625959368]
We present a groundbreaking paradigm integrating Mobile Edge Computing with foundation models, specifically designed to enhance local task performance on user equipment (UE).
Central to our approach is the innovative Emulator-Adapter architecture, segmenting the foundation model into two cohesive modules.
We introduce an advanced resource allocation mechanism that is fine-tuned to the needs of the Emulator-Adapter structure in decentralized settings.
arXiv Detail & Related papers (2023-10-26T15:47:51Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- SeaFormer++: Squeeze-enhanced Axial Transformer for Mobile Visual Recognition [29.522565659389183]
We introduce a new method, the squeeze-enhanced Axial Transformer (SeaFormer), for mobile visual recognition.
We beat both the mobile-friendly rivals and Transformer-based counterparts with better performance and lower latency without bells and whistles.
arXiv Detail & Related papers (2023-01-30T18:34:16Z)
- Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs [10.680700357879601]
Neural architecture search (NAS) comes to the rescue for efficiently utilizing the high compute throughput offered by on-device ML accelerators.
Existing NAS frameworks have several practical limitations in scaling to multiple tasks and different target platforms.
We provide a two-pronged approach to this challenge: (i) a neural architecture that decouples model cost evaluation, search space design, and the algorithm to rapidly target various on-device ML tasks, and (ii) search spaces crafted from group convolution based inverted bottleneck (IBN) variants.
arXiv Detail & Related papers (2022-04-09T00:35:19Z)
- Smart at what cost? Characterising Mobile Deep Neural Networks in the wild [16.684419342012674]
This paper is the first holistic study of Deep Neural Network (DNN) usage in the wild.
We analyse over 16k of the most popular apps in the Google Play Store.
We measure the models' energy footprint, as a core cost dimension of any mobile deployment.
arXiv Detail & Related papers (2021-09-28T18:09:29Z)
- Real-time Monocular Depth Estimation with Sparse Supervision on Mobile [2.5425323889482336]
In recent years, with the increasing availability of mobile devices, accurate and mobile-friendly depth models have gained importance.
We show that, with key design choices and studies, even an existing architecture can reach highly competitive performance.
A version of our model achieves 0.1208 WHDR on DIW with 1M parameters and reaches 44 FPS on a mobile GPU.
arXiv Detail & Related papers (2021-05-25T16:33:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.