Related papers: Mobile Foundation Model as Firmware

Mobile Foundation Model as Firmware

URL: http://arxiv.org/abs/2308.14363v3
Date: Tue, 12 Mar 2024 02:17:03 GMT
Title: Mobile Foundation Model as Firmware
Authors: Jinliang Yuan, Chen Yang, Dongqi Cai, Shihe Wang, Xin Yuan, Zeling Zhang, Xiang Li, Dingge Zhang, Hanzi Mei, Xianqing Jia, Shangguang Wang, Mengwei Xu
Abstract summary: sys is a collaborative management approach between the mobile OS and hardware. It amalgamates a curated selection of publicly available Large Language Models (LLMs) and facilitates dynamic data flow. It attains accuracy parity in 85% of tasks, demonstrates improved scalability in terms of storage and memory, and offers satisfactory inference speed.
Score: 13.225478051091763
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In today's landscape, smartphones have evolved into hubs for hosting a multitude of deep learning models aimed at local execution. A key realization driving this work is the notable fragmentation among these models, characterized by varied architectures, operators, and implementations. This fragmentation imposes a significant burden on the comprehensive optimization of hardware, system settings, and algorithms. Buoyed by the recent strides in large foundation models, this work introduces a pioneering paradigm for mobile AI: a collaborative management approach between the mobile OS and hardware, overseeing a foundational model capable of serving a broad spectrum of mobile AI tasks, if not all. This foundational model resides within the NPU and remains impervious to app or OS revisions, akin to firmware. Concurrently, each app contributes a concise, offline fine-tuned "adapter" tailored to distinct downstream tasks. From this concept emerges a concrete instantiation known as \sys. It amalgamates a curated selection of publicly available Large Language Models (LLMs) and facilitates dynamic data flow. This concept's viability is substantiated through the creation of an exhaustive benchmark encompassing 38 mobile AI tasks spanning 50 datasets, including domains such as Computer Vision (CV), Natural Language Processing (NLP), audio, sensing, and multimodal inputs. Spanning this benchmark, \sys unveils its impressive performance. It attains accuracy parity in 85\% of tasks, demonstrates improved scalability in terms of storage and memory, and offers satisfactory inference speed on Commercial Off-The-Shelf (COTS) mobile devices fortified with NPU support. This stands in stark contrast to task-specific models tailored for individual applications.

Related papers

An Efficient and Mixed Heterogeneous Model for Image Restoration [71.85124734060665]
Current mainstream approaches are based on three architectural paradigms: CNNs, Transformers, and Mambas. We propose RestorMixer, an efficient and general-purpose IR model based on mixed-architecture fusion.
arXiv Detail & Related papers (2025-04-15T08:19:12Z)
Optimized Unet with Attention Mechanism for Multi-Scale Semantic Segmentation [8.443350618722564]
This paper proposes an improved Unet model combined with an attention mechanism. It introduces channel attention and spatial attention modules, enhances the model's ability to focus on important features. The improved model performs well in terms of mIoU and pixel accuracy (PA), reaching 76.5% and 95.3% respectively.
arXiv Detail & Related papers (2025-02-06T06:51:23Z)
ContextFormer: Redefining Efficiency in Semantic Segmentation [48.81126061219231]
Convolutional methods, although capturing local dependencies well, struggle with long-range relationships. Vision Transformers (ViTs) excel in global context capture but are hindered by high computational demands. We propose ContextFormer, a hybrid framework leveraging the strengths of CNNs and ViTs in the bottleneck to balance efficiency, accuracy, and robustness for real-time semantic segmentation.
arXiv Detail & Related papers (2025-01-31T16:11:04Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications [59.193626019860226]
Vision Transformers (ViTs) mark a revolutionary advance in neural networks with their token mixer's powerful global context capability. We introduce CAS-ViT: Convolutional Additive Self-attention Vision Transformers. We show that CAS-ViT achieves a competitive performance when compared to other state-of-the-art backbones.
arXiv Detail & Related papers (2024-08-07T11:33:46Z)
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases [81.70591346986582]
We introduce MobileAIBench, a benchmarking framework for evaluating Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices. MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices.
arXiv Detail & Related papers (2024-06-12T22:58:12Z)
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models. Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters. To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
Orchestration of Emulator Assisted Mobile Edge Tuning for AI Foundation Models: A Multi-Agent Deep Reinforcement Learning Approach [10.47302625959368]
We present a groundbreaking paradigm integrating Mobile Edge Computing with foundation models, specifically designed to enhance local task performance on user equipment (UE) Central to our approach is the innovative Emulator-Adapter architecture, segmenting the foundation model into two cohesive modules. We introduce an advanced resource allocation mechanism that is fine-tuned to the needs of the Emulator-Adapter structure in decentralized settings.
arXiv Detail & Related papers (2023-10-26T15:47:51Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception. Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency. We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
SeaFormer++: Squeeze-enhanced Axial Transformer for Mobile Visual Recognition [29.522565659389183]
We introduce a new method squeeze-enhanced Axial Transformer (SeaFormer) for mobile visual recognition. We beat both the mobilefriendly rivals and Transformer-based counterparts with better performance and lower latency without bells and whistles.
arXiv Detail & Related papers (2023-01-30T18:34:16Z)
Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs [10.680700357879601]
Neural architecture search (NAS) comes to the rescue for efficiently utilizing the high compute throughput offered by on-device ML accelerators. Existing NAS frameworks have several practical limitations in scaling to multiple tasks and different target platforms. We provide a two-pronged approach to this challenge: (i) a neural architecture that decouples model cost evaluation, search space design, and the algorithm to rapidly target various on-device ML tasks, and (ii) search spaces crafted from group convolution based inverted bottleneck (IBN) variants.
arXiv Detail & Related papers (2022-04-09T00:35:19Z)
Smart at what cost? Characterising Mobile Deep Neural Networks in the wild [16.684419342012674]
This paper is the first holistic study of Deep Neural Network (DNN) usage in the wild. We analyse over 16k of the most popular apps in the Google Play Store. We measure the models' energy footprint, as a core cost dimension of any mobile deployment.
arXiv Detail & Related papers (2021-09-28T18:09:29Z)
Real-time Monocular Depth Estimation with Sparse Supervision on Mobile [2.5425323889482336]
In recent years, with the increasing availability of mobile devices, accurate and mobile-friendly depth models have gained importance. We show, with key design choices and studies, even existing architecture can reach highly competitive performance. A version of our model achieves 0.1208 W on DIW with 1M parameters and reaches 44 FPS on a mobile GPU.
arXiv Detail & Related papers (2021-05-25T16:33:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.