LightAgent: Mobile Agentic Foundation Models
- URL: http://arxiv.org/abs/2510.22009v1
- Date: Fri, 24 Oct 2025 20:23:12 GMT
- Title: LightAgent: Mobile Agentic Foundation Models
- Authors: Yangqin Jiang, Chao Huang,
- Abstract summary: We propose a mobile agentic foundation model solution that leverages device-cloud collaboration to tap the cost-efficiency of on-device models and the high capability of cloud models.<n>Specifically, LightAgent enhances Qwen2.5-VL-3B via two-stage SFT->GRPO training on synthetic GUI data for strong decision-making.<n>Experiments on the online AndroidLab benchmark and diverse apps show LightAgent matches or nears larger models, with a significant reduction in cloud costs.
- Score: 8.847692192802343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advancement of multimodal large language models (MLLMs), building GUI agent systems has become an increasingly promising direction-especially for mobile platforms, given their rich app ecosystems and intuitive touch interactions. Yet mobile GUI agents face a critical dilemma: truly on-device models (4B or smaller) lack sufficient performance, while capable models (starting from 7B) are either too large for mobile deployment or prohibitively costly (e.g., cloud-only closed-source MLLMs). To resolve this, we propose LightAgent, a mobile agentic foundation model solution that leverages device-cloud collaboration to tap the cost-efficiency of on-device models and the high capability of cloud models, while avoiding their drawbacks. Specifically, LightAgent enhances Qwen2.5-VL-3B via two-stage SFT->GRPO training on synthetic GUI data for strong decision-making, integrates an efficient long-reasoning mechanism to utilize historical interactions under tight resources, and defaults to on-device execution-only escalating challenging subtasks to the cloud via real-time complexity assessment. Experiments on the online AndroidLab benchmark and diverse apps show LightAgent matches or nears larger models, with a significant reduction in cloud costs.
Related papers
- Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device [90.46496321553843]
We present Mobile-O, a compact vision-language-diffusion model that brings unified multimodal intelligence to a mobile device.<n>Its core module, the Mobile Conditioning Projector (MCP), fuses vision-language features with a diffusion generator using depthwise-separable convolutions and layerwise alignment.<n>Running in only 3s per 512x512 image on an iPhone, Mobile-O establishes the first practical framework for real-time unified multimodal understanding and generation on edge devices.
arXiv Detail & Related papers (2026-02-23T18:59:58Z) - Hi-Agent: Hierarchical Vision-Language Agents for Mobile Device Control [72.43808515668947]
We introduce Hi-Agent, a trainable hierarchical vision-language agent for mobile control.<n>Hi-Agent features a high-level reasoning model and a low-level action model that are jointly optimized.<n>Hi-Agent achieves a new State-Of-The-Art (SOTA) 87.9% task success rate on the Android-in-the-Wild (AitW) benchmark.
arXiv Detail & Related papers (2025-10-16T07:38:21Z) - Towards On-Device Personalization: Cloud-device Collaborative Data Augmentation for Efficient On-device Language Model [43.13807038270687]
CDCDA-PLM is a framework for deploying personalized on-device language models on user devices with support from a powerful cloud-based LLM.<n>Using both real and synthetic data, A personalized on-device language models (LMs) is fine-tuned via parameter-efficient fine-tuning (PEFT) modules.
arXiv Detail & Related papers (2025-08-29T02:33:13Z) - Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits [11.398891065175686]
Large Language Models (LLMs) enable various applications on edge devices such as smartphones, wearables, and embodied robots.<n>LLMs can be deployed on-device, offering a cost-effective solution with reduced latency and improved privacy.<n>We propose a fast and cost-effective speculative edge-cloud decoding framework with a large target model on the server and a small draft model on the device.
arXiv Detail & Related papers (2025-05-27T14:55:16Z) - EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation [36.08217588070538]
Cloud-based mobile agents powered by (multimodal) large language models ((M)LLMs) offer strong reasoning abilities but suffer from high latency and cost.<n>We propose textbfEcoAgent, an textbfEdge-textbfCloud ctextbfOllaborative multi-agent framework for mobile automation.<n>EcoAgent features a closed-loop collaboration among a cloud-based Planning Agent and two edge-based agents: the Execution Agent for action execution and the Observation Agent for verifying outcomes.
arXiv Detail & Related papers (2025-05-08T17:31:20Z) - Toward Super Agent System with Hybrid AI Routers [19.22599167969104]
Super agents can fulfill diverse user needs, such as summarization, coding, and research.<n>This paper presents a design of the Super Agent System powered by the hybrid AI routers.<n>With advances in multi-modality models and edge hardware, we envision that most computations can be handled locally, with cloud collaboration only as needed.
arXiv Detail & Related papers (2025-04-11T00:54:56Z) - Liquid: Language Models are Scalable and Unified Multi-modal Generators [112.71734051183726]
Liquid is an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation.<n>Unlike previous multimodal large language model (MLLM), Liquid achieves this integration using a single large language model.<n>For the first time, Liquid uncovers a scaling law that performance drop unavoidably brought by the unified training of visual and language tasks.
arXiv Detail & Related papers (2024-12-05T16:48:16Z) - xLAM: A Family of Large Action Models to Empower AI Agent Systems [111.5719694445345]
We release xLAM, a series of large action models designed for AI agent tasks.
xLAM consistently delivers exceptional performance across multiple agent ability benchmarks.
arXiv Detail & Related papers (2024-09-05T03:22:22Z) - Cloud-Device Collaborative Learning for Multimodal Large Language Models [24.65882336700547]
We introduce a Cloud-Device Collaborative Continual Adaptation framework to enhance the performance of compressed, device-deployed MLLMs.
Our framework is structured into three key components: a device-to-cloud uplink for efficient data transmission, cloud-based knowledge adaptation, and an optimized cloud-to-device downlink for model deployment.
arXiv Detail & Related papers (2023-12-26T18:46:14Z) - Cloud-Device Collaborative Adaptation to Continual Changing Environments
in the Real-world [20.547119604004774]
We propose a new learning paradigm of Cloud-Device Collaborative Continual Adaptation, which encourages collaboration between cloud and device.
We also propose an Uncertainty-based Visual Prompt Adapted (U-VPA) teacher-student model to transfer the generalization capability of the large model on the cloud to the device model.
Our proposed U-VPA teacher-student framework outperforms previous state-of-the-art test time adaptation and device-cloud collaboration methods.
arXiv Detail & Related papers (2022-12-02T05:02:36Z) - DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization [66.27399823422665]
Device Model Generalization (DMG) is a practical yet under-investigated research topic for on-device machine learning applications.<n>We propose an efficient Device-cloUd collaborative parametErs generaTion framework DUET.
arXiv Detail & Related papers (2022-09-12T13:26:26Z) - Device-Cloud Collaborative Learning for Recommendation [50.01289274123047]
We propose a novel MetaPatch learning approach on the device side to efficiently achieve "thousands of people with thousands of models" given a centralized cloud model.
With billions of updated personalized device models, we propose a "model-over-models" distillation algorithm, namely MoMoDistill, to update the centralized cloud model.
arXiv Detail & Related papers (2021-04-14T05:06:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.