Related papers: LightAgent: Mobile Agentic Foundation Models

URL: http://arxiv.org/abs/2510.22009v1
Date: Fri, 24 Oct 2025 20:23:12 GMT
Title: LightAgent: Mobile Agentic Foundation Models
Authors: Yangqin Jiang, Chao Huang
Abstract summary: We propose a mobile agentic foundation model solution that leverages device-cloud collaboration to tap the cost-efficiency of on-device models and the high capability of cloud models. Specifically, LightAgent enhances Qwen2.5-VL-3B via two-stage SFT->GRPO training on synthetic GUI data for strong decision-making. Experiments on the online AndroidLab benchmark and diverse apps show LightAgent matches or nears larger models, with a significant reduction in cloud costs.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the advancement of multimodal large language models (MLLMs), building GUI agent systems has become an increasingly promising direction, especially for mobile platforms, given their rich app ecosystems and intuitive touch interactions. Yet mobile GUI agents face a critical dilemma: truly on-device models (4B or smaller) lack sufficient performance, while capable models (starting from 7B) are either too large for mobile deployment or prohibitively costly (e.g., cloud-only closed-source MLLMs). To resolve this, we propose LightAgent, a mobile agentic foundation model solution that leverages device-cloud collaboration to tap the cost-efficiency of on-device models and the high capability of cloud models, while avoiding their drawbacks. Specifically, LightAgent enhances Qwen2.5-VL-3B via two-stage SFT->GRPO training on synthetic GUI data for strong decision-making, integrates an efficient long-reasoning mechanism to utilize historical interactions under tight resources, and defaults to on-device execution, only escalating challenging subtasks to the cloud via real-time complexity assessment. Experiments on the online AndroidLab benchmark and diverse apps show LightAgent matches or nears larger models, with a significant reduction in cloud costs.
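The abstract's routing policy (run on-device by default, escalate only challenging subtasks to the cloud after a real-time complexity check) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: `run_task`, the keyword-based `toy_assess` scorer, the 0.7 threshold, and the string-returning stand-in "models" are hypothetical and are not LightAgent's actual complexity assessor or API.

```python
from typing import Callable, List, Tuple


def run_task(
    subtasks: List[str],
    assess_complexity: Callable[[str], float],
    device_model: Callable[[str], str],
    cloud_model: Callable[[str], str],
    threshold: float = 0.7,  # hypothetical escalation threshold
) -> Tuple[List[str], int]:
    """Execute subtasks on-device by default, escalating to the cloud only
    when the complexity score exceeds `threshold`.

    Returns the list of actions and how many subtasks were sent to the cloud.
    """
    actions: List[str] = []
    cloud_calls = 0
    for subtask in subtasks:
        score = assess_complexity(subtask)  # real-time check before each subtask
        if score > threshold:
            actions.append(cloud_model(subtask))  # costly, more capable path
            cloud_calls += 1
        else:
            actions.append(device_model(subtask))  # cheap default path
    return actions, cloud_calls


# Toy stand-ins (hypothetical): a keyword heuristic and string-returning "models".
def toy_assess(subtask: str) -> float:
    return 0.9 if "compare" in subtask else 0.2


def toy_device(subtask: str) -> str:
    return f"device:{subtask}"


def toy_cloud(subtask: str) -> str:
    return f"cloud:{subtask}"


if __name__ == "__main__":
    actions, n_cloud = run_task(
        ["open settings", "compare two price lists", "tap save"],
        toy_assess, toy_device, toy_cloud,
    )
    print(actions)  # ['device:open settings', 'cloud:compare two price lists', 'device:tap save']
    print(n_cloud)  # 1
```

The cloud-call counter makes the cost trade-off visible: only the subtask scored above the threshold reaches the cloud, which is the kind of saving the abstract attributes to defaulting to on-device execution.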

Related papers

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
We present Mobile-O, a compact vision-language-diffusion model that brings unified multimodal intelligence to a mobile device. Its core module, the Mobile Conditioning Projector (MCP), fuses vision-language features with a diffusion generator using depthwise-separable convolutions and layerwise alignment. Running in only 3 s per 512x512 image on an iPhone, Mobile-O establishes the first practical framework for real-time unified multimodal understanding and generation on edge devices.
Date: 2026-02-23T18:59:58Z
Hi-Agent: Hierarchical Vision-Language Agents for Mobile Device Control
We introduce Hi-Agent, a trainable hierarchical vision-language agent for mobile control. Hi-Agent features a high-level reasoning model and a low-level action model that are jointly optimized. Hi-Agent achieves a new state-of-the-art (SOTA) task success rate of 87.9% on the Android-in-the-Wild (AitW) benchmark.
Date: 2025-10-16T07:38:21Z
Towards On-Device Personalization: Cloud-device Collaborative Data Augmentation for Efficient On-device Language Model
CDCDA-PLM is a framework for deploying personalized on-device language models on user devices with support from a powerful cloud-based LLM. Using both real and synthetic data, a personalized on-device language model (LM) is fine-tuned via parameter-efficient fine-tuning (PEFT) modules.
Date: 2025-08-29T02:33:13Z
Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits
Large Language Models (LLMs) enable various applications on edge devices such as smartphones, wearables, and embodied robots. LLMs can be deployed on-device, offering a cost-effective solution with reduced latency and improved privacy. We propose a fast and cost-effective speculative edge-cloud decoding framework with a large target model on the server and a small draft model on the device.
Date: 2025-05-27T14:55:16Z
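The draft-on-device, target-on-server split described above is an instance of speculative decoding. Below is a minimal Python sketch of its greedy form, assuming toy integer "models" represented as next-token functions; `speculative_decode`, `toy_draft`, `toy_target`, `k`, and `max_new` are hypothetical names. It does not model the paper's early exits or any network transport, and it is not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

# A "model" here is a greedy next-token function over integer token ids (hypothetical).
NextToken = Callable[[Sequence[int]], int]


def speculative_decode(
    prompt: Sequence[int],
    draft: NextToken,
    target: NextToken,
    k: int = 4,
    max_new: int = 8,
) -> Tuple[List[int], int]:
    """Greedy speculative decoding: a small draft model proposes k tokens,
    and the large target model keeps the longest agreeing prefix and then
    supplies one token of its own (a correction or a bonus token)."""
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new:
        # 1) Draft model (device side) proposes k tokens autoregressively.
        ctx = list(tokens)
        proposal: List[int] = []
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target model (server side) verifies. A real system scores all k
        #    positions in one batched forward pass; we count that as one round.
        rounds += 1
        ctx = list(tokens)
        accepted: List[int] = []
        for t in proposal:
            expected = target(ctx)
            if expected != t:
                accepted.append(expected)  # first mismatch: take the target's token
                break
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(target(ctx))  # all k accepted: one bonus token
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new], rounds


def toy_target(ctx: Sequence[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: Sequence[int]) -> int:
    # Agrees with the target except right after token 5.
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_decode([0], toy_draft, toy_target))
    # ([0, 1, 2, 3, 4, 5, 6, 7, 8], 3)
```

In greedy mode the output is identical to running the target alone; the savings come from needing only 3 verification rounds instead of 8 sequential target steps, because the draft model's correct guesses are accepted in bulk.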
EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation
Cloud-based mobile agents powered by (multimodal) large language models ((M)LLMs) offer strong reasoning abilities but suffer from high latency and cost. We propose EcoAgent, an Edge-Cloud cOllaborative multi-agent framework for mobile automation. EcoAgent features a closed-loop collaboration among a cloud-based Planning Agent and two edge-based agents: the Execution Agent for action execution and the Observation Agent for verifying outcomes.
Date: 2025-05-08T17:31:20Z
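The closed loop described for EcoAgent (a cloud planner, an edge executor, and an edge observer that verifies each outcome) can be sketched as a plan-execute-verify-replan loop. This is a minimal Python illustration under stated assumptions: `closed_loop`, the replan-from-failed-step signature, `max_replans`, and the toy agents are all hypothetical and do not reflect EcoAgent's actual interfaces or prompts.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical agent interfaces (not EcoAgent's real API).
Planner = Callable[[str, Optional[str]], List[str]]  # (goal, failed step) -> plan
Executor = Callable[[str], str]                      # step -> raw result
Observer = Callable[[str, str], bool]                # (step, result) -> success?


def closed_loop(
    goal: str,
    plan: Planner,
    execute: Executor,
    observe: Observer,
    max_replans: int = 2,
) -> Tuple[bool, List[Tuple[str, bool]]]:
    """Execute the planner's steps one by one; when the observer flags a
    failure, ask the planner for a new plan, up to `max_replans` times."""
    steps = plan(goal, None)  # initial plan (cloud side)
    log: List[Tuple[str, bool]] = []
    replans = 0
    i = 0
    while i < len(steps):
        step = steps[i]
        result = execute(step)     # edge-side action
        ok = observe(step, result)  # edge-side verification
        log.append((step, ok))
        if ok:
            i += 1
            continue
        if replans == max_replans:
            return False, log  # give up after too many failures
        replans += 1
        steps = plan(goal, step)  # replan around the failed step
        i = 0
    return True, log


# Toy stand-ins (hypothetical).
def toy_plan(goal: str, failed: Optional[str]) -> List[str]:
    if failed is None:
        return ["open app", "search item", "add to cart"]
    return ["search item again", "add to cart"]


def toy_execute(step: str) -> str:
    return "error" if step == "search item" else "ok"


def toy_observe(step: str, result: str) -> bool:
    return result == "ok"


if __name__ == "__main__":
    print(closed_loop("buy item", toy_plan, toy_execute, toy_observe))
```

Keeping verification on the edge means only failures trigger another (costly) cloud planning call, which matches the cost motivation stated in the abstract.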
Toward Super Agent System with Hybrid AI Routers
Super agents can fulfill diverse user needs, such as summarization, coding, and research. This paper presents a design of the Super Agent System powered by the hybrid AI routers. With advances in multi-modality models and edge hardware, we envision that most computations can be handled locally, with cloud collaboration only as needed.
Date: 2025-04-11T00:54:56Z
Liquid: Language Models are Scalable and Unified Multi-modal Generators
Liquid is an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation. Unlike previous multimodal large language models (MLLMs), Liquid achieves this integration using a single large language model. For the first time, Liquid uncovers a scaling law for the performance drop unavoidably brought by the unified training of visual and language tasks.
Date: 2024-12-05T16:48:16Z
xLAM: A Family of Large Action Models to Empower AI Agent Systems
We release xLAM, a series of large action models designed for AI agent tasks. xLAM consistently delivers exceptional performance across multiple agent ability benchmarks.
Date: 2024-09-05T03:22:22Z
Cloud-Device Collaborative Learning for Multimodal Large Language Models
We introduce a Cloud-Device Collaborative Continual Adaptation framework to enhance the performance of compressed, device-deployed MLLMs. Our framework is structured into three key components: a device-to-cloud uplink for efficient data transmission, cloud-based knowledge adaptation, and an optimized cloud-to-device downlink for model deployment.
Date: 2023-12-26T18:46:14Z
Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-world
We propose a new learning paradigm of Cloud-Device Collaborative Continual Adaptation, which encourages collaboration between cloud and device. We also propose an Uncertainty-based Visual Prompt Adapted (U-VPA) teacher-student model to transfer the generalization capability of the large model on the cloud to the device model. Our proposed U-VPA teacher-student framework outperforms previous state-of-the-art test time adaptation and device-cloud collaboration methods.
Date: 2022-12-02T05:02:36Z
DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization
Device Model Generalization (DMG) is a practical yet under-investigated research topic for on-device machine learning applications. We propose an efficient Device-cloUd collaborative parametErs generaTion framework DUET.
Date: 2022-09-12T13:26:26Z
Device-Cloud Collaborative Learning for Recommendation
We propose a novel MetaPatch learning approach on the device side to efficiently achieve "thousands of people with thousands of models" given a centralized cloud model. With billions of updated personalized device models, we propose a "model-over-models" distillation algorithm, namely MoMoDistill, to update the centralized cloud model.
Date: 2021-04-14T05:06:59Z
