Joint Continual Learning of Local Language Models and Cloud Offloading Decisions with Budget Constraints
- URL: http://arxiv.org/abs/2602.00166v2
- Date: Thu, 05 Feb 2026 02:29:03 GMT
- Title: Joint Continual Learning of Local Language Models and Cloud Offloading Decisions with Budget Constraints
- Authors: Evan Chen, Wenzhi Fang, Shiqiang Wang, Christopher Brinton
- Abstract summary: We propose DA-GRPO, a dual-advantage extension of Group Relative Policy Optimization. It incorporates cloud-usage constraints directly into advantage computation, avoiding fixed reward shaping and external routing models. Experiments on mathematical reasoning and code generation benchmarks show that DA-GRPO improves post-switch accuracy, substantially reduces forgetting, and maintains stable cloud usage.
- Score: 13.890405825812065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Locally deployed Small Language Models (SLMs) must continually support diverse tasks under strict memory and computation constraints, making selective reliance on cloud Large Language Models (LLMs) unavoidable. Regulating cloud assistance during continual learning is challenging, as naive reward-based reinforcement learning often yields unstable offloading behavior and exacerbates catastrophic forgetting as task distributions shift. We propose DA-GRPO, a dual-advantage extension of Group Relative Policy Optimization that incorporates cloud-usage constraints directly into advantage computation, avoiding fixed reward shaping and external routing models. This design enables the local model to jointly learn task competence and collaboration behavior, allowing cloud requests to emerge naturally during post-training while respecting a prescribed assistance budget. Experiments on mathematical reasoning and code generation benchmarks show that DA-GRPO improves post-switch accuracy, substantially reduces forgetting, and maintains stable cloud usage compared to prior collaborative and routing-based approaches.
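The abstract's central mechanism, folding a cloud-usage budget directly into the group-relative advantage rather than into a shaped reward, can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the paper's actual algorithm: the function name, the over-budget penalty term, and the additive combination of task and usage advantages are all assumptions made here for concreteness.

```python
from statistics import mean, pstdev

def dual_advantage(task_rewards, cloud_used, budget, eps=1e-8):
    """Sketch of a budget-aware, group-relative advantage.

    task_rewards: per-rollout task rewards for one GRPO group.
    cloud_used:   1.0 if the rollout requested cloud assistance, else 0.0.
    budget:       prescribed fraction of rollouts allowed to offload.
    """
    # Standard GRPO-style advantage: normalize rewards within the group.
    mu, sigma = mean(task_rewards), pstdev(task_rewards)
    adv_task = [(r - mu) / (sigma + eps) for r in task_rewards]

    # Usage advantage: active only when the group's empirical cloud-usage
    # rate exceeds the budget, so offloading carries no penalty under the
    # budget and is discouraged above it.
    use_rate = mean(cloud_used)
    overuse = max(use_rate - budget, 0.0)
    adv_use = [-overuse * (u - use_rate) for u in cloud_used]

    # Combine: rollouts that offloaded while the group is over budget see
    # their advantage reduced relative to rollouts that solved locally.
    return [t + u for t, u in zip(adv_task, adv_use)]
```

When the group stays under the budget, `overuse` is zero and the result reduces exactly to the plain group-relative task advantage; above the budget, a rollout that offloaded is ranked below a same-reward rollout that solved locally, which matches the abstract's claim that collaboration behavior emerges during post-training while respecting the assistance budget.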
Related papers
- Diffusion-Based Solver for CNF Placement on the Cloud-Continuum [1.529342790344802]
A novel theoretical framework is proposed, which is based on Denoising Diffusion Probabilistic Models (DDPM) for CNF placement. The model incorporates constraint-specific losses directly into the loss function, thereby allowing it to learn feasible solution spaces. The results obtained demonstrate the potential of diffusion-based generative modelling for constrained network embedding problems.
arXiv Detail & Related papers (2025-11-03T08:47:58Z) - HINT: Helping Ineffective Rollouts Navigate Towards Effectiveness [49.72591739116668]
Reinforcement Learning (RL) has become a key driver for enhancing the long chain-of-thought (CoT) reasoning capabilities of Large Language Models (LLMs). However, prevalent methods like GRPO often fail when task difficulty exceeds the model's capacity, leading to reward sparsity and inefficient training. We propose HINT: Helping Ineffective rollouts Navigate Towards effectiveness, an adaptive hinting framework.
arXiv Detail & Related papers (2025-10-10T13:42:03Z) - Collaborative Device-Cloud LLM Inference through Reinforcement Learning [17.71514700623717]
Device-cloud collaboration has emerged as a promising paradigm for deploying large language models (LLMs). We propose a framework where the on-device LLM makes routing decisions at the end of its solving process, with this capability instilled through post-training. In particular, we formulate a reward design problem with carefully designed rewards that encourage effective problem solving and judicious offloading to the cloud.
arXiv Detail & Related papers (2025-09-28T19:48:56Z) - Cloud-Device Collaborative Agents for Sequential Recommendation [36.05863003744828]
Large language models (LLMs) have enabled agent-based recommendation systems with strong semantic understanding and flexible reasoning capabilities. LLMs offer powerful personalization, but they often suffer from privacy concerns, limited access to real-time signals, and scalability bottlenecks. We propose a novel Cloud-Device collaborative framework for sequential Recommendation, powered by dual agents.
arXiv Detail & Related papers (2025-09-01T15:28:11Z) - LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization [48.91511514636768]
Length-Adaptive Policy Optimization transforms reasoning length control from an external constraint into an intrinsic model capability. LAPO enables models to internalize an understanding of appropriate reasoning depth through a two-stage reinforcement learning process. Experiments on mathematical reasoning benchmarks demonstrate that LAPO reduces token usage by up to 40.9% while improving accuracy by 2.3%.
arXiv Detail & Related papers (2025-07-21T16:14:41Z) - Edge-First Language Model Inference: Models, Metrics, and Tradeoffs [0.7980273012483663]
This work examines the interplay between edge and cloud deployments, starting from detailed benchmarking of SLM capabilities on single edge devices. We identify scenarios where edge inference offers comparable performance with lower costs, and others where cloud fallback becomes essential due to limits in scalability or model capacity. Rather than proposing a one-size-fits-all solution, we present platform-level comparisons and design insights for building efficient, adaptive LM inference systems.
arXiv Detail & Related papers (2025-05-22T10:43:00Z) - Opportunistic Collaborative Planning with Large Vision Model Guided Control and Joint Query-Service Optimization [74.92515821144484]
Navigating autonomous vehicles in open scenarios is challenging due to the difficulty of handling unseen objects. Existing solutions either rely on small models that struggle with generalization or large models that are resource-intensive. This paper proposes opportunistic collaborative planning (OCP), which seamlessly integrates efficient local models with powerful cloud models.
arXiv Detail & Related papers (2025-04-25T04:07:21Z) - Diffusion Predictive Control with Constraints [51.91057765703533]
Diffusion predictive control with constraints (DPCC) is an algorithm for diffusion-based control with explicit state and action constraints. We show through simulations of a robot manipulator that DPCC outperforms existing methods in satisfying novel test-time constraints.
arXiv Detail & Related papers (2024-12-12T15:10:22Z) - Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [49.362750475706235]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks. We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model. Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Quantized Embedding Vectors for Controllable Diffusion Language Models [1.3287140837287783]
Quantized Embedding Controllable Diffusion Language Model improves controllability, portability, and inference speed of language models.
QE-CDLM builds upon the recent successful controllable DLMs by remodeling the task-specific embedding space via quantization.
arXiv Detail & Related papers (2024-02-15T17:02:48Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.