Continual Model-based Reinforcement Learning for Data Efficient Wireless Network Optimisation
- URL: http://arxiv.org/abs/2404.19462v1
- Date: Tue, 30 Apr 2024 11:23:31 GMT
- Title: Continual Model-based Reinforcement Learning for Data Efficient Wireless Network Optimisation
- Authors: Cengis Hasan, Alexandros Agapitos, David Lynch, Alberto Castagna, Giorgio Cruciata, Hao Wang, Aleksandar Milenovic,
- Abstract summary: We formulate throughput optimisation as Continual Reinforcement Learning of control policies.
Simulation results suggest that the proposed system is able to shorten the end-to-end deployment lead-time by two-fold.
- Score: 73.04087903322237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a method that addresses the pain point of long lead-time required to deploy cell-level parameter optimisation policies to new wireless network sites. Given a sequence of action spaces represented by overlapping subsets of cell-level configuration parameters provided by domain experts, we formulate throughput optimisation as Continual Reinforcement Learning of control policies. Simulation results suggest that the proposed system is able to shorten the end-to-end deployment lead-time by two-fold compared to a reinitialise-and-retrain baseline without any drop in optimisation gain.
Related papers
- IPD: Boosting Sequential Policy with Imaginary Planning Distillation in Offline Reinforcement Learning [13.655904209137006]
We propose textbfImaginary Planning Distillation (IPD), a novel framework that seamlessly incorporates offline planning into data generation, supervised training, and online inference.<n>Our framework first learns a world model equipped with uncertainty measures and a quasi-optimal value function from the offline data.<n>By replacing the conventional, manually-tuned return-to-go with the learned quasi-optimal value function, IPD improves both decision-making stability and performance during inference.
arXiv Detail & Related papers (2026-03-04T17:05:39Z) - Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [86.99017195607077]
We address real-time sampling and estimation of autoregressive Markovian sources in wireless networks.<n>We propose a graphical reinforcement learning framework for policy optimization.<n>Theoretically, our proposed policies are transferable, allowing a policy trained on one graph to be effectively applied to structurally similar graphs.
arXiv Detail & Related papers (2026-01-19T02:18:45Z) - Generative Actor Critic [74.04971271003869]
Generative Actor Critic (GAC) is a novel framework that decouples sequential decision-making by reframing textitpolicy evaluation as learning a generative model of the joint distribution over trajectories and returns.<n>Experiments on Gym-MuJoCo and Maze2D benchmarks demonstrate GAC's strong offline performance and significantly enhanced offline-to-online improvement compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-12-25T06:31:11Z) - Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning [77.92320830700797]
Reinforcement Learning has played a central role in enabling reasoning capabilities of Large Language Models.<n>We propose a tractable computational framework that tracks and leverages curvature information during policy updates.<n>The algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out.
arXiv Detail & Related papers (2025-10-01T12:29:32Z) - Linear Preference Optimization: Decoupled Gradient Control via Absolute Regularization [13.97375970293678]
DPO (Direct Preference Optimization) has become a widely used offline preference optimization algorithm due to its simplicity and training stability.<n>We propose Linear Preference Optimization (LPO), a novel alignment framework featuring three key innovations.<n>First, we introduce gradient decoupling by replacing the log-sigmoid function with an absolute difference loss, thereby isolating the optimization dynamics.<n>Second, we improve stability through an offset constraint combined with a positive regularization term to preserve the chosen response quality.<n>Third, we implement controllable rejection suppression using gradient separation with straightforward estimation and a tunable coefficient that linearly regulates the descent of the rejection probability.
arXiv Detail & Related papers (2025-08-20T10:17:29Z) - Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning [0.0]
We propose a generative policy trained with an augmented flow-matching objective to predict direct completion vectors from intermediate flow samples.<n>Our method scales effectively to offline, offline-to-online, and online RL settings, offering substantial gains in speed and adaptability.<n>We extend SSCP to goal-conditioned RL, enabling flat policies to exploit subgoal structures without explicit hierarchical inference.
arXiv Detail & Related papers (2025-06-26T16:09:53Z) - Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z) - Lyapunov-Driven Deep Reinforcement Learning for Edge Inference Empowered
by Reconfigurable Intelligent Surfaces [30.1512069754603]
We propose a novel algorithm for energy-efficient, low-latency, accurate inference at the wireless edge.
We consider a scenario where new data are continuously generated/collected by a set of devices and are handled through a dynamic queueing system.
arXiv Detail & Related papers (2023-05-18T12:46:42Z) - Multi-Agent Reinforcement Learning with Common Policy for Antenna Tilt
Optimization [0.0]
This paper presents a method for optimizing wireless networks by adjusting cell parameters.
Agents share a common policy and take into account information from neighboring cells to determine the state and reward.
Results show how the proposed approach significantly improves the performance gains already provided by expert system-based methods.
arXiv Detail & Related papers (2023-02-24T21:19:26Z) - Offline Policy Optimization in RL with Variance Regularizaton [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer.
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms.
arXiv Detail & Related papers (2022-12-29T18:25:01Z) - A Parametric Class of Approximate Gradient Updates for Policy
Optimization [47.69337420768319]
We develop a unified perspective that re-expresses the underlying updates in terms of a limited choice of gradient form and scaling function.
We obtain novel yet well motivated updates that generalize existing algorithms in a way that can deliver benefits both in terms of convergence speed and final result quality.
arXiv Detail & Related papers (2022-06-17T01:28:38Z) - Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs [113.8752163061151]
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs)
We propose the underlineperiodically underlinerestarted underlineoptimistic underlinepolicy underlineoptimization algorithm (PROPO)
PROPO features two mechanisms: sliding-window-based policy evaluation and periodic-restart-based policy improvement.
arXiv Detail & Related papers (2021-10-18T02:33:20Z) - JUMBO: Scalable Multi-task Bayesian Optimization using Offline Data [86.8949732640035]
We propose JUMBO, an MBO algorithm that sidesteps limitations by querying additional data.
We show that it achieves no-regret under conditions analogous to GP-UCB.
Empirically, we demonstrate significant performance improvements over existing approaches on two real-world optimization problems.
arXiv Detail & Related papers (2021-06-02T05:03:38Z) - Near Optimal Policy Optimization via REPS [33.992374484681704]
emphrelative entropy policy search (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains.
There exist no guarantees on REPS's performance when using gradient-based solvers.
We introduce a technique that uses emphgenerative access to the underlying decision process to compute parameter updates that maintain favorable convergence to the optimal regularized policy.
arXiv Detail & Related papers (2021-03-17T16:22:59Z) - Iterative Amortized Policy Optimization [147.63129234446197]
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control.
From the variational inference perspective, policy networks are a form of textitamortized optimization, optimizing network parameters rather than the policy distributions directly.
We demonstrate that iterative amortized policy optimization, yields performance improvements over direct amortization on benchmark continuous control tasks.
arXiv Detail & Related papers (2020-10-20T23:25:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.