Resilient UAV Trajectory Planning via Few-Shot Meta-Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2502.01268v1
- Date: Mon, 03 Feb 2025 11:39:12 GMT
- Title: Resilient UAV Trajectory Planning via Few-Shot Meta-Offline Reinforcement Learning
- Authors: Eslam Eldeeb, Hirley Alves
- Abstract summary: This work proposes a novel, resilient, few-shot meta-offline RL algorithm combining offline RL and model-agnostic meta-learning.
We show that the proposed few-shot meta-offline RL algorithm converges faster than baseline schemes.
It is the only algorithm that can achieve optimal joint AoI and transmission power using an offline dataset.
- Score: 5.771885923067511
- Abstract: Reinforcement learning (RL) is a promising tool for future 5G-beyond and 6G systems. Its main advantage lies in its robust, model-free decision-making in complex, high-dimensional wireless environments. However, most existing RL frameworks rely on online interaction with the environment, which might not be feasible due to safety and cost concerns. Another problem with online RL is that the designed algorithm does not scale to dynamic or new environments. This work proposes a novel, resilient, few-shot meta-offline RL algorithm combining offline RL using conservative Q-learning (CQL) and meta-learning using model-agnostic meta-learning (MAML). The proposed algorithm can train RL models using static offline datasets without any online interaction with the environment. In addition, with the aid of MAML, the proposed model can be scaled to new, unseen environments. We showcase the proposed algorithm by optimizing an unmanned aerial vehicle (UAV)'s trajectory and scheduling policy to minimize the age-of-information (AoI) and transmission power of power-limited devices. Numerical results show that the proposed few-shot meta-offline RL algorithm converges faster than baseline schemes, such as deep Q-networks and CQL. In addition, it is the only algorithm that achieves optimal joint AoI and transmission power from an offline dataset with only a few shots of data points, and it is resilient to network failures caused by unprecedented environmental changes.
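The abstract names the two building blocks (CQL for offline training, MAML for few-shot adaptation) but the paper's code is not reproduced here. The sketch below is a minimal, hedged illustration of how these ingredients typically fit together, assuming a discrete action space (e.g. UAV movement plus device scheduling) and a first-order MAML approximation; QNet, the (s, a, r, s2, done) batch layout, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch, not the authors' code: a discrete-action CQL loss wrapped in a
# first-order MAML meta-update over few-shot offline batches from several environments.
# QNet, the batch layout (s, a, r, s2, done) and all hyperparameters are assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class QNet(nn.Module):
    """Q-network over a flat state vector and a discrete action set
    (e.g. UAV movement directions combined with device-scheduling choices)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s):
        return self.net(s)


def cql_loss(q_net, target_net, batch, gamma=0.99, alpha=1.0):
    """CQL objective: TD error plus a conservative term that pushes down the
    soft-maximum of Q over all actions while pushing up Q of the logged actions."""
    s, a, r, s2, done = batch
    q_all = q_net(s)                                    # (B, n_actions)
    q_sa = q_all.gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) of dataset actions
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
    td_error = F.mse_loss(q_sa, target)
    conservative = (torch.logsumexp(q_all, dim=1) - q_sa).mean()
    return td_error + alpha * conservative


def meta_step(q_net, target_net, task_batches, inner_lr=1e-3, meta_lr=1e-3):
    """One first-order MAML step: adapt a copy of q_net on each task's small
    'support' batch, evaluate on its 'query' batch, then move the shared
    initialization along the averaged post-adaptation gradient."""
    meta_grads = [torch.zeros_like(p) for p in q_net.parameters()]
    for support, query in task_batches:                 # one (support, query) pair per environment
        fast = copy.deepcopy(q_net)
        inner = cql_loss(fast, target_net, support)
        grads = torch.autograd.grad(inner, list(fast.parameters()))
        with torch.no_grad():                           # manual inner SGD step
            for p, g in zip(fast.parameters(), grads):
                p -= inner_lr * g
        outer = cql_loss(fast, target_net, query)
        grads = torch.autograd.grad(outer, list(fast.parameters()))
        for acc, g in zip(meta_grads, grads):
            acc += g / len(task_batches)
    with torch.no_grad():                               # first-order meta-update
        for p, g in zip(q_net.parameters(), meta_grads):
            p -= meta_lr * g
```

In the paper's setting, each "task" would correspond to a different wireless environment realization, and adapting to an unseen or failed network would amount to re-running the inner loop on a few shots of that environment's offline data.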
Related papers
- Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning [62.984693936073974]
Value-based reinforcement learning can learn effective policies for a wide range of multi-turn problems.
Current value-based RL methods have proven particularly challenging to scale to the setting of large language models.
We propose a novel offline RL algorithm that addresses these drawbacks, casting Q-learning as a modified supervised fine-tuning problem.
arXiv Detail & Related papers (2024-11-07T21:36:52Z)
- Offline Reinforcement Learning and Sequence Modeling for Downlink Link Adaptation [3.687363450234871]
Link adaptation (LA) is an essential function in modern wireless communication systems.
LA dynamically adjusts the transmission rate of a communication link to match time- and frequency-varying radio link conditions.
Recent research has introduced online reinforcement learning approaches as an alternative to the more commonly used rule-based algorithms.
arXiv Detail & Related papers (2024-10-30T14:01:31Z)
- Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning [5.663006149337036]
Offline model-based reinforcement learning (MBRL) is a powerful approach for data-driven decision-making and control.
Many different MDPs can behave identically on the offline dataset, so dealing with the uncertainty about the true MDP can be challenging.
We introduce a novel Bayes Adaptive Monte-Carlo planning algorithm capable of solving Bayes-adaptive MDPs (BAMDPs) in continuous state and action spaces.
arXiv Detail & Related papers (2024-10-15T03:36:43Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
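As a hedged illustration of the model-based value expansion mentioned above (MOTO's exact objective and its policy-regularization term are not reproduced here), the sketch below forms an H-step TD target by rolling a learned dynamics model forward and bootstrapping with a value estimate; dynamics, policy, and value are assumed callables on batched tensors.

```python
# Hedged sketch of model-based value expansion (not MOTO's exact objective):
# roll a learned dynamics model H steps from dataset states and bootstrap with V.
# `dynamics(s, a) -> (next_state, reward)`, `policy(s) -> a`, and `value(s) -> V(s)`
# are assumed callables operating on batched torch tensors.
import torch

def value_expansion_target(s, policy, dynamics, value, horizon=3, gamma=0.99):
    """Return an H-step target: sum of model-predicted discounted rewards
    plus a discounted bootstrap from the terminal value estimate."""
    target = torch.zeros(s.shape[0], device=s.device)
    discount = 1.0
    with torch.no_grad():
        for _ in range(horizon):
            a = policy(s)
            s, r = dynamics(s, a)
            target = target + discount * r
            discount *= gamma
        target = target + discount * value(s)
    return target
```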
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
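The adaptive quantization scheme itself is learned in the paper; as a simpler, hedged stand-in, the sketch below builds a discrete action codebook with plain k-means over the dataset's continuous actions and maps each action to its nearest code, after which a discrete offline RL method (e.g. CQL over code indices) can be applied.

```python
# Hedged sketch: quantizing a continuous-action offline dataset with a k-means
# codebook. The paper's adaptive, learned quantization scheme is not reproduced.
import numpy as np

def fit_action_codebook(actions: np.ndarray, n_codes: int = 32,
                        iters: int = 20, seed: int = 0) -> np.ndarray:
    """actions: (N, action_dim) continuous actions from the offline dataset."""
    rng = np.random.default_rng(seed)
    codebook = actions[rng.choice(len(actions), size=n_codes, replace=False)].astype(float)
    for _ in range(iters):                               # Lloyd / k-means iterations
        dists = np.linalg.norm(actions[:, None, :] - codebook[None, :, :], axis=-1)  # (N, K)
        assign = dists.argmin(axis=1)
        for k in range(n_codes):
            members = actions[assign == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook

def quantize(action: np.ndarray, codebook: np.ndarray) -> int:
    """Index of the nearest code; a discrete offline RL method can then be
    trained over these code indices instead of raw continuous actions."""
    return int(np.linalg.norm(codebook - action, axis=-1).argmin())
```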
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach [1.0080317855851213]
We consider the problem of network parameter optimization for wireless networks.
We show that learning, which would normally require deploying an algorithm in the real world for exploration, can be achieved from previously collected data without any exploration.
arXiv Detail & Related papers (2023-10-12T18:36:36Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamic model from historically collected data, and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- Model-Based Offline Planning with Trajectory Pruning [15.841609263723575]
Offline reinforcement learning (RL) enables learning policies from pre-collected datasets without environment interaction.
We propose a new lightweight model-based offline planning framework, namely MOPP, which tackles the dilemma between the restrictions of offline learning and high-performance planning.
Experimental results show that MOPP provides competitive performance compared with existing model-based offline planning and RL approaches.
arXiv Detail & Related papers (2021-05-16T05:00:54Z)
- Offline Meta-Reinforcement Learning with Advantage Weighting [125.21298190780259]
This paper introduces the offline meta-reinforcement learning (offline meta-RL) problem setting and proposes an algorithm that performs well in this setting.
Offline meta-RL is analogous to the widely successful supervised learning strategy of pre-training a model on a large batch of fixed, pre-collected data.
We propose Meta-Actor Critic with Advantage Weighting (MACAW), an optimization-based meta-learning algorithm that uses simple, supervised regression objectives for both the inner and outer loop of meta-training.
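As a hedged illustration of the "supervised regression objectives" mentioned above, the sketch below shows advantage-weighted-regression-style losses of the kind MACAW builds on: a value function regressed toward observed returns and a policy trained by weighted behavior cloning. MACAW's exact losses (e.g. its enriched value objective) and its meta-training loop are not reproduced; tensor shapes and the temperature are assumptions.

```python
# Hedged sketch of AWR-style supervised objectives, in the spirit of MACAW's
# inner/outer losses; not the paper's exact formulation.
import torch
import torch.nn.functional as F

def value_loss(v_pred, mc_return):
    """Supervised regression of the value function toward observed returns."""
    return F.mse_loss(v_pred, mc_return)

def awr_policy_loss(log_prob, mc_return, v_pred, temperature=1.0):
    """Weighted behavior cloning: log-likelihood of dataset actions, weighted by
    exponentiated advantages, so better-than-average actions are imitated more."""
    advantage = (mc_return - v_pred).detach()
    weights = torch.clamp(torch.exp(advantage / temperature), max=20.0)
    return -(weights * log_prob).mean()
```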
arXiv Detail & Related papers (2020-08-13T17:57:14Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by applying them to rewards that are artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
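The penalization idea in the MOPO summary can be written compactly as r_tilde(s, a) = r_hat(s, a) - lambda * u(s, a), where u estimates the dynamics uncertainty. The sketch below is a hedged illustration using ensemble disagreement as one common proxy for u; the shapes and the ensemble itself are assumptions, not the paper's exact estimator.

```python
# Hedged sketch of an uncertainty-penalized reward in the spirit of MOPO-style methods.
import torch

def penalized_reward(reward_pred, next_state_preds, lam=1.0):
    """reward_pred: (B,) model-predicted rewards.
    next_state_preds: (E, B, S) next-state predictions from an ensemble of E models.
    The penalty is the largest per-sample standard deviation across the ensemble."""
    uncertainty = next_state_preds.std(dim=0).max(dim=-1).values  # (B,)
    return reward_pred - lam * uncertainty
```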