Dynamic Interval Restrictions on Action Spaces in Deep Reinforcement
Learning for Obstacle Avoidance
- URL: http://arxiv.org/abs/2306.08008v1
- Date: Tue, 13 Jun 2023 09:13:13 GMT
- Title: Dynamic Interval Restrictions on Action Spaces in Deep Reinforcement
Learning for Obstacle Avoidance
- Authors: Tim Grams
- Abstract summary: In this thesis, we consider the problem of interval restrictions as they occur in pathfinding with dynamic obstacles.
Recent research learns with strong assumptions on the number of intervals and is limited to convex subsets.
We propose two approaches that are independent of the state of the environment by extending parameterized reinforcement learning and ConstraintNet to handle an arbitrary number of intervals.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep reinforcement learning algorithms typically act on the same set of
actions. However, this is not sufficient for a wide range of real-world
applications where different subsets are available at each step. In this
thesis, we consider the problem of interval restrictions as they occur in
pathfinding with dynamic obstacles. When actions that lead to collisions are
avoided, the continuous action space is split into variable parts. Recent
research learns with strong assumptions on the number of intervals, is limited
to convex subsets, and learns the available actions from the observations.
Therefore, we propose two approaches that are independent of the state of the
environment by extending parameterized reinforcement learning and ConstraintNet
to handle an arbitrary number of intervals. We demonstrate their performance in
an obstacle avoidance task and compare the methods to penalties, projection,
replacement, as well as discrete and continuous masking from the literature.
The results suggest that discrete masking of action-values is the only
effective method when constraints do not emerge during training. When
restrictions are learned, the decision between projection, masking, and our
ConstraintNet modification seems to depend on the task at hand. We compare the
results with varying complexity and give directions for future work.
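As an illustration of two of the compared baselines, the following is a minimal sketch of discrete action-value masking and of projecting a continuous action onto a union of allowed intervals. Function names and data shapes are hypothetical and do not reproduce the thesis's implementation:

```python
import numpy as np

def mask_q_values(q_values, allowed):
    """Discrete masking: set Q-values of unavailable actions to -inf
    so that the greedy argmax can only select an allowed action."""
    masked = np.where(allowed, q_values, -np.inf)
    return int(np.argmax(masked))

def project_action(action, intervals):
    """Projection: map a proposed continuous action to the closest point
    inside any of the allowed (low, high) intervals."""
    best, best_dist = None, float("inf")
    for lo, hi in intervals:
        candidate = min(max(action, lo), hi)  # clamp into this interval
        dist = abs(candidate - action)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best
```

Masking leaves the learned values untouched and only restricts selection, whereas projection alters the executed action, which is one reason the two can behave differently depending on whether restrictions were seen during training.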
Related papers
- Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
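One simple form of data-adaptive action quantization places bin edges at quantiles of the dataset's actions rather than on a uniform grid, so that bins are denser where the behavior policy acted more often. The sketch below is illustrative only (quantile binning is an assumption, not necessarily the paper's scheme):

```python
import numpy as np

def fit_quantile_bins(dataset_actions, n_bins):
    """Fit bin edges to the empirical action distribution: each bin
    covers an equal fraction of the dataset's actions."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)
    edges = np.quantile(dataset_actions, qs)
    centers = 0.5 * (edges[:-1] + edges[1:])  # representative action per bin
    return edges, centers

def quantize(action, edges, centers):
    """Map a continuous action to the center of its bin."""
    idx = np.clip(np.searchsorted(edges, action) - 1, 0, len(centers) - 1)
    return centers[idx]
```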
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Primal Dual Continual Learning: Balancing Stability and Plasticity through Adaptive Memory Allocation [86.8475564814154]
We show that it is both possible and beneficial to undertake the constrained optimization problem directly.
We focus on memory-based methods, where a small subset of samples from previous tasks can be stored in a replay buffer.
We show that dual variables indicate the sensitivity of the optimal value of the continual learning problem with respect to constraint perturbations.
arXiv Detail & Related papers (2023-09-29T21:23:27Z) - Resilient Constrained Learning [94.27081585149836]
This paper presents a constrained learning approach that adapts the requirements while simultaneously solving the learning task.
We call this approach resilient constrained learning after the term used to describe ecological systems that adapt to disruptions by modifying their operation.
arXiv Detail & Related papers (2023-06-04T18:14:18Z) - Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
arXiv Detail & Related papers (2023-05-31T17:40:43Z) - Exploring and Exploiting Decision Boundary Dynamics for Adversarial
Robustness [59.948529997062586]
It is unclear whether existing robust training methods effectively increase the margin for each vulnerable point during training.
We propose a continuous-time framework for quantifying the relative speed of the decision boundary with respect to each individual point.
We propose Dynamics-aware Robust Training (DyART), which encourages the decision boundary to engage in movement that prioritizes increasing smaller margins.
arXiv Detail & Related papers (2023-02-06T18:54:58Z) - Interval Bound Interpolation for Few-shot Learning with Few Tasks [15.85259386116784]
Few-shot learning aims to transfer the knowledge acquired from training on a diverse set of tasks to unseen tasks with a limited amount of labeled data.
We introduce the notion of interval bounds from the provably robust training literature to few-shot learning.
We then use a novel strategy to artificially form new tasks for training by interpolating between the available tasks and their respective interval bounds.
arXiv Detail & Related papers (2022-04-07T15:29:27Z) - Learning Routines for Effective Off-Policy Reinforcement Learning [0.0]
We propose a novel framework for reinforcement learning that effectively lifts such constraints.
Within our framework, agents learn effective behavior over a routine space.
We show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode.
arXiv Detail & Related papers (2021-06-05T18:41:57Z) - Utilizing Skipped Frames in Action Repeats via Pseudo-Actions [13.985534521589253]
In many deep reinforcement learning settings, when an agent takes an action, it repeats the same action a predefined number of times without observing the states until the next action-decision point.
Since the amount of training data is inversely proportional to the action-repeat interval, long repeats can have a negative impact on the sample efficiency of training.
We propose a simple but effective approach to alleviate this problem by introducing the concept of pseudo-actions.
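One way to read the pseudo-action idea is that a macro step of repeated actions can be expanded into per-frame transitions, each labeled with the repeated action, recovering training data from the otherwise skipped frames. The sketch below is an illustrative interpretation, not the paper's method:

```python
def expand_action_repeats(frames, action, rewards):
    """Expand one action-repeat macro step into per-frame transitions.
    Each intermediate frame is paired with the repeated action as a
    pseudo-action, yielding extra (s, a, r, s') tuples for training."""
    transitions = []
    for t in range(len(frames) - 1):
        transitions.append((frames[t], action, rewards[t], frames[t + 1]))
    return transitions
```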
arXiv Detail & Related papers (2021-05-07T02:43:44Z) - Learning Salient Boundary Feature for Anchor-free Temporal Action
Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z) - PLAS: Latent Action Space for Offline Reinforcement Learning [18.63424441772675]
The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interactions with the environment.
Existing off-policy algorithms have limited performance on static datasets due to extrapolation errors from out-of-distribution actions.
We demonstrate that our method provides competitive performance consistently across various continuous control tasks and different types of datasets.
arXiv Detail & Related papers (2020-11-14T03:38:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.