Exploration Policies for On-the-Fly Controller Synthesis: A
Reinforcement Learning Approach
- URL: http://arxiv.org/abs/2210.05393v2
- Date: Wed, 3 May 2023 20:23:36 GMT
- Title: Exploration Policies for On-the-Fly Controller Synthesis: A
Reinforcement Learning Approach
- Authors: Tomás Delgado, Marco Sánchez Sorondo, Víctor Braberman,
Sebastián Uchitel
- Abstract summary: We propose a new method for obtaining heuristics based on Reinforcement Learning (RL).
Our agents learn from scratch in a highly partially observable RL task and outperform the existing heuristic overall, in instances unseen during training.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Controller synthesis is in essence a case of model-based planning for
non-deterministic environments in which plans (actually "strategies") are
meant to preserve system goals indefinitely. In the case of supervisory control,
environments are specified as the parallel composition of state machines and
valid strategies are required to be "non-blocking" (i.e., always enabling the
environment to reach certain marked states) in addition to safe (i.e., keeping the
system within a safe zone). Recently, On-the-fly Directed Controller Synthesis
techniques were proposed to avoid the exploration of the entire (and
exponentially large) environment space, at the cost of non-maximal
permissiveness, to either find a strategy or conclude that there is none.
incremental exploration of the plant is currently guided by a
domain-independent human-designed heuristic. In this work, we propose a new
method for obtaining heuristics based on Reinforcement Learning (RL). The
synthesis algorithm is thus framed as an RL task with an unbounded action space
and a modified version of DQN is used. With a simple and general set of
features that abstracts both states and actions, we show that it is possible to
learn heuristics on small versions of a problem that generalize to the larger
instances, effectively doing zero-shot policy transfer. Our agents learn from
scratch in a highly partially observable RL task and outperform the existing
heuristic overall, in instances unseen during training.
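The abstract frames the exploration step of on-the-fly synthesis as an RL problem with an unbounded action space, in which a modified DQN scores candidate expansions from a small feature vector that abstracts both states and actions. The following is only a minimal sketch of that framing, not the authors' implementation: the feature dimension, network shape, and function names are assumptions.

```python
# Minimal sketch (not the authors' code): ranking frontier transitions with a
# feature-based Q-network, following the paper's framing of on-the-fly
# synthesis as an RL task with an unbounded action space.
import random
import numpy as np
import torch
import torch.nn as nn

FEATURE_DIM = 8  # hypothetical size of the state/action feature abstraction


class QNet(nn.Module):
    """Scores one candidate frontier expansion from its feature vector.

    Because the set of expandable transitions (the action space) is unbounded,
    the network is applied per candidate rather than having one output head
    per action, which is the departure from a standard DQN head assumed here.
    """

    def __init__(self, feature_dim: int = FEATURE_DIM):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (num_candidates, feature_dim) -> (num_candidates,)
        return self.mlp(features).squeeze(-1)


def select_expansion(qnet: QNet, frontier_features: np.ndarray, epsilon: float) -> int:
    """Epsilon-greedy choice of which frontier transition to expand next."""
    if random.random() < epsilon:
        return random.randrange(len(frontier_features))
    with torch.no_grad():
        q_values = qnet(torch.as_tensor(frontier_features, dtype=torch.float32))
    return int(torch.argmax(q_values).item())


if __name__ == "__main__":
    # Toy usage: five candidate expansions, each abstracted to FEATURE_DIM features.
    qnet = QNet()
    frontier = np.random.rand(5, FEATURE_DIM).astype(np.float32)
    print("expand candidate:", select_expansion(qnet, frontier, epsilon=0.1))
```

Because each candidate is scored individually from its feature abstraction, the same network can rank arbitrarily many frontier transitions, which is what allows a heuristic learned on small instances to be applied zero-shot to larger ones.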
Related papers
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents [9.529492371336286]
Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors.
We propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS).
LSTS learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification.
arXiv Detail & Related papers (2024-02-06T04:00:21Z)
- Integrating DeepRL with Robust Low-Level Control in Robotic Manipulators for Non-Repetitive Reaching Tasks [0.24578723416255746]
In robotics, contemporary strategies are learning-based, characterized by a complex black-box nature and a lack of interpretability.
We propose integrating a collision-free trajectory planner based on deep reinforcement learning (DRL) with a novel auto-tuning low-level control strategy.
arXiv Detail & Related papers (2024-02-04T15:54:03Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme. (A toy illustration of action quantization appears after this list.)
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic through Gaussian Processes and Control Barrier Functions [3.5897534810405403]
Reinforcement learning (RL) is a promising approach but has had limited success in real-world applications.
In this paper, we propose a learning-based control framework consisting of several aspects.
We show that such an ECBF-based modular deep RL algorithm achieves near-perfect success rates and guards safety with a high probability.
arXiv Detail & Related papers (2021-09-07T00:51:12Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
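The Action-Quantized Offline RL entry above mentions discretizing a continuous action space. As a toy illustration only (plain k-means over dataset actions, not that paper's adaptive scheme; the dataset, bin count, and helper names are assumptions):

```python
# Toy action quantization: map continuous actions to a small set of discrete bins.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical offline dataset of continuous actions, shape (N, action_dim).
actions = np.random.uniform(-1.0, 1.0, size=(1000, 4))

# Quantize the action space into K bins; each k-means centroid becomes one
# discrete action that a discrete-action offline RL method can select.
K = 16
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(actions)


def encode(a: np.ndarray) -> int:
    """Continuous action -> index of the nearest centroid (discrete action)."""
    return int(kmeans.predict(a.reshape(1, -1))[0])


def decode(idx: int) -> np.ndarray:
    """Discrete action index -> continuous action to execute in the environment."""
    return kmeans.cluster_centers_[idx]


example = np.array([0.2, -0.5, 0.1, 0.9])
print(encode(example), decode(encode(example)))
```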