Distributed Control using Reinforcement Learning with
Temporal-Logic-Based Reward Shaping
- URL: http://arxiv.org/abs/2203.04172v1
- Date: Tue, 8 Mar 2022 16:03:35 GMT
- Title: Distributed Control using Reinforcement Learning with
Temporal-Logic-Based Reward Shaping
- Authors: Ningyuan Zhang, Wenliang Liu, Calin Belta
- Abstract summary: We present a framework for synthesis of distributed control strategies for a heterogeneous team of robots in a partially observable environment.
Our approach formulates the synthesis problem as a game and employs a policy graph method to find a control strategy with memory for each agent.
We use the quantitative semantics of TLTL as the reward of the game, and further reshape it using the finite state automaton to guide and accelerate the learning process.
- Score: 0.2320417845168326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a computational framework for synthesis of distributed control
strategies for a heterogeneous team of robots in a partially observable
environment. The goal is to cooperatively satisfy specifications given as
Truncated Linear Temporal Logic (TLTL) formulas. Our approach formulates the
synthesis problem as a stochastic game and employs a policy graph method to
find a control strategy with memory for each agent. We construct the stochastic
game on the product between the team transition system and a finite state
automaton (FSA) that tracks the satisfaction of the TLTL formula. We use the
quantitative semantics of TLTL as the reward of the game, and further reshape
it using the FSA to guide and accelerate the learning process. Simulation
results demonstrate the efficacy of the proposed solution under demanding task
specifications and the effectiveness of reward shaping in significantly
accelerating the speed of learning.
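The FSA-guided shaping described above can be illustrated with classic potential-based reward shaping over an automaton. The sketch below is a minimal, hypothetical example: the toy FSA, its labels, and the distance-based potential are assumptions for illustration, while the paper's actual reward is built from TLTL's quantitative semantics on a product with the team transition system.

```python
# Potential-based reward shaping over a finite state automaton (FSA).
# Hypothetical toy FSA for illustration; the paper derives its FSA from
# a TLTL formula and combines shaping with TLTL quantitative semantics.
from collections import deque

# FSA transitions: state -> {observation label: next state}
FSA = {
    "q0": {"a": "q1", "b": "q0"},
    "q1": {"a": "q1", "b": "q_acc"},
    "q_acc": {"a": "q_acc", "b": "q_acc"},
}
ACCEPTING = {"q_acc"}

def distances_to_accepting(fsa, accepting):
    """BFS over reversed edges: graph distance from each state to acceptance."""
    rev = {q: set() for q in fsa}
    for q, trans in fsa.items():
        for q2 in trans.values():
            rev[q2].add(q)
    dist = {q: float("inf") for q in fsa}
    queue = deque()
    for q in accepting:
        dist[q] = 0
        queue.append(q)
    while queue:
        q = queue.popleft()
        for p in rev[q]:
            if dist[p] == float("inf"):
                dist[p] = dist[q] + 1
                queue.append(p)
    return dist

DIST = distances_to_accepting(FSA, ACCEPTING)

def potential(q):
    # Higher potential for automaton states closer to acceptance.
    return -DIST[q]

def shaped_reward(base_reward, q, q_next, gamma=0.99):
    # Ng-style potential-based shaping term: preserves optimal policies
    # while rewarding progress toward the accepting FSA state.
    return base_reward + gamma * potential(q_next) - potential(q)
```

With this potential, a transition that advances the automaton (e.g. `q0 -> q1`) receives a positive shaping bonus, which is the mechanism by which the FSA guides and accelerates learning.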
Related papers
- Online inductive learning from answer sets for efficient reinforcement learning exploration [52.03682298194168]
We exploit inductive learning of answer set programs to learn a set of logical rules representing an explainable approximation of the agent policy.
We then perform answer set reasoning on the learned rules to guide the exploration of the learning agent at the next batch.
Our methodology produces a significant boost in the discounted return achieved by the agent, even in the first batches of training.
arXiv Detail & Related papers (2025-01-13T16:13:22Z)
- Dependency-Aware CAV Task Scheduling via Diffusion-Based Reinforcement Learning [12.504232513881828]
We propose a novel dependency-aware task scheduling strategy for dynamic unmanned aerial vehicle-assisted connected autonomous vehicles (CAVs).
We formulate a joint scheduling priority and subtask assignment optimization problem with the objective of minimizing the average task completion time.
We propose a diffusion-based reinforcement learning algorithm, named Synthetic DDQN based Subtasks Scheduling, which can make adaptive task scheduling decisions in real time.
arXiv Detail & Related papers (2024-11-27T11:07:31Z)
- Scaling Learning based Policy Optimization for Temporal Logic Tasks by Controller Network Dropout [4.421486904657393]
We introduce a model-based approach for training feedback controllers for an autonomous agent operating in a highly nonlinear environment.
We show how this learning problem is similar to training recurrent neural networks (RNNs), where the number of recurrent units is proportional to the temporal horizon of the agent's task objectives.
We introduce a novel gradient approximation algorithm based on the idea of dropout or gradient sampling.
arXiv Detail & Related papers (2024-03-23T12:53:51Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Accelerated Reinforcement Learning for Temporal Logic Control Objectives [10.216293366496688]
This paper addresses the problem of learning control policies for mobile robots modeled as unknown Markov Decision Processes (MDPs).
We propose a novel accelerated model-based reinforcement learning (RL) algorithm for temporal logic control objectives that learns control policies significantly faster than related approaches.
arXiv Detail & Related papers (2022-05-09T17:09:51Z)
- Efficient Differentiable Simulation of Articulated Bodies [89.64118042429287]
We present a method for efficient differentiable simulation of articulated bodies.
This enables integration of articulated body dynamics into deep learning frameworks.
We show that reinforcement learning with articulated systems can be accelerated using gradients provided by our method.
arXiv Detail & Related papers (2021-09-16T04:48:13Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
arXiv Detail & Related papers (2021-03-26T13:58:23Z)
- Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction [5.337302350000984]
This paper presents a model-free reinforcement learning algorithm to synthesize a control policy.
The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
arXiv Detail & Related papers (2020-10-14T03:49:16Z)
- Continuous Motion Planning with Temporal Logic Specifications using Deep Neural Networks [16.296473750342464]
We propose a model-free reinforcement learning method to synthesize control policies for motion planning problems.
The robot is modelled as a discrete-time Markov decision process (MDP) with continuous state and action spaces.
We train deep neural networks to approximate the value function and policy using an actor-critic reinforcement learning method.
arXiv Detail & Related papers (2020-04-02T17:58:03Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.