Distributed Control using Reinforcement Learning with
Temporal-Logic-Based Reward Shaping
- URL: http://arxiv.org/abs/2203.04172v1
- Date: Tue, 8 Mar 2022 16:03:35 GMT
- Title: Distributed Control using Reinforcement Learning with
Temporal-Logic-Based Reward Shaping
- Authors: Ningyuan Zhang, Wenliang Liu, Calin Belta
- Abstract summary: We present a framework for synthesis of distributed control strategies for a heterogeneous team of robots in a partially observable environment.
Our approach formulates the synthesis problem as a game and employs a policy graph method to find a control strategy with memory for each agent.
We use the quantitative semantics of TLTL as the reward of the game, and further reshape it using the finite state automaton to guide and accelerate the learning process.
- Score: 0.2320417845168326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a computational framework for synthesis of distributed control
strategies for a heterogeneous team of robots in a partially observable
environment. The goal is to cooperatively satisfy specifications given as
Truncated Linear Temporal Logic (TLTL) formulas. Our approach formulates the
synthesis problem as a stochastic game and employs a policy graph method to
find a control strategy with memory for each agent. We construct the stochastic
game on the product between the team transition system and a finite state
automaton (FSA) that tracks the satisfaction of the TLTL formula. We use the
quantitative semantics of TLTL as the reward of the game, and further reshape
it using the FSA to guide and accelerate the learning process. Simulation
results demonstrate the efficacy of the proposed solution under demanding task
specifications and the effectiveness of reward shaping in significantly
accelerating the speed of learning.
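The FSA-guided shaping described above can be illustrated with classic potential-based reward shaping over an automaton. The sketch below is a minimal, hypothetical example: the toy FSA, its labels, and the distance-based potential are assumptions for illustration, while the paper's actual reward is built from TLTL's quantitative semantics on a product with the team transition system.

```python
# Potential-based reward shaping over a finite state automaton (FSA).
# Hypothetical toy FSA for illustration; the paper derives its FSA from
# a TLTL formula and combines shaping with TLTL quantitative semantics.
from collections import deque

# FSA transitions: state -> {observation label: next state}
FSA = {
    "q0": {"a": "q1", "b": "q0"},
    "q1": {"a": "q1", "b": "q_acc"},
    "q_acc": {"a": "q_acc", "b": "q_acc"},
}
ACCEPTING = {"q_acc"}

def distances_to_accepting(fsa, accepting):
    """BFS over reversed edges: graph distance from each state to acceptance."""
    rev = {q: set() for q in fsa}
    for q, trans in fsa.items():
        for q2 in trans.values():
            rev[q2].add(q)
    dist = {q: float("inf") for q in fsa}
    queue = deque()
    for q in accepting:
        dist[q] = 0
        queue.append(q)
    while queue:
        q = queue.popleft()
        for p in rev[q]:
            if dist[p] == float("inf"):
                dist[p] = dist[q] + 1
                queue.append(p)
    return dist

DIST = distances_to_accepting(FSA, ACCEPTING)

def potential(q):
    # Higher potential for automaton states closer to acceptance.
    return -DIST[q]

def shaped_reward(base_reward, q, q_next, gamma=0.99):
    # Ng-style potential-based shaping term: preserves optimal policies
    # while rewarding progress toward the accepting FSA state.
    return base_reward + gamma * potential(q_next) - potential(q)
```

With this potential, a transition that advances the automaton (e.g. `q0 -> q1`) receives a positive shaping bonus, which is the mechanism by which the FSA guides and accelerates learning.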
Related papers
- Online inductive learning from answer sets for efficient reinforcement learning exploration [52.03682298194168]
We exploit inductive learning of answer set programs to learn a set of logical rules representing an explainable approximation of the agent policy.
We then perform answer set reasoning on the learned rules to guide the exploration of the learning agent at the next batch.
Our methodology produces a significant boost in the discounted return achieved by the agent, even in the first batches of training.
arXiv Detail & Related papers (2025-01-13T16:13:22Z)
- Dependency-Aware CAV Task Scheduling via Diffusion-Based Reinforcement Learning [12.504232513881828]
We propose a novel dependency-aware task scheduling strategy for dynamic unmanned aerial vehicle-assisted connected autonomous vehicles (CAVs).
We formulate a joint scheduling priority and subtask assignment optimization problem with the objective of minimizing the average task completion time.
We propose a diffusion-based reinforcement learning algorithm, named Synthetic DDQN based Subtasks Scheduling, which can make adaptive task scheduling decisions in real time.
arXiv Detail & Related papers (2024-11-27T11:07:31Z)
- Scaling Learning based Policy Optimization for Temporal Logic Tasks by Controller Network Dropout [4.421486904657393]
We introduce a model-based approach for training feedback controllers for an autonomous agent operating in a highly nonlinear environment.
We show how this learning problem is similar to training recurrent neural networks (RNNs), where the number of recurrent units is proportional to the temporal horizon of the agent's task objectives.
We introduce a novel gradient approximation algorithm based on the idea of dropout or gradient sampling.
arXiv Detail & Related papers (2024-03-23T12:53:51Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Accelerated Reinforcement Learning for Temporal Logic Control Objectives [10.216293366496688]
This paper addresses the problem of learning control policies for mobile robots modeled as unknown Markov Decision Processes (MDPs).
We propose a novel accelerated model-based reinforcement learning (RL) algorithm for temporal logic control objectives that learns control policies significantly faster than related approaches.
arXiv Detail & Related papers (2022-05-09T17:09:51Z)
- Efficient Differentiable Simulation of Articulated Bodies [89.64118042429287]
We present a method for efficient differentiable simulation of articulated bodies.
This enables integration of articulated body dynamics into deep learning frameworks.
We show that reinforcement learning with articulated systems can be accelerated using gradients provided by our method.
arXiv Detail & Related papers (2021-09-16T04:48:13Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
arXiv Detail & Related papers (2021-03-26T13:58:23Z)
- Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction [5.337302350000984]
This paper presents a model-free reinforcement learning algorithm to synthesize a control policy.
The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
arXiv Detail & Related papers (2020-10-14T03:49:16Z)
- Continuous Motion Planning with Temporal Logic Specifications using Deep Neural Networks [16.296473750342464]
We propose a model-free reinforcement learning method to synthesize control policies for motion planning problems.
The robot is modelled as a discrete-time Markov decision process (MDP) with continuous state and action spaces.
We train deep neural networks to approximate the value function and policy using an actor-critic reinforcement learning method.
arXiv Detail & Related papers (2020-04-02T17:58:03Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.