Model-Free Reinforcement Learning for Symbolic Automata-encoded
Objectives
- URL: http://arxiv.org/abs/2202.02404v1
- Date: Fri, 4 Feb 2022 21:54:36 GMT
- Title: Model-Free Reinforcement Learning for Symbolic Automata-encoded
Objectives
- Authors: Anand Balakrishnan, Stefan Jaksic, Edgar Aguilar Lozano, Dejan
Nickovic, Jyotirmoy Deshmukh
- Abstract summary: Reinforcement learning (RL) is a popular approach for robotic path planning in uncertain environments.
Poorly designed rewards can lead to policies that do get maximal rewards but fail to satisfy desired task objectives or are unsafe.
We propose using formal specifications in the form of symbolic automata.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) is a popular approach for robotic path planning
in uncertain environments. However, the control policies trained for an RL
agent crucially depend on user-defined, state-based reward functions. Poorly
designed rewards can lead to policies that do get maximal rewards but fail to
satisfy desired task objectives or are unsafe. There are several examples of
the use of formal languages such as temporal logics and automata to specify
high-level task specifications for robots (in lieu of Markovian rewards).
Recent efforts have focused on inferring state-based rewards from formal
specifications; here, the goal is to provide (probabilistic) guarantees that
the policy learned using RL (with the inferred rewards) satisfies the
high-level formal specification. A key drawback of several of these techniques
is that the rewards that they infer are sparse: the agent receives positive
rewards only upon completion of the task and no rewards otherwise. This
naturally leads to poor convergence properties and high variance during RL. In
this work, we propose using formal specifications in the form of symbolic
automata: these serve as a generalization of both bounded-time temporal
logic-based specifications and automata. Furthermore, our use of
symbolic automata allows us to define non-sparse potential-based rewards which
empirically shape the reward surface, leading to better convergence during RL.
We also show that our potential-based rewarding strategy still allows us to
obtain the policy that maximizes the satisfaction of the given specification.
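The listing above contains no code. As a rough, hypothetical sketch of the idea described in the abstract (not the authors' implementation), a symbolic automaton can be run as a monitor next to the MDP, and a potential function over its states can be used to densify the reward. The SymbolicAutomaton class, the guard representation, and the graph-distance potential below are all assumptions made for illustration.

```python
# Illustrative sketch only (not the authors' code): monitor a symbolic
# automaton alongside the MDP and shape rewards with a potential function
# defined over the automaton states.
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Guard = Callable[[dict], bool]  # symbolic predicate over an MDP observation


@dataclass
class SymbolicAutomaton:
    initial: int
    accepting: Set[int]
    # transitions[q] = list of (guard, next_state) pairs
    transitions: Dict[int, List[Tuple[Guard, int]]] = field(default_factory=dict)

    def step(self, q: int, obs: dict) -> int:
        """Advance the monitor on one observation; stay in q if no guard fires."""
        for guard, q_next in self.transitions.get(q, []):
            if guard(obs):
                return q_next
        return q


def distance_to_accepting(aut: SymbolicAutomaton) -> Dict[int, int]:
    """Structural BFS distance (number of transitions) to an accepting state."""
    reverse: Dict[int, Set[int]] = {}
    for q, edges in aut.transitions.items():
        for _, q_next in edges:
            reverse.setdefault(q_next, set()).add(q)
    dist = {q: 0 for q in aut.accepting}
    frontier = deque(aut.accepting)
    while frontier:
        q = frontier.popleft()
        for p in reverse.get(q, ()):
            if p not in dist:
                dist[p] = dist[q] + 1
                frontier.append(p)
    return dist


def shaped_reward(base_reward: float, q: int, q_next: int,
                  dist: Dict[int, int], gamma: float = 0.99) -> float:
    """Potential-based shaping: r + gamma * Phi(q') - Phi(q), with Phi = -distance."""
    far = max(dist.values(), default=0) + 1  # states that cannot reach acceptance
    phi = lambda s: -float(dist.get(s, far))
    return base_reward + gamma * phi(q_next) - phi(q)
```

The shaping term gamma * Phi(q') - Phi(q) is the standard potential-based form, which is consistent with the abstract's claim that the shaped rewards still yield a policy maximizing satisfaction of the specification; the potential actually used in the paper may differ from this simple distance-to-acceptance choice.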
Related papers
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
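As a rough illustration of how a temporal-structure-preserving representation could be used zero-shot for goal-conditioned tasks (the function below and its distance-based reward are assumptions for this sketch, not the paper's exact objective):

```python
import numpy as np

def goal_conditioned_reward(phi, next_obs, goal_obs):
    """If the learned embedding phi preserves temporal structure, i.e.
    ||phi(s) - phi(g)|| roughly tracks how long it takes to reach g from s,
    then the negative embedding distance acts as a dense goal-reaching reward."""
    return -float(np.linalg.norm(phi(next_obs) - phi(goal_obs)))
```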
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Deep Reinforcement Learning from Hierarchical Preference Design [99.46415116087259]
This paper shows that, by exploiting certain structures, the reward design process can be eased.
We propose a hierarchical reward modeling framework, HERON, for two scenarios: (I) the feedback signals naturally form a hierarchy; (II) the reward is sparse, but less important surrogate feedback is available to help policy learning.
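A minimal sketch of hierarchical feedback in the spirit of this description, assuming hypothetical primary and surrogate scoring functions (this is not the HERON implementation):

```python
def prefer(traj_a, traj_b, primary, surrogate, tol=1e-3):
    """Lexicographic preference: rank by the important (primary) feedback first,
    and consult the less important surrogate feedback only on near-ties."""
    pa, pb = primary(traj_a), primary(traj_b)
    if abs(pa - pb) > tol:
        return traj_a if pa > pb else traj_b
    return traj_a if surrogate(traj_a) >= surrogate(traj_b) else traj_b
```

Preference labels produced this way could then be used to train a standard reward model for policy learning.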
arXiv Detail & Related papers (2023-09-06T00:44:29Z)
- Logic-based Reward Shaping for Multi-Agent Reinforcement Learning [1.5483078145498084]
Reinforcement learning relies heavily on exploration to learn from its environment and maximize observed rewards.
Previous work has combined automata and logic based reward shaping with environment assumptions to provide an automatic mechanism to synthesize the reward function based on the task.
This project explores how logic-based reward shaping for Multi-Agent Reinforcement Learning can be designed for different scenarios and tasks.
arXiv Detail & Related papers (2022-06-17T16:30:27Z)
- A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines [7.661766773170363]
A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning problems.
We propose symbolic reward machines for incorporating high-level task knowledge when specifying the reward signals.
arXiv Detail & Related papers (2022-04-20T20:22:00Z)
- DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies [116.12670064963625]
We develop an off-policy algorithm called distribution-conditioned reinforcement learning (DisCo RL) to efficiently learn contextual policies.
We evaluate DisCo RL on a variety of robot manipulation tasks and find that it significantly outperforms prior methods on tasks that require generalization to new goal distributions.
arXiv Detail & Related papers (2021-04-23T16:51:58Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between a limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
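A hypothetical sketch of a product-style construction in this spirit; the function names and the simple bonus term are assumptions, and the paper's EP-MDP (which tracks LDGBA accepting sets) and its discounting scheme are more involved:

```python
def product_step(mdp_step, automaton_step, reward_bonus, state, q, action):
    """One transition of a product between the MDP and an automaton: the product
    state is the pair (MDP state, automaton state), and the shaping bonus is a
    function of that product state alone, as in the summary above."""
    next_state, env_reward = mdp_step(state, action)
    q_next = automaton_step(q, next_state)  # the new MDP state's label drives the automaton
    return (next_state, q_next), env_reward + reward_bonus(q, q_next)
```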
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning [22.242379207077217]
We show how to expose the reward function's code to the RL agent so that it can exploit the function's internal structure to learn optimal policies.
First, we propose reward machines, a type of finite state machine that supports the specification of reward functions.
We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning.
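For illustration, a minimal reward machine can be written as a finite-state machine whose transitions emit rewards; the tiny event-labelled example below is an assumption for this sketch, not the authors' code:

```python
class RewardMachine:
    """Finite-state machine over abstract events; each transition carries a reward."""
    def __init__(self, initial, delta):
        self.u = initial
        self.delta = delta  # delta[(machine_state, event)] = (next_machine_state, reward)

    def step(self, event):
        self.u, reward = self.delta.get((self.u, event), (self.u, 0.0))
        return reward

# Example: reward 1.0 only after observing "coffee" and then "office".
rm = RewardMachine(0, {(0, "coffee"): (1, 0.0), (1, "office"): (2, 1.0)})
```

Exposing the machine state to the learner is what enables the shaping, task decomposition, and counterfactual updates mentioned above.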
arXiv Detail & Related papers (2020-10-06T00:10:16Z)
- Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples [31.31937554018045]
Deep reinforcement learning (RL) methods require intensive data from the exploration of the environment to achieve satisfactory performance.
We propose a framework that enables an RL agent to reason over its exploration process and distill high-level knowledge for effectively guiding its future explorations.
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm.
arXiv Detail & Related papers (2020-06-28T21:13:08Z)
- Learning Non-Markovian Reward Models in MDPs [0.0]
We show how to formalise the non-Markovian reward function using a Mealy machine.
In our formal setting, we consider a Markov decision process (MDP) that models the dynamic of the environment in which the agent evolves.
While the MDP is known to the agent, the reward function is not and must be learnt.
arXiv Detail & Related papers (2020-01-25T10:51:42Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)