Noisy Symbolic Abstractions for Deep RL: A case study with Reward
Machines
- URL: http://arxiv.org/abs/2211.10902v2
- Date: Wed, 23 Nov 2022 05:05:41 GMT
- Title: Noisy Symbolic Abstractions for Deep RL: A case study with Reward
Machines
- Authors: Andrew C. Li, Zizhao Chen, Pashootan Vaezipoor, Toryn Q. Klassen,
Rodrigo Toro Icarte, Sheila A. McIlraith
- Abstract summary: We investigate how to generate policies via RL when reward functions are specified in a symbolic language captured by Reward Machines.
We formulate the problem of policy learning in Reward Machines with noisy symbolic abstractions.
- Score: 23.15484341058261
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural and formal languages provide an effective mechanism for humans to
specify instructions and reward functions. We investigate how to generate
policies via RL when reward functions are specified in a symbolic language
captured by Reward Machines, an increasingly popular automaton-inspired
structure. We are interested in the case where the mapping of environment state
to a symbolic (here, Reward Machine) vocabulary -- commonly known as the
labelling function -- is uncertain from the perspective of the agent. We
formulate the problem of policy learning in Reward Machines with noisy symbolic
abstractions as a special class of POMDP optimization problem, and investigate
several methods to address the problem, building on existing and new
techniques, the latter focused on predicting Reward Machine state, rather than
on grounding of individual symbols. We analyze these methods and evaluate them
experimentally under varying degrees of uncertainty in the correct
interpretation of the symbolic vocabulary. We verify the strength of our
approach and the limitation of existing methods via an empirical investigation
on both illustrative, toy domains and partially observable, deep RL domains.
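For concreteness, below is a minimal, self-contained Python sketch of the setting the abstract describes: a toy Reward Machine whose transitions fire on symbolic propositions, a noisy labelling function that returns a distribution over propositions instead of a ground-truth symbol, and a belief update over Reward Machine states that a policy could condition on. All names, probabilities, and the API here are illustrative assumptions; this sketches the problem setup, not the paper's specific algorithms.

```python
# Toy Reward Machine over a tiny vocabulary, plus a belief update over RM
# states driven by a *noisy* labelling function. Names and numbers are
# illustrative assumptions; the paper does not prescribe this API.

RM_STATES = ["u0", "u1", "u2"]
PROPS = ["key", "door", "none"]
RM_TRANSITIONS = {             # (rm_state, proposition) -> next rm_state
    ("u0", "key"): "u1",       # picking up the key advances the task
    ("u1", "door"): "u2",      # reaching the door afterwards completes it
}
RM_REWARDS = {("u1", "door", "u2"): 1.0}   # reward paid on the final transition

def rm_step(u, prop):
    """Deterministic RM transition for a single ground-truth proposition."""
    u_next = RM_TRANSITIONS.get((u, prop), u)
    return u_next, RM_REWARDS.get((u, prop, u_next), 0.0)

def noisy_labelling(env_state):
    """Uncertain labelling function: a distribution over propositions rather
    than a single grounded symbol (the ground truth is hidden from the agent)."""
    dist = {p: 0.05 for p in PROPS}
    dist[env_state["true_prop"]] = 0.9
    return dist

def update_rm_belief(belief, prop_dist):
    """Push a belief over RM states through the label distribution. A policy
    can then condition on this belief instead of a hard, possibly wrong symbol."""
    new_belief = {u: 0.0 for u in RM_STATES}
    for u, b in belief.items():
        for prop, p in prop_dist.items():
            u_next, _ = rm_step(u, prop)
            new_belief[u_next] += b * p
    return new_belief

belief = {"u0": 1.0, "u1": 0.0, "u2": 0.0}
belief = update_rm_belief(belief, noisy_labelling({"true_prop": "key"}))
print(belief)   # mass shifts toward u1 because 'key' was probably observed
```

Conditioning on a belief over Reward Machine states, rather than on a single hard-grounded symbol, loosely mirrors the abstract's emphasis on predicting Reward Machine state rather than grounding individual symbols.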
Related papers
- Neural Reward Machines [2.0755366440393743]
Non-Markovian Reinforcement Learning (RL) tasks are very hard to solve because agents must consider the entire history of state-action pairs to act rationally in the environment.
We define Neural Reward Machines (NRM), an automata-based neurosymbolic framework that can be used for both reasoning and learning in non-symbolic RL domains.
We show that NRMs can exploit high-level symbolic knowledge in non-symbolic environments without any knowledge of the symbol grounding (SG) function, outperforming Deep RL methods which cannot incorporate prior knowledge.
arXiv Detail & Related papers (2024-08-16T11:44:27Z)
- Reward Machines for Deep RL in Noisy and Uncertain Environments [18.42439732953552]
We study the use of Reward Machines for Deep RL in noisy and uncertain environments.
We propose a suite of RL algorithms that exploit task structure under uncertain interpretation of the domain-specific vocabulary.
arXiv Detail & Related papers (2024-05-31T18:22:09Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) gradient boosting machines (GBMs), (ii) explainable boosting machines (EBMs), and (iii) symbolic regression policies.
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
- Learning with Language-Guided State Abstractions [58.199148890064826]
Generalizable policy learning in high-dimensional observation spaces is facilitated by well-designed state representations.
Our method, LGA (language-guided abstraction), uses a combination of natural language supervision and background knowledge from language models to automatically build state representations tailored to unseen tasks.
Experiments on simulated robotic tasks show that LGA yields state abstractions similar to those designed by humans, but in a fraction of the time.
arXiv Detail & Related papers (2024-02-28T23:57:04Z)
- Goal Space Abstraction in Hierarchical Reinforcement Learning via Reachability Analysis [0.0]
We propose a developmental mechanism for subgoal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states.
We create an HRL algorithm that gradually learns this representation along with the policies, and evaluate it on navigation tasks to show that the learned representation is interpretable and results in data efficiency.
arXiv Detail & Related papers (2023-09-12T06:53:11Z)
- Learning Symbolic Representations for Reinforcement Learning of Non-Markovian Behavior [23.20013012953065]
We show how to automatically discover useful state abstractions that support learning automata over the state-action history.
The result is an end-to-end algorithm that can learn optimal policies with significantly fewer environment samples than state-of-the-art RL.
arXiv Detail & Related papers (2023-01-08T00:47:19Z)
- Symbolic Visual Reinforcement Learning: A Scalable Framework with Object-Level Abstraction and Differentiable Expression Search [63.3745291252038]
We propose DiffSES, a novel symbolic learning approach that discovers discrete symbolic policies.
By using object-level abstractions instead of raw pixel-level inputs, DiffSES is able to leverage the simplicity and scalability advantages of symbolic expressions.
Our experiments demonstrate that DiffSES is able to generate symbolic policies that are simpler and more scalable than state-of-the-art symbolic RL methods.
arXiv Detail & Related papers (2022-12-30T17:50:54Z)
- Spectral Decomposition Representation for Reinforcement Learning [100.0424588013549]
We propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy.
A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings.
An experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.
arXiv Detail & Related papers (2022-08-19T19:01:30Z)
- Provably Sample-Efficient RL with Side Information about Latent Dynamics [12.461789905893026]
We study reinforcement learning in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space.
We present an algorithm, called TASID, that learns a robust policy in the target domain, with sample complexity that is polynomial in the horizon.
arXiv Detail & Related papers (2022-05-27T21:07:03Z)
- Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance [55.10864476206503]
We investigate the use of quantized vectors to model the latent linguistic embedding.
By enforcing different policies over the latent spaces during training, we are able to obtain a latent linguistic embedding.
Our experiments show that the voice cloning system built with vector quantization exhibits only a small degradation in perceptual evaluations.
arXiv Detail & Related papers (2021-06-25T07:51:35Z)
- Environment Shaping in Reinforcement Learning using State Abstraction [63.444831173608605]
We propose a novel framework of environment shaping using state abstraction.
Our key idea is to compress the environment's large state space, with its noisy signals, into an abstracted space (see the wrapper sketch after this entry).
We show that the agent's policy learnt in the shaped environment preserves near-optimal behavior in the original environment.
arXiv Detail & Related papers (2020-06-23T17:00:22Z)
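As referenced in the last entry above, here is a minimal sketch of environment shaping via state abstraction: a wrapper compresses a large, noisy observation space into a coarse abstract space, so the agent only ever learns over abstract states. The toy environment, the abstraction map, and all names are illustrative assumptions, not the cited paper's construction.

```python
# Illustrative sketch of environment shaping via state abstraction.
import random

class NoisyGridEnv:
    """Toy raw environment: true grid position observed through noise."""
    def __init__(self, size=20):
        self.size = size

    def reset(self):
        self.pos = [0, 0]
        return self._observe()

    def step(self, action):                 # 0: move right, 1: move up
        axis = 0 if action == 0 else 1
        self.pos[axis] = min(self.pos[axis] + 1, self.size - 1)
        done = self.pos == [self.size - 1, self.size - 1]
        return self._observe(), (1.0 if done else 0.0), done

    def _observe(self):
        # Noisy signal: the agent never sees the exact position.
        return tuple(p + random.gauss(0, 0.5) for p in self.pos)

def abstraction(obs, cell=5):
    """Compress a noisy continuous observation into a coarse grid cell."""
    return tuple(int(max(0.0, v)) // cell for v in obs)

class ShapedEnv:
    """The agent interacts only with abstract states; rewards pass through."""
    def __init__(self, raw_env, phi):
        self.raw_env, self.phi = raw_env, phi

    def reset(self):
        return self.phi(self.raw_env.reset())

    def step(self, action):
        obs, reward, done = self.raw_env.step(action)
        return self.phi(obs), reward, done

env = ShapedEnv(NoisyGridEnv(), abstraction)
s, done = env.reset(), False
while not done:                             # random policy, just to show the loop
    s, r, done = env.step(random.choice([0, 1]))
print("final abstract state:", s)
```

A policy learnt over the small abstract space (e.g., with tabular Q-learning) can then be executed in the original environment by mapping each raw observation through the same abstraction.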