Environment Shaping in Reinforcement Learning using State Abstraction
- URL: http://arxiv.org/abs/2006.13160v1
- Date: Tue, 23 Jun 2020 17:00:22 GMT
- Title: Environment Shaping in Reinforcement Learning using State Abstraction
- Authors: Parameswaran Kamalaruban, Rati Devidze, Volkan Cevher, Adish Singla
- Abstract summary: We propose a novel framework of environment shaping using state abstraction.
Our key idea is to compress the environment's large state space with noisy signals to an abstracted space.
We show that the agent's policy learnt in the shaped environment preserves near-optimal behavior in the original environment.
- Score: 63.444831173608605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the central challenges faced by a reinforcement learning (RL) agent is
to effectively learn a (near-)optimal policy in environments with large state
spaces having sparse and noisy feedback signals. In real-world applications, an
expert with additional domain knowledge can help in speeding up the learning
process via shaping the environment, i.e., making the environment more
learner-friendly. A popular paradigm in the literature is potential-based
reward shaping, where the environment's reward function is augmented with
additional local rewards using a potential function. However, the applicability
of potential-based reward shaping is limited in settings where (i) the state
space is very large, and it is challenging to compute an appropriate potential
function, (ii) the feedback signals are noisy, and even with shaped rewards the
agent could be trapped in local optima, and (iii) changing the rewards alone is
not sufficient, and effective shaping requires changing the dynamics. We
address these limitations of potential-based shaping methods and propose a
novel framework of environment shaping using state abstraction. Our key
idea is to compress the environment's large state space with noisy signals to
an abstracted space, and to use this abstraction in creating smoother and more
effective feedback signals for the agent. We study the theoretical
underpinnings of our abstraction-based environment shaping, and show that the
agent's policy learnt in the shaped environment preserves near-optimal behavior
in the original environment.
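The abstract contrasts potential-based reward shaping, which adds a term F(s, s') = γΦ(s') − Φ(s) to the reward, with shaping through a state abstraction. The Python sketch below illustrates both ideas under stated assumptions; it is a generic illustration, not the authors' algorithm. It assumes a hypothetical 6-state deterministic chain MDP, an arbitrary random potential Φ, and a hand-specified abstraction map, and the helper names `shaped_reward`, `greedy_policy`, and `abstract_rewards` are made up for this example. The first part checks that potential-based shaping leaves the greedy policy of a discounted MDP unchanged; the second replaces each state's noisy reward with the mean over its abstract state, one simple way (assumed here, not taken from the paper) to obtain a smoother feedback signal.

```python
import random


def shaped_reward(r, s, s_next, potential, gamma):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return r + gamma * potential[s_next] - potential[s]


def greedy_policy(n_states, step, reward_fn, gamma, iters=500):
    """Value iteration on a deterministic tabular MDP with actions {0, 1}.

    step(s, a) -> next state; reward_fn(s, a, s_next) -> reward.
    Returns the greedy action in each state.
    """
    actions = (0, 1)
    V = [0.0] * n_states
    for _ in range(iters):
        Q = [[reward_fn(s, a, step(s, a)) + gamma * V[step(s, a)] for a in actions]
             for s in range(n_states)]
        V = [max(q) for q in Q]
    return [max(actions, key=lambda a: Q[s][a]) for s in range(n_states)]


def abstract_rewards(state_rewards, abstraction):
    """Replace each state's reward by the mean reward of its abstract state.

    abstraction[s] is the abstract state of ground state s (hand-specified here).
    """
    totals, counts = {}, {}
    for r, z in zip(state_rewards, abstraction):
        totals[z] = totals.get(z, 0.0) + r
        counts[z] = counts.get(z, 0) + 1
    return [totals[z] / counts[z] for z in abstraction]


# Hypothetical chain MDP (an assumption for illustration, not from the paper).
N, GOAL, GAMMA = 6, 5, 0.9


def step(s, a):
    """Action 0 moves left, action 1 moves right; the goal state is absorbing."""
    if s == GOAL:
        return s
    return max(s - 1, 0) if a == 0 else s + 1


def base_reward(s, a, s_next):
    """Sparse reward: 1 for every transition into the goal, else 0."""
    return 1.0 if s_next == GOAL else 0.0


rng = random.Random(0)
phi = [rng.uniform(-5.0, 5.0) for _ in range(N)]  # arbitrary potential Phi(s)

pi = greedy_policy(N, step, base_reward, GAMMA)
pi_shaped = greedy_policy(
    N, step,
    lambda s, a, s2: shaped_reward(base_reward(s, a, s2), s, s2, phi, GAMMA),
    GAMMA)
print(pi[:GOAL], pi_shaped[:GOAL])  # [1, 1, 1, 1, 1] [1, 1, 1, 1, 1]

# Noisy per-state rewards smoothed over a hypothetical 3-block abstraction.
noisy = [1.0, 3.0, 0.0, 2.0, 5.0, 5.0]
print(abstract_rewards(noisy, [0, 0, 1, 1, 2, 2]))  # [2.0, 2.0, 1.0, 1.0, 5.0, 5.0]
```

Because F telescopes along any trajectory, the optimal shaped Q-values differ from the original ones only by −Φ(s), so the greedy actions agree for any choice of Φ. The averaging step, by contrast, changes the reward itself, so whether a policy learnt on it stays near-optimal in the original environment depends on how well the abstraction groups states; this is the kind of guarantee the abstract says the paper establishes for its own construction.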
Related papers
- Learning Latent Dynamic Robust Representations for World Models [9.806852421730165]
Visual Model-Based Reinforcement Learning (MBRL) promises to encapsulate the agent's knowledge about the underlying dynamics of the environment.
Top model-based agents such as Dreamer often struggle with visual pixel-based inputs in the presence of irrelevant noise in the observation space.
We apply a spatio-temporal masking strategy, combined with latent reconstruction, to capture endogenous task-specific aspects of the environment for world models.
(2024-05-10T06:28:42Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
(2023-08-16T22:47:16Z)
- Discrete State-Action Abstraction via the Successor Representation [3.453310639983932]
Abstraction is one approach that provides the agent with an intrinsic reward for transitioning in a latent space.
Our approach is the first for automatically learning a discrete abstraction of the underlying environment.
Our proposed algorithm, Discrete State-Action Abstraction (DSAA), iteratively swaps between training these options and using them to efficiently explore more of the environment.
(2022-06-07T17:37:30Z)
- Towards Robust Bisimulation Metric Learning [3.42658286826597]
Bisimulation metrics offer one solution to the representation learning problem.
We generalize value function approximation bounds for on-policy bisimulation metrics to non-optimal policies.
We find that these issues stem from an underconstrained dynamics model and an unstable dependence of the embedding norm on the reward signal.
(2021-10-27T00:32:07Z)
- Zero-Shot Reinforcement Learning on Graphs for Autonomous Exploration Under Uncertainty [6.42522897323111]
We present a framework for self-learning a high-performance exploration policy in a single simulation environment.
We propose a novel approach that uses graph neural networks in conjunction with deep reinforcement learning.
(2021-05-11T02:42:17Z)
- Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
(2020-12-04T11:18:02Z)
- Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier.
Understanding how properties of the environment impact the performance of reinforcement learning agents can help us to structure our tasks in ways that make learning tractable.
(2020-06-22T17:55:03Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
(2020-06-18T17:34:50Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
(2020-05-21T19:47:05Z)