Related papers: Environment Shaping in Reinforcement Learning using State Abstraction

Environment Shaping in Reinforcement Learning using State Abstraction

URL: http://arxiv.org/abs/2006.13160v1
Date: Tue, 23 Jun 2020 17:00:22 GMT
Title: Environment Shaping in Reinforcement Learning using State Abstraction
Authors: Parameswaran Kamalaruban, Rati Devidze, Volkan Cevher, Adish Singla
Abstract summary: We propose a novel framework of emphenvironment shaping using state abstraction. Our key idea is to compress the environment's large state space with noisy signals to an abstracted space. We show that the agent's policy learnt in the shaped environment preserves near-optimal behavior in the original environment.
Score: 63.444831173608605
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: One of the central challenges faced by a reinforcement learning (RL) agent is to effectively learn a (near-)optimal policy in environments with large state spaces having sparse and noisy feedback signals. In real-world applications, an expert with additional domain knowledge can help in speeding up the learning process via \emph{shaping the environment}, i.e., making the environment more learner-friendly. A popular paradigm in literature is \emph{potential-based reward shaping}, where the environment's reward function is augmented with additional local rewards using a potential function. However, the applicability of potential-based reward shaping is limited in settings where (i) the state space is very large, and it is challenging to compute an appropriate potential function, (ii) the feedback signals are noisy, and even with shaped rewards the agent could be trapped in local optima, and (iii) changing the rewards alone is not sufficient, and effective shaping requires changing the dynamics. We address these limitations of potential-based shaping methods and propose a novel framework of \emph{environment shaping using state abstraction}. Our key idea is to compress the environment's large state space with noisy signals to an abstracted space, and to use this abstraction in creating smoother and more effective feedback signals for the agent. We study the theoretical underpinnings of our abstraction-based environment shaping, and show that the agent's policy learnt in the shaped environment preserves near-optimal behavior in the original environment.

Related papers

Imagine Beyond! Distributionally Robust Auto-Encoding for State Space Coverage in Online Reinforcement Learning [16.15673339648566]
Goal-Conditioned Reinforcement Learning (GCRL) enables agents to autonomously acquire diverse behaviors.<n>In the online setting, where agents learn representations while exploring, the latent space evolves with the agent's policy.
arXiv Detail & Related papers (2025-05-23T12:43:55Z)
Learning Latent Dynamic Robust Representations for World Models [9.806852421730165]
Visual Model-Based Reinforcement Learning (MBL) promises to agent's knowledge about the underlying dynamics of the environment. Top-temporal agents such as Dreamer often struggle with visual pixel-based inputs in the presence of irrelevant noise in the observation space. We apply a-temporal masking strategy, combined with latent reconstruction, to capture endogenous task-specific aspects of the environment for world models.
arXiv Detail & Related papers (2024-05-10T06:28:42Z)
AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training. We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
Discrete State-Action Abstraction via the Successor Representation [3.453310639983932]
Abstraction is one approach that provides the agent with an intrinsic reward for transitioning in a latent space. Our approach is the first for automatically learning a discrete abstraction of the underlying environment. Our proposed algorithm, Discrete State-Action Abstraction (DSAA), iteratively swaps between training these options and using them to efficiently explore more of the environment.
arXiv Detail & Related papers (2022-06-07T17:37:30Z)
Towards Robust Bisimulation Metric Learning [3.42658286826597]
Bisimulation metrics offer one solution to representation learning problem. We generalize value function approximation bounds for on-policy bisimulation metrics to non-optimal policies. We find that these issues stem from an underconstrained dynamics model and an unstable dependence of the embedding norm on the reward signal.
arXiv Detail & Related papers (2021-10-27T00:32:07Z)
Zero-Shot Reinforcement Learning on Graphs for Autonomous Exploration Under Uncertainty [6.42522897323111]
We present a framework for self-learning a high-performance exploration policy in a single simulation environment. We propose a novel approach that uses graph neural networks in conjunction with deep reinforcement learning.
arXiv Detail & Related papers (2021-05-11T02:42:17Z)
Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations. We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier. understanding how properties of the environment impact the performance of reinforcement learning agents can help us to structure our tasks in ways that make learning tractable.
arXiv Detail & Related papers (2020-06-22T17:55:03Z)
Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity. Our method leverages latent variable models to learn a representation of the environment from current and past experiences. We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.