Exact Reduction of Huge Action Spaces in General Reinforcement Learning
- URL: http://arxiv.org/abs/2012.10200v1
- Date: Fri, 18 Dec 2020 12:45:03 GMT
- Title: Exact Reduction of Huge Action Spaces in General Reinforcement Learning
- Authors: Sultan Javed Majeed and Marcus Hutter
- Abstract summary: We show how action-binarization in the non-MDP case can significantly improve Extreme State Aggregation (ESA) bounds.
We provide an upper bound on the number of states of this binarized ESA that is logarithmic in the original action-space size, a double-exponential improvement.
- Score: 28.19950790106291
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The reinforcement learning (RL) framework formalizes the notion of learning
with interactions. Many real-world problems have large state-spaces and/or
action-spaces, as in Go, StarCraft, protein folding, and robotics, or are
non-Markovian, which poses significant challenges to RL algorithms. In this
work we address the large action-space problem by sequentializing actions,
which can reduce the action-space size significantly, even down to two actions
at the expense of an increased planning horizon. We provide explicit and exact
constructions and equivalence proofs for all quantities of interest for
arbitrary history-based processes. In the case of MDPs, this could help RL
algorithms that bootstrap. In this work we show how action-binarization in the
non-MDP case can significantly improve Extreme State Aggregation (ESA) bounds.
ESA allows casting any (non-MDP, non-ergodic, history-based) RL problem into a
fixed-sized non-Markovian state-space with the help of a surrogate Markovian
process. On the upside, ESA enjoys similar optimality guarantees as Markovian
models do. But a downside is that the size of the aggregated state-space
becomes exponential in the size of the action-space. In this work, we patch
this issue by binarizing the action-space. We provide an upper bound on the
number of states of this binarized ESA that is logarithmic in the original
action-space size, a double-exponential improvement.
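The sequentialization idea admits a minimal sketch (a hypothetical encoder, not the paper's exact construction): one choice among $n$ actions becomes $\lceil \log_2 n \rceil$ binary sub-actions, so the action-space shrinks to two at the cost of a logarithmically longer planning horizon.

```python
import math

def binarize_action(index: int, n_actions: int) -> list[int]:
    """Encode one action out of n_actions as a sequence of binary sub-actions."""
    depth = max(1, math.ceil(math.log2(n_actions)))
    return [(index >> bit) & 1 for bit in reversed(range(depth))]

def debinarize_action(bits: list[int]) -> int:
    """Recover the original action index from its binary sub-action sequence."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index

n = 1_000_000                      # a "huge" action space
bits = binarize_action(123_456, n)
assert len(bits) == 20             # horizon grows only by ceil(log2 n) = 20 steps
assert debinarize_action(bits) == 123_456
```

The round trip is exact, which mirrors the paper's claim of an exact (not approximate) reduction of the action space.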
Related papers
- Multistep Inverse Is Not All You Need [87.62730694973696]
In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise.
It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables.
We propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model.
arXiv Detail & Related papers (2024-03-18T16:36:01Z) - No Prior Mask: Eliminate Redundant Action for Deep Reinforcement Learning [13.341525656639583]
Large action space is one fundamental obstacle to deploying Reinforcement Learning methods in the real world.
We propose a novel redundant action filtering mechanism named No Prior Mask (NPM).
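As a loose illustration only (NPM's actual mechanism is learned and is not specified in this abstract), redundant-action filtering can be reduced to its simplest form: mask out any candidate action whose predicted effect duplicates that of an earlier action.

```python
# Hypothetical sketch: keep one representative action per predicted effect,
# masking later actions that would lead to the same predicted next state.
def redundancy_mask(predicted_next_states: list[tuple]) -> list[bool]:
    seen: set[tuple] = set()
    mask = []
    for state in predicted_next_states:
        novel = state not in seen
        mask.append(novel)          # True = keep this action, False = filter it
        seen.add(state)
    return mask

# Three of five candidate actions lead to the same predicted state.
preds = [(0, 1), (0, 1), (2, 3), (0, 1), (4, 4)]
assert redundancy_mask(preds) == [True, False, True, False, True]
```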
arXiv Detail & Related papers (2023-12-11T09:56:02Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
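A simplified stand-in for adaptive action quantization (the paper's scheme is learned and more general): place bin edges at empirical quantiles of the dataset's actions, so the discretization is finest where the behavior policy acted most often.

```python
# Quantile-based action binning for a 1-D continuous action space: an
# illustrative simplification, not the paper's adaptive scheme.
def quantile_bins(actions: list[float], n_bins: int) -> list[float]:
    ordered = sorted(actions)
    return [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]

def quantize(a: float, edges: list[float]) -> int:
    return sum(a >= e for e in edges)   # discrete action index in [0, n_bins)

actions = [0.01, 0.02, 0.03, 0.05, 0.9]   # mostly small actions, one outlier
edges = quantile_bins(actions, 2)
assert quantize(0.02, edges) == 0
assert quantize(0.9, edges) == 1
```

Under this quantile placement, half the discrete actions cover the dense low-magnitude region, which is the intuition behind adapting the discretization to the dataset rather than using a uniform grid.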
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - METRA: Scalable Unsupervised RL with Metric-Aware Abstraction [69.90741082762646]
Metric-Aware Abstraction (METRA) is a novel unsupervised reinforcement learning objective.
By learning to move in every direction in the latent space, METRA obtains a tractable set of diverse behaviors.
We show that METRA can discover a variety of useful behaviors even in complex, pixel-based environments.
arXiv Detail & Related papers (2023-10-13T06:43:11Z) - Dynamic Neighborhood Construction for Structured Large Discrete Action
Spaces [2.285821277711785]
Large discrete action spaces (LDAS) remain a central challenge in reinforcement learning.
Existing solution approaches can handle unstructured LDAS with up to a few million actions.
We propose Dynamic Neighborhood Construction (DNC), a novel exploitation paradigm for SLDAS.
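The abstract does not detail DNC's construction, but the general flavor of neighborhood-based exploitation in a structured action space can be sketched: instead of scoring all $k^d$ action vectors, hill-climb over single-coordinate neighbors of a candidate under a surrogate scoring function (all names here are illustrative).

```python
# Hypothetical local-search sketch over a structured discrete action space:
# each action is a d-tuple with k values per coordinate, and q is a surrogate
# value estimate (e.g. a learned Q-function in a real implementation).
def neighbors(action: tuple[int, ...], k: int):
    for i in range(len(action)):
        for v in range(k):
            if v != action[i]:
                yield action[:i] + (v,) + action[i + 1:]

def local_search(q, start: tuple[int, ...], k: int, steps: int = 50):
    best = start
    for _ in range(steps):
        cand = max(neighbors(best, k), key=q)
        if q(cand) <= q(best):
            return best               # local optimum reached
        best = cand
    return best

q = lambda a: -sum((x - 3) ** 2 for x in a)   # toy surrogate, optimum at (3, 3, 3)
assert local_search(q, (0, 0, 0), k=5) == (3, 3, 3)
```

Each step evaluates only $d(k-1)$ neighbors rather than the full $k^d$ space, which is what makes this style of exploitation viable for large structured action spaces.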
arXiv Detail & Related papers (2023-05-31T14:26:14Z) - Efficient Long Sequence Modeling via State Space Augmented Transformer [92.74707853711374]
We propose SPADE, short for State sPace AugmenteD TransformEr.
We augment the bottom layer of SPADE with an SSM and employ efficient local attention methods for the other layers.
Experimental results on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-15T20:51:27Z) - Linear Reinforcement Learning with Ball Structure Action Space [8.697177927706521]
We propose a sample-efficient RL algorithm (BallRL) that learns an $\epsilon$-optimal policy using only $\tilde{O}\left(\frac{H^5 d^3}{\epsilon^3}\right)$ trajectories.
arXiv Detail & Related papers (2022-11-14T14:48:39Z) - Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z) - Implicit Bias of Projected Subgradient Method Gives Provable Robust Recovery of Subspaces of Unknown Codimension [12.354076490479514]
We show that Dual Principal Component Pursuit (DPCP) can provably solve problems in the unknown subspace dimension regime.
We propose a very simple algorithm based on running multiple instances of a projected sub-gradient descent method (PSGM).
In particular, we show that 1) all of the problem instances will converge to a vector in the nullspace of the subspace and 2) the ensemble of problem instance solutions will be sufficiently diverse to fully span the nullspace of the subspace.
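A minimal 2-D toy of one such PSGM instance (the full DPCP analysis covers arbitrary dimension and codimension): minimize $\sum_j |\langle x_j, b\rangle|$ over the unit sphere with a decaying-step projected subgradient method, and recover a normal to the data subspace. Step sizes and iteration counts here are illustrative choices, not the paper's.

```python
import math, random

# One projected subgradient instance for DPCP in R^2: data lies on the x-axis,
# so the minimizer of sum_j |<x_j, b>| on the unit circle is the normal (0, ±1).
def psgm(points, b, steps=200, mu=0.05, decay=0.9):
    for t in range(steps):
        # Subgradient of the l1 objective: sum_j sign(<x_j, b>) * x_j
        g = [0.0, 0.0]
        for x in points:
            s = math.copysign(1.0, x[0] * b[0] + x[1] * b[1])
            g[0] += s * x[0]
            g[1] += s * x[1]
        step = mu * decay ** t                 # geometrically decaying step size
        b = [b[0] - step * g[0], b[1] - step * g[1]]
        n = math.hypot(b[0], b[1])
        b = [b[0] / n, b[1] / n]               # project back onto the sphere
    return b

random.seed(0)
points = [(random.uniform(-1, 1), 0.0) for _ in range(50)]  # subspace = x-axis
b = psgm(points, b=[0.6, 0.8])
assert abs(b[0]) < 1e-2 and b[1] > 0.99        # b converges to the normal (0, 1)
```

Running many such instances from random initializations, as the paper proposes, yields an ensemble of normals diverse enough to span the full nullspace when the codimension exceeds one.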
arXiv Detail & Related papers (2022-01-22T15:36:03Z) - Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z) - Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces [26.297887542066505]
We consider episodic reinforcement learning with a continuous state-action space which is assumed to be equipped with a natural metric.
We propose ZoomRL, an online algorithm that leverages ideas from continuous bandits to learn an adaptive discretization of the joint space.
We show that ZoomRL achieves a worst-case regret $\tilde{O}\left(H^{\frac{5}{2}} K^{\frac{d+1}{d+2}}\right)$ where $H$ is the planning horizon, $K$ is the number of episodes and $d$ is the covering dimension of the space.
arXiv Detail & Related papers (2020-03-09T12:32:02Z)
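The "zooming" idea from continuous bandits that ZoomRL builds on can be sketched in one dimension (an illustrative toy, not ZoomRL itself): split a cell once its visit count exceeds a budget, so resolution grows only where the data actually lands.

```python
# Generic adaptive-discretization ("zooming") sketch: each cell is an interval
# (lo, hi) with a visit counter; a cell that gets hit `budget` times is split,
# refining the discretization only in frequently visited regions.
def zoom_update(cells, point, budget=4):
    """cells: list of (lo, hi, visits); returns the updated cell list."""
    out = []
    for lo, hi, visits in cells:
        if not (lo <= point < hi):
            out.append((lo, hi, visits))
        elif visits + 1 >= budget:
            mid = (lo + hi) / 2                  # refine: split the hot cell
            out.extend([(lo, mid, 0), (mid, hi, 0)])
        else:
            out.append((lo, hi, visits + 1))
    return out

cells = [(0.0, 1.0, 0)]
for _ in range(5):                               # five visits near 0.3
    cells = zoom_update(cells, 0.3)
assert cells == [(0.0, 0.5, 1), (0.5, 1.0, 0)]   # only the visited half split
```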
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.