Exact Reduction of Huge Action Spaces in General Reinforcement Learning
- URL: http://arxiv.org/abs/2012.10200v1
- Date: Fri, 18 Dec 2020 12:45:03 GMT
- Title: Exact Reduction of Huge Action Spaces in General Reinforcement Learning
- Authors: Sultan Javed Majeed and Marcus Hutter
- Abstract summary: We show how action-binarization in the non-MDP case can significantly improve Extreme State Aggregation (ESA) bounds.
We provide an upper bound on the number of states of this binarized ESA that is logarithmic in the original action-space size, a double-exponential improvement.
- Score: 28.19950790106291
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The reinforcement learning (RL) framework formalizes the notion of learning
with interactions. Many real-world problems have large state-spaces and/or
action-spaces such as in Go, StarCraft, protein folding, and robotics or are
non-Markovian, which cause significant challenges to RL algorithms. In this
work we address the large action-space problem by sequentializing actions,
which can reduce the action-space size significantly, even down to two actions
at the expense of an increased planning horizon. We provide explicit and exact
constructions and equivalence proofs for all quantities of interest for
arbitrary history-based processes. In the case of MDPs, this could help RL
algorithms that bootstrap. In this work we show how action-binarization in the
non-MDP case can significantly improve Extreme State Aggregation (ESA) bounds.
ESA allows casting any (non-MDP, non-ergodic, history-based) RL problem into a
fixed-sized non-Markovian state-space with the help of a surrogate Markovian
process. On the upside, ESA enjoys similar optimality guarantees as Markovian
models do. But a downside is that the size of the aggregated state-space
becomes exponential in the size of the action-space. In this work, we patch
this issue by binarizing the action-space. We provide an upper bound on the
number of states of this binarized ESA that is logarithmic in the original
action-space size, a double-exponential improvement.
Related papers
- Provably Efficient Action-Manipulation Attack Against Continuous Reinforcement Learning [49.48615590763914]
We propose a black-box attack algorithm named LCBT, which uses the Monte Carlo tree search method for efficient action searching and manipulation.
We conduct our proposed attack methods on three aggressive algorithms: DDPG, PPO, and TD3 in continuous settings, which show a promising attack performance.
arXiv Detail & Related papers (2024-11-20T08:20:29Z) - Hyperbolic Fine-tuning for Large Language Models [56.54715487997674]
This study investigates the non-Euclidean characteristics of large language models (LLMs)
We show that token embeddings exhibit a high degree of hyperbolicity, indicating a latent tree-like structure in the embedding space.
We introduce a new method called hyperbolic low-rank efficient fine-tuning, HypLoRA, that performs low-rank adaptation directly on the hyperbolic manifold.
arXiv Detail & Related papers (2024-10-05T02:58:25Z) - Learning a Fast Mixing Exogenous Block MDP using a Single Trajectory [87.62730694973696]
STEEL is the first provably sample-efficient algorithm for learning the controllable dynamics of an Exogenous Block Markov Decision Process from a single trajectory.
We prove that STEEL is correct and sample-efficient, and demonstrate STEEL on two toy problems.
arXiv Detail & Related papers (2024-10-03T21:57:21Z) - Multistep Inverse Is Not All You Need [87.62730694973696]
In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise.
It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables.
We propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model.
arXiv Detail & Related papers (2024-03-18T16:36:01Z) - No Prior Mask: Eliminate Redundant Action for Deep Reinforcement
Learning [13.341525656639583]
Large action space is one fundamental obstacle to deploying Reinforcement Learning methods in the real world.
We propose a novel redundant action filtering mechanism named No Prior Mask (NPM)
arXiv Detail & Related papers (2023-12-11T09:56:02Z) - Dynamic Neighborhood Construction for Structured Large Discrete Action
Spaces [2.285821277711785]
Large discrete action spaces (LDAS) remain a central challenge in reinforcement learning.
Existing solution approaches can handle unstructured LDAS with up to a few million actions.
We propose Dynamic Neighborhood Construction (DNC), a novel exploitation paradigm for SLDAS.
arXiv Detail & Related papers (2023-05-31T14:26:14Z) - Linear Reinforcement Learning with Ball Structure Action Space [8.697177927706521]
We propose a sample-efficient RL algorithm (BallRL) that learns an $epsilon$-optimal policy using only $tildeOleft(fracH5d3epsilon3right)$ number of trajectories.
In particular, we propose a sample-efficient RL algorithm (BallRL) that learns an $epsilon$-optimal policy using only $tildeOleft(fracH5d3epsilon3right)$ number of trajectories.
arXiv Detail & Related papers (2022-11-14T14:48:39Z) - Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z) - Implicit Bias of Projected Subgradient Method Gives Provable Robust
Recovery of Subspaces of Unknown Codimension [12.354076490479514]
We show that Dual Principal Component Pursuit (DPCP) can provably solve problems in the it unknown subspace dimension regime.
We propose a very simple algorithm based on running multiple instances of a projected sub-gradient descent method (PSGM)
In particular, we show that 1) all of the problem instances will converge to a vector in the nullspace of the subspace and 2) the ensemble of problem instance solutions will be sufficiently diverse to fully span the nullspace of the subspace.
arXiv Detail & Related papers (2022-01-22T15:36:03Z) - Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z) - Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces [26.297887542066505]
We consider episodic reinforcement learning with a continuous state-action space which is assumed to be equipped with a natural metric.
We propose ZoomRL, an online algorithm that leverages ideas from continuous bandits to learn an adaptive discretization of the joint space.
We show that ZoomRL achieves a worst-case regret $tildeO(Hfrac52 Kfracd+1d+2)$ where $H$ is the planning horizon, $K$ is the number of episodes and $d$ is the covering dimension of the space.
arXiv Detail & Related papers (2020-03-09T12:32:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.