Exact Reduction of Huge Action Spaces in General Reinforcement Learning
- URL: http://arxiv.org/abs/2012.10200v1
- Date: Fri, 18 Dec 2020 12:45:03 GMT
- Title: Exact Reduction of Huge Action Spaces in General Reinforcement Learning
- Authors: Sultan Javed Majeed and Marcus Hutter
- Abstract summary: We show how action-binarization in the non-MDP case can significantly improve Extreme State Aggregation (ESA) bounds.
We provide an upper bound on the number of states of this binarized ESA that is logarithmic in the original action-space size, a double-exponential improvement.
- Score: 28.19950790106291
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The reinforcement learning (RL) framework formalizes the notion of learning
with interactions. Many real-world problems have large state-spaces and/or
action-spaces, such as in Go, StarCraft, protein folding, and robotics, or are
non-Markovian, posing significant challenges to RL algorithms. In this
work we address the large action-space problem by sequentializing actions,
which can reduce the action-space size significantly, even down to two actions
at the expense of an increased planning horizon. We provide explicit and exact
constructions and equivalence proofs for all quantities of interest for
arbitrary history-based processes. In the case of MDPs, this could help RL
algorithms that bootstrap. We further show how action-binarization in the
non-MDP case can significantly improve Extreme State Aggregation (ESA) bounds.
ESA allows casting any (non-MDP, non-ergodic, history-based) RL problem into a
fixed-sized non-Markovian state-space with the help of a surrogate Markovian
process. On the upside, ESA enjoys similar optimality guarantees as Markovian
models do. But a downside is that the size of the aggregated state-space
becomes exponential in the size of the action-space. We patch this issue by
binarizing the action-space. We provide an upper bound on the
number of states of this binarized ESA that is logarithmic in the original
action-space size, a double-exponential improvement.
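The sequentialization idea is concrete: replace one choice among |A| actions by ceil(log2 |A|) binary choices, and step the underlying process only once a full code word has been emitted. Below is a minimal sketch of such a wrapper; the class name, the reset()/step() interface returning (observation, reward, done), and the zero intermediate reward are illustrative assumptions, not the paper's exact construction.

```python
import math


class BinarizedActions:
    """Present an environment with n actions as one with only 2 actions.

    The agent emits ceil(log2(n)) binary sub-actions; the wrapper buffers
    the bits and steps the wrapped environment once per complete code word,
    so the planning horizon grows by that same factor. Intermediate steps
    return the unchanged observation and zero reward. Sketch only; the
    paper's construction also defines all induced quantities and proves
    their equivalence. `env` is assumed to expose reset() -> obs and
    step(action) -> (obs, reward, done).
    """

    def __init__(self, env, n_actions):
        self.env = env
        self.n_actions = n_actions
        self.n_bits = max(1, math.ceil(math.log2(n_actions)))
        self._bits, self._obs = [], None

    def reset(self):
        self._bits, self._obs = [], self.env.reset()
        return self._obs

    def step(self, bit):
        assert bit in (0, 1)
        self._bits.append(bit)
        if len(self._bits) < self.n_bits:
            return self._obs, 0.0, False  # internal decision step
        action = 0
        for b in self._bits:  # decode bits, most significant first
            action = (action << 1) | b
        self._bits = []
        action = min(action, self.n_actions - 1)  # clamp unused code words
        self._obs, reward, done = self.env.step(action)
        return self._obs, reward, done
```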
Related papers
- Hyperbolic Fine-tuning for Large Language Models [56.54715487997674]
This study investigates the non-Euclidean characteristics of large language models (LLMs).
We show that token embeddings exhibit a high degree of hyperbolicity, indicating a latent tree-like structure in the embedding space.
We introduce a new method called hyperbolic low-rank efficient fine-tuning, HypLoRA, that performs low-rank adaptation directly on the hyperbolic manifold.
arXiv Detail & Related papers (2024-10-05T02:58:25Z)
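For intuition only, a generic low-rank adapter on the Poincare ball can be sketched as follows: lift a point to the tangent space at the origin, apply the frozen weight plus a trainable low-rank correction, and map back. All names are mine, and this is not necessarily HypLoRA's exact formulation.

```python
import numpy as np

def exp0(v, c=1.0):
    """Exponential map at the origin of the Poincare ball (curvature -c)."""
    n = np.linalg.norm(v, axis=-1, keepdims=True).clip(min=1e-9)
    return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def log0(y, c=1.0):
    """Logarithmic map at the origin (inverse of exp0)."""
    n = np.linalg.norm(y, axis=-1, keepdims=True).clip(min=1e-9)
    return np.arctanh(np.clip(np.sqrt(c) * n, 0.0, 1.0 - 1e-7)) * y / (np.sqrt(c) * n)

def hyperbolic_lora(x_ball, W, A, B, c=1.0):
    """Lift to the tangent space at the origin, apply the frozen weight W
    plus the trainable low-rank term B @ A, and map back to the ball."""
    v = log0(x_ball, c)
    v = v @ W.T + (v @ A.T) @ B.T  # rank = A.shape[0] trainable update
    return exp0(v, c)
```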
- Learning a Fast Mixing Exogenous Block MDP using a Single Trajectory [87.62730694973696]
STEEL is the first provably sample-efficient algorithm for learning the controllable dynamics of an Exogenous Block Markov Decision Process from a single trajectory.
We prove that STEEL is correct and sample-efficient, and demonstrate STEEL on two toy problems.
arXiv Detail & Related papers (2024-10-03T21:57:21Z)
- Multistep Inverse Is Not All You Need [87.62730694973696]
In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise.
It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables.
We propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model.
arXiv Detail & Related papers (2024-03-18T16:36:01Z)
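A hedged sketch of how such a combined objective can look: an encoder is trained with a multistep-inverse classification loss plus a latent forward-model regression loss. Architecture sizes, the loss weighting, and all names are illustrative assumptions, not the paper's specification of ACDF.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultistepInverseWithForward(nn.Module):
    """Encoder phi trained so that (phi(o_t), phi(o_{t+k})) classifies the
    action a_t (multistep inverse), plus a one-step latent forward model
    that keeps phi from discarding control-relevant information."""

    def __init__(self, obs_dim, latent_dim, n_actions):
        super().__init__()
        self.n_actions = n_actions
        self.phi = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))
        self.inverse = nn.Linear(2 * latent_dim, n_actions)
        self.fwd = nn.Sequential(nn.Linear(latent_dim + n_actions, 128),
                                 nn.ReLU(), nn.Linear(128, latent_dim))

    def loss(self, o_t, a_t, o_next, o_future, lam=1.0):
        z_t, z_next, z_fut = self.phi(o_t), self.phi(o_next), self.phi(o_future)
        logits = self.inverse(torch.cat([z_t, z_fut], dim=-1))
        inv_loss = F.cross_entropy(logits, a_t)          # multistep inverse
        a_1hot = F.one_hot(a_t, self.n_actions).float()
        pred = self.fwd(torch.cat([z_t, a_1hot], dim=-1))
        fwd_loss = F.mse_loss(pred, z_next.detach())     # latent forward model
        return inv_loss + lam * fwd_loss
```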
- No Prior Mask: Eliminate Redundant Action for Deep Reinforcement Learning [13.341525656639583]
A large action space is one fundamental obstacle to deploying Reinforcement Learning methods in the real world.
We propose a novel redundant action filtering mechanism named No Prior Mask (NPM).
arXiv Detail & Related papers (2023-12-11T09:56:02Z)
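As a generic illustration of redundant-action filtering (not NPM's actual mechanism), one can mask every action whose predicted next state duplicates that of an earlier action; `predict_next` stands in for an assumed learned transition model.

```python
import numpy as np

def redundant_action_mask(state, actions, predict_next, tol=1e-3):
    """Keep one representative per group of actions whose predicted next
    states coincide (up to tol); all other actions are masked out."""
    kept_states, mask = [], np.zeros(len(actions), dtype=bool)
    for i, a in enumerate(actions):
        nxt = predict_next(state, a)
        if all(np.linalg.norm(nxt - s) > tol for s in kept_states):
            kept_states.append(nxt)
            mask[i] = True  # action i is non-redundant
    return mask
```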
- Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces [2.285821277711785]
Large discrete action spaces (LDAS) remain a central challenge in reinforcement learning.
Existing solution approaches can handle unstructured LDAS with up to a few million actions.
We propose Dynamic Neighborhood Construction (DNC), a novel exploitation paradigm for structured LDAS (SLDAS).
arXiv Detail & Related papers (2023-05-31T14:26:14Z)
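For intuition, a neighborhood-based exploitation step might look like the sketch below: round a continuous proxy action onto the structured grid, then greedily hill-climb over grid neighbors scored by a critic. This is an assumed simplification, not the paper's algorithm.

```python
import numpy as np

def neighborhood_exploitation(q_value, proxy, grid, k=2, rounds=3):
    """Round a continuous proxy action onto the structured grid, then
    greedily move to the best critic-scored neighbor for a few rounds."""
    a = np.round(proxy / grid) * grid  # nearest structured action
    for _ in range(rounds):
        neighbors = [a]
        for d in range(len(a)):  # +/- k grid steps per dimension
            for s in range(-k, k + 1):
                if s:
                    n = a.copy()
                    n[d] += s * grid[d]
                    neighbors.append(n)
        a = max(neighbors, key=q_value)  # greedy step via the critic
    return a
```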
- Efficient Long Sequence Modeling via State Space Augmented Transformer [92.74707853711374]
We propose SPADE, short for $\underline{\textbf{S}}$tate s$\underline{\textbf{P}}$ace $\underline{\textbf{A}}$ugmente$\underline{\textbf{D}}$ Transform$\underline{\textbf{E}}$r.
We augment an SSM into the bottom layer of SPADE, and we employ efficient local attention methods for the other layers.
Experimental results on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-15T20:51:27Z)
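A minimal sketch of such a hybrid stack, assuming a simple diagonal-SSM recurrence for the global bottom layer and sliding-window attention above it; SPADE's actual layers and parameterization may differ.

```python
import torch
import torch.nn as nn

class LocalAttention(nn.Module):
    """Single-head attention restricted to a sliding window."""

    def __init__(self, dim, window):
        super().__init__()
        self.qkv, self.window = nn.Linear(dim, 3 * dim), window

    def forward(self, x):  # x: (batch, length, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5
        idx = torch.arange(x.shape[1])
        far = (idx[:, None] - idx[None, :]).abs() > self.window
        return scores.masked_fill(far, float("-inf")).softmax(-1) @ v

class SpadeLikeStack(nn.Module):
    """One global state-space layer at the bottom (a diagonal SSM scan),
    residual local-attention layers above it."""

    def __init__(self, dim, window=64, n_local=2):
        super().__init__()
        self.decay = nn.Parameter(torch.rand(dim))  # per-channel state decay
        self.local_layers = nn.ModuleList(
            [LocalAttention(dim, window) for _ in range(n_local)])

    def forward(self, x):  # x: (batch, length, dim)
        hs, state = [], torch.zeros_like(x[:, 0])
        for t in range(x.shape[1]):  # linear-time global mixing
            state = torch.sigmoid(self.decay) * state + x[:, t]
            hs.append(state)
        x = torch.stack(hs, dim=1)
        for layer in self.local_layers:
            x = x + layer(x)  # residual local attention
        return x
```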
- Linear Reinforcement Learning with Ball Structure Action Space [8.697177927706521]
We propose a sample-efficient RL algorithm (BallRL) that learns an $\epsilon$-optimal policy using only $\tilde{O}\left(\frac{H^{5}d^{3}}{\epsilon^{3}}\right)$ trajectories.
arXiv Detail & Related papers (2022-11-14T14:48:39Z)
- Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z)
- Implicit Bias of Projected Subgradient Method Gives Provable Robust Recovery of Subspaces of Unknown Codimension [12.354076490479514]
We show that Dual Principal Component Pursuit (DPCP) can provably solve problems in the unknown subspace dimension regime.
We propose a very simple algorithm based on running multiple instances of a projected sub-gradient descent method (PSGM).
In particular, we show that 1) all of the problem instances will converge to a vector in the nullspace of the subspace and 2) the ensemble of problem instance solutions will be sufficiently diverse to fully span the nullspace of the subspace.
arXiv Detail & Related papers (2022-01-22T15:36:03Z)
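A single PSGM instance for the DPCP objective, minimizing $\|X^{\top}b\|_1$ over the unit sphere, is a short, standard routine; the step-size schedule below is an illustrative choice.

```python
import numpy as np

def dpcp_psgm(X, n_iters=200, step=0.1, rng=None):
    """Minimize ||X^T b||_1 over the unit sphere by projected subgradient
    descent; X has one data point per column. Returns a unit vector that
    should lie (approximately) in the nullspace of the inlier subspace."""
    rng = rng or np.random.default_rng()
    b = rng.standard_normal(X.shape[0])
    b /= np.linalg.norm(b)
    for t in range(1, n_iters + 1):
        g = X @ np.sign(X.T @ b)  # subgradient of ||X^T b||_1
        b -= (step / t) * g       # diminishing step size
        b /= np.linalg.norm(b)    # project back onto the sphere
    return b

# The paper's ensemble idea: independent runs whose solutions, taken
# together, span the nullspace of the subspace.
# ensemble = np.stack([dpcp_psgm(X) for _ in range(10)])
```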
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
- Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces [26.297887542066505]
We consider episodic reinforcement learning with a continuous state-action space which is assumed to be equipped with a natural metric.
We propose ZoomRL, an online algorithm that leverages ideas from continuous bandits to learn an adaptive discretization of the joint space.
We show that ZoomRL achieves a worst-case regret $\tilde{O}(H^{\frac{5}{2}} K^{\frac{d+1}{d+2}})$ where $H$ is the planning horizon, $K$ is the number of episodes and $d$ is the covering dimension of the space.
arXiv Detail & Related papers (2020-03-09T12:32:02Z)
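As a rough illustration of adaptive discretization (not ZoomRL's actual rules), one can maintain a cover of cells that split once they have been visited often enough, so resolution grows only where the algorithm spends time.

```python
import numpy as np

class AdaptiveGrid:
    """Cover [0,1)^d with axis-aligned cells and split a cell into 2^d
    half-sized children once it has been visited `split_after` times, so
    resolution concentrates on frequently visited regions."""

    def __init__(self, dim, split_after=16):
        self.dim, self.split_after = dim, split_after
        self.cells = [(np.zeros(dim), 1.0, 0)]  # (lower corner, side, visits)

    def locate(self, x):
        for i, (lo, side, _) in enumerate(self.cells):
            if np.all(x >= lo) and np.all(x < lo + side):
                return i
        raise ValueError("x outside [0,1)^d")

    def visit(self, x):
        i = self.locate(x)
        lo, side, n = self.cells[i]
        if n + 1 < self.split_after:
            self.cells[i] = (lo, side, n + 1)
            return i
        del self.cells[i]  # split the saturated cell
        for corner in np.ndindex(*(2,) * self.dim):
            self.cells.append((lo + np.array(corner) * side / 2, side / 2, 0))
        return self.locate(x)
```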
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.