Hyperbolic Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2210.01542v1
- Date: Tue, 4 Oct 2022 12:03:04 GMT
- Title: Hyperbolic Deep Reinforcement Learning
- Authors: Edoardo Cetin, Benjamin Chamberlain, Michael Bronstein, Jonathan J. Hunt
- Abstract summary: We propose a new class of deep reinforcement learning algorithms that model latent representations in hyperbolic space.
We empirically validate our framework by applying it to popular on-policy and off-policy RL algorithms on the Procgen and Atari 100K benchmarks.
- Score: 8.983647543608226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new class of deep reinforcement learning (RL) algorithms that
model latent representations in hyperbolic space. Sequential decision-making
requires reasoning about the possible future consequences of current behavior.
Consequently, capturing the relationship between key evolving features for a
given task is conducive to recovering effective policies. To this end,
hyperbolic geometry provides deep RL models with a natural basis to precisely
encode this inherently hierarchical information. However, applying existing
methodologies from the hyperbolic deep learning literature leads to fatal
optimization instabilities due to the non-stationarity and variance
characterizing RL gradient estimators. Hence, we design a new general method
that counteracts such optimization challenges and enables stable end-to-end
learning with deep hyperbolic representations. We empirically validate our
framework by applying it to popular on-policy and off-policy RL algorithms on
the Procgen and Atari 100K benchmarks, attaining near universal performance and
generalization benefits. Given its natural fit, we hope future RL research will
consider hyperbolic representations as a standard tool.
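For intuition, here is a minimal PyTorch sketch of the core recipe the abstract describes: a standard Euclidean encoder whose features are projected onto the Poincaré ball via the exponential map at the origin, with spectral normalization on the final layer as one plausible way to temper the gradient instabilities mentioned above. The names (`expmap0`, `HyperbolicEncoder`) and the specific choice of spectral normalization are illustrative assumptions, not a verbatim account of the paper's method.

```python
import torch
import torch.nn as nn

def expmap0(v: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Exponential map at the origin of the Poincare ball with curvature -c.

    Projects a Euclidean tangent vector v to a point strictly inside the
    ball, turning ordinary encoder features into hyperbolic latents.
    """
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

class HyperbolicEncoder(nn.Module):
    """Euclidean trunk whose output is lifted onto the Poincare ball.

    Spectral normalization bounds the Lipschitz constant of the final
    linear map; it stands in (as an assumption) for the stabilization the
    paper designs against high-variance, non-stationary RL gradients.
    """
    def __init__(self, in_dim: int, latent_dim: int, c: float = 1.0):
        super().__init__()
        self.c = c
        self.trunk = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.head = nn.utils.spectral_norm(nn.Linear(256, latent_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Euclidean features first, then the hyperbolic projection.
        return expmap0(self.head(self.trunk(x)), c=self.c)

# Usage sketch: with c=1 every latent lies strictly inside the unit ball.
z = HyperbolicEncoder(in_dim=64, latent_dim=32)(torch.randn(8, 64))
assert z.norm(dim=-1).max() < 1.0
```

Downstream policy or value heads would then consume these latents through hyperbolic operations (e.g., Poincaré distances to learned prototypes) rather than Euclidean inner products.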
Related papers
- Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency.
Online reasoning is performed to guide the training process through two mechanisms.
We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z) - Scaling Online Distributionally Robust Reinforcement Learning: Sample-Efficient Guarantees with General Function Approximation [18.596128578766958]
Distributionally robust RL (DR-RL) addresses this issue by optimizing worst-case performance over an uncertainty set of transition dynamics.
We propose an online DR-RL algorithm with general function approximation that learns an optimal robust policy purely through interaction with the environment.
We provide a theoretical analysis establishing a near-optimal sublinear regret bound under a total variation uncertainty set, demonstrating the sample efficiency and effectiveness of our method.
arXiv Detail & Related papers (2025-12-22T02:12:04Z) - RLoop: A Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization [65.23034604711489]
We introduce RLoop, a self-improving framework for training large reasoning models.
RLoop transforms the standard training process into a virtuous cycle: it first uses RL to explore the solution space from a given policy, then filters the successful trajectories to create an expert dataset.
Our experiments show RLoop mitigates forgetting and substantially improves generalization, boosting average accuracy by 9% and pass@32 by over 15% compared to vanilla RL.
arXiv Detail & Related papers (2025-11-06T11:27:16Z) - Interpret Policies in Deep Reinforcement Learning using SILVER with RL-Guided Labeling: A Model-level Approach to High-dimensional and Multi-action Environments [3.905774454930983]
Deep reinforcement learning achieves remarkable performance but lacks interpretability.
The SILVER framework explains RL policies via Shapley-based regression but remains restricted to low-dimensional, binary-action domains.
We propose SILVER with RL-guided labeling, an enhanced variant that extends SILVER to multi-action and high-dimensional environments.
arXiv Detail & Related papers (2025-10-22T05:04:43Z) - Towards Robust Zero-Shot Reinforcement Learning [22.262048244005296]
Recent developments in zero-shot reinforcement learning (RL) have opened a new avenue for learning pre-trained generalist policies that can adapt to arbitrary new tasks in a zero-shot manner.
While the popular Forward-Backward representations (FB) and related methods have shown promise in zero-shot RL, we empirically found that their modeling lacks expressivity and that extrapolation errors cause suboptimal performance.
We propose an upgraded FB-based framework that simultaneously enhances learning stability, policy extraction capability, and representation learning quality.
arXiv Detail & Related papers (2025-10-17T07:33:19Z) - Offline Reinforcement Learning with Generative Trajectory Policies [6.501269050121785]
Generative models have emerged as a powerful class of policies for offline reinforcement learning.
Existing methods face a stark trade-off: slow, iterative models are computationally expensive, while fast, single-step models like consistency policies often suffer from degraded performance.
We propose Generative Trajectory Policies (GTPs), a new and more general policy paradigm that learns the entire solution map of the underlying ODE.
arXiv Detail & Related papers (2025-10-13T15:06:28Z) - TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning [63.73629127832652]
We introduce TD-JEPA, which leverages TD-based latent-predictive representations for unsupervised RL.
TD-JEPA trains explicit state and task encoders, a policy-conditioned multi-step predictor, and a set of parameterized policies directly in latent space.
Empirically, TD-JEPA matches or outperforms state-of-the-art baselines on locomotion, navigation, and manipulation tasks across 13 datasets.
arXiv Detail & Related papers (2025-10-01T10:21:18Z) - RL as Regressor: A Reinforcement Learning Approach for Function Approximation [0.0]
We propose framing regression as a Reinforcement Learning (RL) problem.
We demonstrate this by treating a model's prediction as an action and defining a custom reward signal based on the prediction error.
We show that the RL framework not only successfully solves the regression problem but also offers enhanced flexibility in defining objectives and guiding the learning process.
arXiv Detail & Related papers (2025-07-31T21:39:24Z) - Are Expressive Models Truly Necessary for Offline RL? [18.425797519857113]
Sequential modeling requires capturing accurate dynamics across long horizons in trajectory data to ensure reasonable policy performance.
We show that lightweight models as simple as shallow two-layer networks can achieve accurate dynamics consistency and significantly reduced sequential modeling errors.
arXiv Detail & Related papers (2024-12-15T17:33:56Z) - Complexity-Aware Deep Symbolic Regression with Robust Risk-Seeking Policy Gradients [20.941908494137806]
This paper proposes a novel deep symbolic regression approach to enhance the robustness and interpretability of data-driven mathematical expression discovery.
Despite its success, the state-of-the-art method DSR is built on recurrent neural networks and guided purely by data fitness.
We use transformers in conjunction with breadth-first-search to improve the learning performance.
arXiv Detail & Related papers (2024-06-10T19:29:10Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations [0.34410212782758043]
Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments.
This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations.
arXiv Detail & Related papers (2023-07-06T12:33:34Z) - Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
arXiv Detail & Related papers (2022-10-14T06:53:02Z) - Robust Reinforcement Learning using Offline Data [23.260211453437055]
We propose a robust reinforcement learning algorithm called Robust Fitted Q-Iteration (RFQI).
RFQI uses only an offline dataset to learn the optimal robust policy.
We prove that RFQI learns a near-optimal robust policy under standard assumptions.
arXiv Detail & Related papers (2022-08-10T03:47:45Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z) - Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
arXiv Detail & Related papers (2020-12-21T18:28:17Z) - MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z) - Nested-Wasserstein Self-Imitation Learning for Sequence Generation [158.19606942252284]
We propose the concept of nested-Wasserstein distance for distributional semantic matching.
A novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-reward sequences.
arXiv Detail & Related papers (2020-01-20T02:19:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.