Offline Reinforcement Learning With Combinatorial Action Spaces
- URL: http://arxiv.org/abs/2410.21151v1
- Date: Mon, 28 Oct 2024 15:49:46 GMT
- Title: Offline Reinforcement Learning With Combinatorial Action Spaces
- Authors: Matthew Landers, Taylor W. Killian, Hugo Barnes, Thomas Hartvigsen, Afsaneh Doryab
- Abstract summary: Reinforcement learning problems often involve large action spaces arising from the simultaneous execution of multiple sub-actions.
We propose Branch Value Estimation (BVE), which effectively captures sub-action dependencies and scales to large spaces by learning to evaluate only a small subset of actions at each timestep.
Our experiments show that BVE outperforms state-of-the-art methods across a range of action space sizes.
- Score: 12.904199719046968
- Abstract: Reinforcement learning problems often involve large action spaces arising from the simultaneous execution of multiple sub-actions, resulting in combinatorial action spaces. Learning in combinatorial action spaces is difficult due to the exponential growth in action space size with the number of sub-actions and the dependencies among these sub-actions. In offline settings, this challenge is compounded by limited and suboptimal data. Current methods for offline learning in combinatorial spaces simplify the problem by assuming sub-action independence. We propose Branch Value Estimation (BVE), which effectively captures sub-action dependencies and scales to large combinatorial spaces by learning to evaluate only a small subset of actions at each timestep. Our experiments show that BVE outperforms state-of-the-art methods across a range of action space sizes.
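As a rough illustration of the branch-value idea, the sketch below greedily assembles a combinatorial action one sub-action at a time, scoring only a handful of candidates per step instead of the full joint space; `branch_value` is a hypothetical stand-in for the learned network, not the paper's implementation.

```python
# Hypothetical sketch of branch-value-style greedy selection over sub-actions.
import numpy as np

rng = np.random.default_rng(0)
N_SUB_ACTIONS, CHOICES_PER_SUB = 8, 4  # 4**8 = 65,536 joint actions

def branch_value(state, prefix, candidate):
    """Stand-in for the learned scorer of a partial action; random here."""
    return rng.normal()

def select_action(state):
    prefix = []
    for _ in range(N_SUB_ACTIONS):
        # score only CHOICES_PER_SUB candidates, conditioned on choices so far
        scores = [branch_value(state, prefix, c) for c in range(CHOICES_PER_SUB)]
        prefix.append(int(np.argmax(scores)))
    return prefix  # 8 * 4 = 32 evaluations instead of 65,536

print(select_action(state=np.zeros(3)))
```

Because each choice is conditioned on the prefix of sub-actions already selected, dependencies among sub-actions can be expressed, unlike factored methods that score each sub-action independently.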
Related papers
- Solving Continual Offline RL through Selective Weights Activation on Aligned Spaces [52.649077293256795]
Continual offline reinforcement learning (CORL) has shown impressive capability in diffusion-based lifelong learning systems.
We propose the Vector-Quantized Continual Diffuser (VQ-CD) to break the barrier of differing spaces across tasks.
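Vector quantization itself is standard; the generic sketch below (codebook size and names invented) shows the nearest-code lookup that VQ-style space alignment builds on, not VQ-CD's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))  # 16 learned codes of dimension 4 (hypothetical)

def quantize(z):
    """Map a continuous embedding to its nearest codebook entry."""
    idx = int(np.argmin(np.linalg.norm(codebook - z, axis=1)))
    return idx, codebook[idx]

idx, code = quantize(rng.normal(size=4))
print(idx, code)
```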
arXiv Detail & Related papers (2024-10-21T07:13:45Z) - Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multitask model that performs well across diverse tasks.
We propose the Concrete (CONtinuous relaxation of disCRETE) subspace learning method to identify a common low-dimensional subspace and use its shared information to tackle the interference problem without sacrificing performance.
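A minimal sketch of a binary Concrete (Gumbel-sigmoid) relaxation, assuming a mask over parameter groups used to blend two fine-tuned task vectors; the logits, temperature, and merge rule here are illustrative, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def concrete_mask(logits, temperature=0.5):
    """Continuous relaxation of a binary mask via the Concrete distribution."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=logits.shape)
    noise = np.log(u) - np.log(1 - u)  # logistic noise
    return 1.0 / (1.0 + np.exp(-(logits + noise) / temperature))

logits = np.zeros(5)  # one learnable logit per parameter group (hypothetical)
mask = concrete_mask(logits)
theta_a, theta_b = rng.normal(size=5), rng.normal(size=5)
merged = mask * theta_a + (1 - mask) * theta_b  # soft-masked merge of two task vectors
print(merged)
```

Because the mask is differentiable, its logits can be trained end-to-end before being discretized.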
arXiv Detail & Related papers (2023-12-11T07:24:54Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
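The paper's quantization scheme is adaptive and learned; as a simpler stand-in, the sketch below discretizes dataset actions with plain k-means, which conveys the quantize-then-run-discrete-RL recipe without matching the proposed method.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
dataset_actions = rng.normal(size=(1000, 6))  # continuous actions from an offline dataset
centers, labels = kmeans2(dataset_actions, 32, minit="++", seed=0)  # 32 discrete bins

def discretize(a):
    """Replace a continuous action with the index of its nearest learned center."""
    return int(np.argmin(np.linalg.norm(centers - a, axis=1)))

print(discretize(dataset_actions[0]))
```

A discrete-action offline RL method (e.g., IQL or CQL over the bin indices) can then be run on the quantized dataset.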
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Dynamic Interval Restrictions on Action Spaces in Deep Reinforcement Learning for Obstacle Avoidance [0.0]
In this thesis, we consider the problem of interval restrictions as they occur in pathfinding with dynamic obstacles.
Recent research learns under strong assumptions on the number of intervals and is limited to convex subsets.
We propose two approaches that are independent of the state of the environment by extending parameterized reinforcement learning and ConstraintNet to handle an arbitrary number of intervals.
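One naive way to respect an arbitrary, possibly non-convex set of allowed intervals is to project a raw action onto the nearest feasible point, as sketched below; the paper's parameterized-RL and ConstraintNet extensions are more involved than this.

```python
def project_to_intervals(a, intervals):
    """Clip a scalar action into the closest point of several allowed intervals."""
    return min(
        (max(lo, min(a, hi)) for lo, hi in intervals),
        key=lambda x: abs(x - a),
    )

allowed = [(-1.0, -0.4), (0.2, 0.5), (0.8, 1.0)]  # dynamic, non-convex feasible set
print(project_to_intervals(0.6, allowed))         # -> 0.5
```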
arXiv Detail & Related papers (2023-06-13T09:13:13Z) - Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces [2.285821277711785]
Large discrete action spaces (LDAS) remain a central challenge in reinforcement learning.
Existing solution approaches can handle unstructured LDAS with up to a few million actions.
We propose Dynamic Neighborhood Construction (DNC), a novel exploitation paradigm for structured LDAS (SLDAS).
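A hedged sketch of the neighborhood idea: round a continuous proxy action to the nearest discrete action, then locally search its neighbors under a critic; `q_value` is a toy stand-in, and the perturbation scheme is simplified relative to the paper's.

```python
import numpy as np

def q_value(state, action):
    """Toy stand-in for a learned Q-network."""
    return -float(np.sum((action - 3) ** 2))

def dnc_step(state, proxy, k=1):
    """Round a continuous proxy action, then search its discrete neighborhood."""
    base = np.rint(proxy).astype(int)
    best, best_q = base, q_value(state, base)
    for i in range(len(base)):
        for delta in (-k, k):
            cand = base.copy()
            cand[i] += delta
            q = q_value(state, cand)
            if q > best_q:
                best, best_q = cand, q
    return best

print(dnc_step(state=None, proxy=np.array([2.6, 3.4, 1.2])))  # -> [3 3 2]
```

The search cost scales with the neighborhood size rather than the full action space.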
arXiv Detail & Related papers (2023-05-31T14:26:14Z) - Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods.
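A minimal sketch of the decomposed, critic-only recipe: each action dimension gets its own two-way Q head over the bang-bang extremes, and dimensions act like cooperating agents; `per_dim_q` is a random stand-in for the learned critic.

```python
import numpy as np

rng = np.random.default_rng(0)
N_DIMS = 3
BANG = [-1.0, 1.0]  # bang-bang: only the extremes of each action dimension

def per_dim_q(state, dim):
    """Stand-in for a decomposed critic: one 2-way Q head per action dimension."""
    return rng.normal(size=2)  # Q(state, a_dim=-1), Q(state, a_dim=+1)

def act(state):
    # each dimension chooses independently, as in cooperative MARL decompositions
    return np.array([BANG[int(np.argmax(per_dim_q(state, d)))] for d in range(N_DIMS)])

print(act(state=np.zeros(4)))
```

The decomposition keeps the argmax tractable: 2 evaluations per dimension instead of 2**N_DIMS joint actions.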
arXiv Detail & Related papers (2022-10-22T22:55:50Z) - Hierarchical Compositional Representations for Few-shot Action Recognition [51.288829293306335]
We propose a novel hierarchical compositional representations (HCR) learning approach for few-shot action recognition.
We divide a complicated action into several sub-actions by carefully designed hierarchical clustering.
We also adopt the Earth Mover's Distance in the transportation problem to measure the similarity between video samples in terms of sub-action representations.
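With equal-size sub-action sets and uniform weights, the Earth Mover's Distance reduces to an optimal assignment, which the sketch below computes with SciPy; the paper's transportation formulation may weight sub-actions non-uniformly.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
# two videos, each summarized as 5 sub-action feature vectors (hypothetical)
video_a = rng.normal(size=(5, 8))
video_b = rng.normal(size=(5, 8))

# pairwise transport costs between sub-action representations
cost = np.linalg.norm(video_a[:, None, :] - video_b[None, :, :], axis=-1)
rows, cols = linear_sum_assignment(cost)  # optimal transport with uniform weights
emd = cost[rows, cols].mean()
print(f"sub-action EMD: {emd:.3f}")
```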
arXiv Detail & Related papers (2022-08-19T16:16:59Z) - Deep Multi-Agent Reinforcement Learning with Hybrid Action Spaces based on Maximum Entropy [0.0]
We propose Deep Multi-Agent Hybrid Soft Actor-Critic (MAHSAC) to handle multi-agent problems with hybrid action spaces.
This algorithm follows the centralized training with decentralized execution (CTDE) paradigm and extends the Soft Actor-Critic (SAC) algorithm to handle hybrid action space problems.
Our experiments run on a simple multi-agent particle world with continuous observations and a discrete action space, along with some basic simulated physics.
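A hybrid action head can be pictured as a discrete head alongside a squashed continuous head, as in the stand-in sketch below; the distributions here are random placeholders, not MAHSAC's trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def hybrid_policy(state, n_discrete=4, cont_dim=2):
    """Stand-in policy head emitting a discrete choice plus continuous parameters."""
    logits = rng.normal(size=n_discrete)           # discrete head
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax
    discrete = int(rng.choice(n_discrete, p=probs))
    mean, log_std = rng.normal(size=cont_dim), np.full(cont_dim, -1.0)
    continuous = np.tanh(mean + np.exp(log_std) * rng.normal(size=cont_dim))  # SAC-style squash
    return discrete, continuous

print(hybrid_policy(state=np.zeros(3)))
```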
arXiv Detail & Related papers (2022-06-10T13:52:59Z) - Generalising Discrete Action Spaces with Conditional Action Trees [0.0]
We introduce Conditional Action Trees with two main objectives.
We present several proof-of-concept experiments ranging from environments with discrete action spaces to those with large action spaces commonly found in RTS-style games.
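The core structure can be pictured as a tree whose branches gate which sub-action arguments are valid given earlier choices; the verbs and arguments below are invented purely for illustration.

```python
# Hypothetical conditional action tree: valid arguments depend on the chosen verb.
ACTION_TREE = {
    "move":   {"direction": ["north", "south", "east", "west"]},
    "attack": {"target": ["unit_1", "unit_2"]},
    "build":  {"structure": ["barracks", "farm"]},
}

def enumerate_actions(tree=ACTION_TREE):
    """Walk the tree; only combinations the tree licenses are produced."""
    for verb, params in tree.items():
        for name, values in params.items():
            for value in values:
                yield (verb, {name: value})

for action in enumerate_actions():
    print(action)
```

Invalid combinations (e.g., "move" with a build target) simply never appear, shrinking the effective action space.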
arXiv Detail & Related papers (2021-04-15T08:10:18Z) - LASER: Learning a Latent Action Space for Efficient Reinforcement Learning [41.53297694894669]
We present LASER, a method to learn latent action spaces for efficient reinforcement learning.
We show improved sample efficiency over the original action space, resulting from better alignment of the action space with the task space, which we observe in visualizations of the learned action-space manifold.
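LASER learns its latent space with a trained model; purely to illustrate the latent-to-raw mapping, the sketch below substitutes a PCA-style linear "decoder" fit to dataset actions.

```python
import numpy as np

rng = np.random.default_rng(0)
actions = rng.normal(size=(500, 7))  # raw 7-D actions from collected experience

# PCA via SVD as a stand-in for a learned latent action model
mean = actions.mean(axis=0)
_, _, vt = np.linalg.svd(actions - mean, full_matrices=False)
decoder = vt[:2]  # 2-D latent action space

def decode(z):
    """Map a latent action back to the raw action space."""
    return z @ decoder + mean

print(decode(np.array([0.5, -0.2])))
```

The RL agent then explores in the low-dimensional latent space while the decoder produces executable raw actions.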
arXiv Detail & Related papers (2021-03-29T17:40:02Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
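When the critic supplies Q estimates for every discrete action, the policy gradient of a softmax policy can be computed in exact expectation, eliminating sampling variance over actions; the sketch below works that gradient out with random stand-in values.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 5
logits = rng.normal(size=N_ACTIONS)
probs = np.exp(logits) / np.exp(logits).sum()  # softmax policy
q = rng.normal(size=N_ACTIONS)                 # stand-in critic values for every action

# exact expectation over the critic's action values removes action-sampling variance
baseline = probs @ q
grad_logits = probs * (q - baseline)  # d E_pi[Q] / d logits for a softmax policy
print(grad_logits)
```

The identity used is d/d logit_i of sum_a pi_a Q_a = pi_i (Q_i - E_pi[Q]), so the critic values double as an optimal baseline.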
arXiv Detail & Related papers (2020-02-10T04:23:09Z)