Dynamic Neighborhood Construction for Structured Large Discrete Action
Spaces
- URL: http://arxiv.org/abs/2305.19891v4
- Date: Tue, 27 Feb 2024 10:07:06 GMT
- Title: Dynamic Neighborhood Construction for Structured Large Discrete Action
Spaces
- Authors: Fabian Akkerman, Julius Luy, Wouter van Heeswijk, Maximilian Schiffer
- Abstract summary: Large discrete action spaces (LDAS) remain a central challenge in reinforcement learning.
Existing solution approaches can handle unstructured LDAS with up to a few million actions.
We propose Dynamic Neighborhood Construction (DNC), a novel exploitation paradigm for SLDAS.
- Score: 2.285821277711785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large discrete action spaces (LDAS) remain a central challenge in
reinforcement learning. Existing solution approaches can handle unstructured
LDAS with up to a few million actions. However, many real-world applications in
logistics, production, and transportation systems have combinatorial action
spaces, whose size grows well beyond millions of actions, even on small
instances. Fortunately, such action spaces exhibit structure, e.g., equally
spaced discrete resource units. With this work, we focus on handling structured
LDAS (SLDAS) with sizes that cannot be handled by current benchmarks: we
propose Dynamic Neighborhood Construction (DNC), a novel exploitation paradigm
for SLDAS. We present a scalable neighborhood exploration heuristic that
utilizes this paradigm and efficiently explores the discrete neighborhood
around the continuous proxy action in structured action spaces with up to
$10^{73}$ actions. We demonstrate the performance of our method by benchmarking
it against three state-of-the-art approaches designed for large discrete action
spaces across two distinct environments. Our results show that DNC matches or
outperforms state-of-the-art approaches while being computationally more
efficient. Furthermore, our method scales to action spaces that so far remained
computationally intractable for existing methodologies.
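To make the combinatorial scale concrete: an action that assigns one of $m$ equally spaced resource levels to each of $n$ locations yields $m^n$ joint actions, so $m = 10$ levels over $n = 73$ locations already gives $10^{73}$ actions. The abstract only outlines the mechanism, so the following is a minimal sketch of the general idea, assuming a grid-structured action space, a one-step-per-dimension neighborhood, and a simple hill-climbing stopping rule; the function and parameter names are illustrative, not the authors' implementation.

    import numpy as np

    def dnc_action_selection(q_value, proxy_action, step_size, lower, upper, max_steps=50):
        """Round a continuous proxy action onto a structured discrete grid, then
        hill-climb over its discrete neighborhood using a critic.

        q_value: callable returning the critic's estimate for a discrete action vector.
        proxy_action: continuous actor output, one entry per action dimension.
        step_size, lower, upper: per-dimension grid spacing and bounds (NumPy arrays).
        """
        # Snap the continuous proxy to the nearest point of the structured grid.
        action = np.clip(np.round(proxy_action / step_size) * step_size, lower, upper)
        best = q_value(action)
        for _ in range(max_steps):
            improved = False
            # Candidate neighbors: one grid step up or down in each dimension.
            for dim in range(action.size):
                for direction in (-1.0, 1.0):
                    neighbor = action.copy()
                    neighbor[dim] = np.clip(neighbor[dim] + direction * step_size[dim],
                                            lower[dim], upper[dim])
                    value = q_value(neighbor)
                    if value > best:
                        best, action, improved = value, neighbor, True
            if not improved:
                break  # No neighbor improves the critic estimate.
        return action

In use, proxy_action would come from a continuous-action actor and q_value would wrap a trained critic network; the full method additionally handles exploration and training, which this fragment omits.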
Related papers
- Offline Reinforcement Learning With Combinatorial Action Spaces [12.904199719046968]
Reinforcement learning problems often involve large action spaces arising from the simultaneous execution of multiple sub-actions.
We propose Branch Value Estimation (BVE), which effectively captures sub-action dependencies and scales to large spaces by learning to evaluate only a small subset of actions at each timestep.
Our experiments show that BVE outperforms state-of-the-art methods across a range of action space sizes.
arXiv Detail & Related papers (2024-10-28T15:49:46Z)
- Grounding Multimodal Large Language Models in Actions [65.88208317380793]
We study how to best ground a MLLM into different embodiments and their associated action spaces.
For continuous actions, we show that a learned tokenization allows for sufficient modeling precision.
For discrete actions, we demonstrate that semantically aligning these actions with the native output token space of the MLLM leads to the strongest performance.
arXiv Detail & Related papers (2024-06-12T06:12:04Z)
- Accelerating Search-Based Planning for Multi-Robot Manipulation by Leveraging Online-Generated Experiences [20.879194337982803]
Multi-Agent Path-Finding (MAPF) algorithms have shown promise in discrete 2D domains, providing rigorous guarantees.
We propose an approach for accelerating conflict-based search algorithms by leveraging their repetitive and incremental nature.
arXiv Detail & Related papers (2024-03-29T20:31:07Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Continuous Control with Action Quantization from Demonstrations [35.44893918778709]
In Reinforcement Learning (RL), discrete actions, as opposed to continuous actions, result in less complex exploration problems.
We propose a novel method: Action Quantization from Demonstrations (AQuaDem) to learn a discretization of continuous action spaces.
We evaluate the proposed method on three different setups: RL with demonstrations, RL with play data (demonstrations of a human playing in an environment but not solving any specific task), and Imitation Learning.
arXiv Detail & Related papers (2021-10-19T17:59:04Z)
- Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
- Learning Space Partitions for Path Planning [54.475949279050596]
The proposed method, PlaLaM, outperforms existing path planning methods in 2D navigation tasks, especially in the presence of difficult-to-escape local optima.
These gains transfer to highly multimodal real-world tasks, where we outperform strong baselines in compiler phase ordering by up to 245% and in molecular design by up to 0.4 on properties measured on a 0-1 scale.
arXiv Detail & Related papers (2021-06-19T18:06:11Z)
- Generalising Discrete Action Spaces with Conditional Action Trees [0.0]
We introduce Conditional Action Trees with two main objectives.
We show several proof-of-concept experiments ranging from environments with small discrete action spaces to those with the large action spaces commonly found in RTS-style games.
arXiv Detail & Related papers (2021-04-15T08:10:18Z)
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
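The last entry's summary describes a familiar variance-control pattern: average the critic's action values under the policy and use the result as a baseline for the score-function gradient. The sketch below shows only this generic pattern, not the cited algorithm's estimator; the softmax parameterization, the single-state reduction, and all names are assumptions.

    import numpy as np

    def critic_baseline_policy_gradient(logits, q_values, action):
        """Score-function gradient for a softmax policy over discrete actions,
        with critic-estimated action values used as a variance-reducing baseline.

        logits: unnormalized policy scores, one per discrete action.
        q_values: critic estimates Q(s, a) for every discrete action in this state.
        action: index of the action that was actually executed.
        Returns the gradient of the objective with respect to the logits.
        """
        logits = np.asarray(logits, dtype=float)
        q_values = np.asarray(q_values, dtype=float)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Policy-weighted average of the critic values acts as a state-value baseline V(s).
        baseline = float(probs @ q_values)
        advantage = q_values[action] - baseline
        # Gradient of log pi(action | s) w.r.t. softmax logits: one_hot(action) - probs.
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0
        return advantage * grad_log_pi

Subtracting the policy-weighted critic average leaves the gradient unbiased while shrinking its variance, which is the effect the summary above alludes to.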