Context-Sensitive Abstractions for Reinforcement Learning with Parameterized Actions
- URL: http://arxiv.org/abs/2512.20831v1
- Date: Tue, 23 Dec 2025 23:12:53 GMT
- Title: Context-Sensitive Abstractions for Reinforcement Learning with Parameterized Actions
- Authors: Rashmeet Kaur Nayyar, Naman Shah, Siddharth Srivastava
- Abstract summary: This paper extends the scope of reinforcement learning algorithms to long-horizon, sparse-reward settings with parameterized actions. We introduce algorithms that progressively refine learned state and action abstractions during learning, increasing fine-grained detail in the critical regions of the state-action space. Across several continuous-state, parameterized-action domains, our abstraction-driven approach enables TD($λ$) to achieve markedly higher sample efficiency than state-of-the-art baselines.
- Score: 22.730282038941382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world sequential decision-making often involves parameterized action spaces, which require both a decision about which discrete action to take and a decision about the continuous parameters governing how that action is executed. Existing approaches exhibit severe limitations in this setting: planning methods demand hand-crafted action models; standard reinforcement learning (RL) algorithms are designed for either discrete or continuous actions but not both; and the few RL methods that handle parameterized actions typically rely on domain-specific engineering and fail to exploit the latent structure of these spaces. This paper extends the scope of RL algorithms to long-horizon, sparse-reward settings with parameterized actions by enabling agents to autonomously learn both state and action abstractions online. We introduce algorithms that progressively refine these abstractions during learning, increasing fine-grained detail in the critical regions of the state-action space where greater resolution improves performance. Across several continuous-state, parameterized-action domains, our abstraction-driven approach enables TD($λ$) to achieve markedly higher sample efficiency than state-of-the-art baselines.
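To make the setting concrete, here is a minimal sketch of the two ingredients the abstract describes: a parameterized action (a discrete action type plus continuous parameters) and tabular TD($λ$) run over a refinable state abstraction. The environment interface, the grid abstraction, and the cell-splitting rule are illustrative assumptions, not the authors' algorithm.

```python
from collections import defaultdict

# A parameterized action pairs a discrete action type with continuous
# parameters, e.g. ("kick", {"angle": 0.3, "power": 0.8}).
def make_action(action_type, **params):
    return (action_type, params)

class GridAbstraction:
    """Uniform grid over a 1-D continuous state, refinable cell by cell."""
    def __init__(self, low, high, n_cells):
        step = (high - low) / n_cells
        self.edges = [low + i * step for i in range(n_cells + 1)]

    def abstract(self, s):
        # Map a concrete state to the index of the grid cell containing it.
        for i in range(len(self.edges) - 1):
            if s < self.edges[i + 1]:
                return i
        return len(self.edges) - 2

    def refine(self, cell):
        # Split one cell in half: extra resolution only where it pays off
        # (e.g. cells with large TD errors), leaving the rest coarse.
        mid = 0.5 * (self.edges[cell] + self.edges[cell + 1])
        self.edges.insert(cell + 1, mid)

def td_lambda_episode(env, abstraction, V, alpha=0.1, gamma=0.99, lam=0.9):
    """One episode of tabular TD(lambda) over abstract states.

    `env` is an assumed interface exposing reset(), sample_action(), and
    step(action) -> (next_state, reward, done).
    """
    z = defaultdict(float)            # eligibility traces per abstract state
    s, done = env.reset(), False
    while not done:
        a = env.sample_action()       # a parameterized action; the policy
        s2, r, done = env.step(a)     # itself is out of scope here
        phi, phi2 = abstraction.abstract(s), abstraction.abstract(s2)
        delta = r + gamma * V[phi2] * (not done) - V[phi]
        z[phi] += 1.0                 # accumulating trace
        for cell in list(z):
            V[cell] += alpha * delta * z[cell]
            z[cell] *= gamma * lam
        s = s2
```

Refining a cell mid-training would invalidate the indices stored in `V` and `z`; a full implementation would remap or reset those entries, which is omitted here for brevity.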
Related papers
- DynaAct: Large Language Model Reasoning with Dynamic Action Spaces [58.298135359318024]
We propose a novel framework named DynaAct for automatically constructing a compact action space. Our approach significantly improves overall performance while maintaining efficient inference without introducing substantial latency.
arXiv Detail & Related papers (2025-11-11T09:47:13Z)
- DEAS: DEtached value learning with Action Sequence for Scalable Offline RL [46.40818333031899]
DEtached value learning with Action Sequence (DEAS) is a simple yet effective offline RL framework that leverages action sequences for value learning. DEAS consistently outperforms baselines on complex, long-horizon tasks from OGBench, and it can be applied to enhance the performance of large-scale Vision-Language-Action models; a minimal sketch of sequence-level value learning follows this entry.
arXiv Detail & Related papers (2025-10-09T03:11:09Z)
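As a rough illustration of sequence-level value learning (not the DEAS architecture itself), the sketch below scores a state together with a whole action sequence; the network shape, sequence length, and the k-step target in the closing comment are assumptions.

```python
import torch
import torch.nn as nn

class SequenceQ(nn.Module):
    """Q(s, a_t..a_{t+k-1}): value of a state paired with an action sequence."""
    def __init__(self, state_dim, action_dim, seq_len, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim * seq_len, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action_seq):
        # action_seq: (batch, seq_len, action_dim), flattened into one vector
        flat = action_seq.flatten(start_dim=1)
        return self.net(torch.cat([state, flat], dim=-1))

# Bootstrapping every k steps instead of every step shortens the effective
# horizon, the usual motivation for sequence-level values:
#   target = sum_i gamma**i * r_i + gamma**k * Q_target(s_k, next_action_seq)
q = SequenceQ(state_dim=17, action_dim=6, seq_len=4)
value = q(torch.randn(32, 17), torch.randn(32, 4, 6))  # -> (32, 1)
```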
- RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems [98.98963933669751]
We train models to propose multiple abstractions for a given problem, followed by RL that incentivizes building a solution using them. This results in a two-player RL training paradigm, abbreviated RLAD, that jointly trains an abstraction generator and a solution generator. We show that allocating more test-time compute to generating abstractions benefits performance more than generating additional solutions at large test budgets.
arXiv Detail & Related papers (2025-10-02T17:44:23Z)
- Action abstractions for amortized sampling [49.384037138511246]
We propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process.
Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and 'chunking' them into a single action that is added to the action space; a minimal sketch of frequency-based chunking follows this entry.
arXiv Detail & Related papers (2024-10-19T19:22:50Z)
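A minimal sketch of the chunking idea, assuming a simple frequency criterion over high-reward trajectories (the paper's actual selection rule may differ):

```python
from collections import Counter

def most_common_chunk(trajectories, min_len=2, max_len=4):
    """Find the action subsequence appearing most often across trajectories.

    `trajectories` is a list of action lists drawn from high-reward episodes;
    raw frequency is a simplification of the paper's selection scheme.
    """
    counts = Counter()
    for actions in trajectories:
        for n in range(min_len, max_len + 1):
            for i in range(len(actions) - n + 1):
                counts[tuple(actions[i:i + n])] += 1
    chunk, _ = counts.most_common(1)[0]
    return chunk

# Example: ("grasp", "lift") recurs, so it becomes one macro-action.
trajs = [["move", "grasp", "lift", "place"],
         ["grasp", "lift", "move", "grasp", "lift"]]
action_space = ["move", "grasp", "lift", "place"]
action_space.append(most_common_chunk(trajs))  # extend the action space
print(action_space[-1])                        # ('grasp', 'lift')
```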
- Task-Specific Directions: Definition, Exploration, and Utilization in Parameter Efficient Fine-Tuning [65.31677646659895]
Large language models demonstrate impressive performance on downstream tasks, yet fully fine-tuning all of their parameters consumes extensive resources. We propose a framework to clearly define task-specific directions (TSDs) and explore their properties and practical utilization challenges. We then introduce a novel approach, LoRA-Dash, which aims to maximize the impact of TSDs during the fine-tuning process; a sketch of the underlying idea follows this entry.
arXiv Detail & Related papers (2024-09-02T08:10:51Z)
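One way to picture "maximizing the impact of task-specific directions" is to re-weight the dominant singular directions of a LoRA update; the selection and scaling rule below are illustrative assumptions, not the LoRA-Dash algorithm itself.

```python
import torch

def boost_task_specific_directions(A, B, k=2, scale=2.0):
    """Re-weight the top-k singular directions of a LoRA update dW = B @ A.

    Hypothetical selection/scaling rule for illustration only; LoRA-Dash's
    actual mechanism for identifying TSDs differs.
    """
    dW = B @ A                                    # low-rank weight update
    U, S, Vh = torch.linalg.svd(dW, full_matrices=False)
    S = S.clone()
    S[:k] *= scale                                # amplify dominant directions
    return U @ torch.diag(S) @ Vh

A = torch.randn(4, 32)    # rank-4 LoRA factors for a 16x32 weight matrix
B = torch.randn(16, 4)
updated = boost_task_specific_directions(A, B)    # same shape as B @ A
```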
- Spatio-temporal Value Semantics-based Abstraction for Dense Deep Reinforcement Learning [1.4542411354617986]
Intelligent Cyber-Physical Systems (ICPS) are a specialized form of Cyber-Physical System (CPS) in which CNNs and Deep Reinforcement Learning (DRL) undertake multifaceted tasks encompassing perception, decision-making, and control. DRL confronts challenges in efficiency, generalization, and data scarcity during decision-making. We propose an innovative abstract modeling approach grounded in spatio-temporal value semantics.
arXiv Detail & Related papers (2024-05-24T02:21:10Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods, such as IQL, CQL, and BRAC, improve in performance on benchmarks when combined with our proposed discretization scheme; a minimal vector-quantization sketch follows this entry.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
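A minimal sketch using plain k-means vector quantization as a stand-in for the paper's adaptive, learned discretization; the codebook size and clustering choice are assumptions. The point it illustrates is that bins adapt to where the behavior data actually lies.

```python
import numpy as np

def learn_action_codebook(actions, n_bins=16, iters=20, seed=0):
    """Cluster dataset actions into a discrete codebook via k-means.

    A simplified stand-in for the paper's learned quantization scheme.
    """
    rng = np.random.default_rng(seed)
    centroids = actions[rng.choice(len(actions), n_bins, replace=False)].copy()
    for _ in range(iters):
        # Assign every action to its nearest centroid, then recompute means.
        dists = np.linalg.norm(actions[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        for b in range(n_bins):
            if np.any(labels == b):
                centroids[b] = actions[labels == b].mean(axis=0)
    return centroids  # a discrete-action RL method then acts over bin indices

actions = np.random.randn(1000, 2).astype(np.float32)  # continuous 2-D actions
codebook = learn_action_codebook(actions)              # shape (16, 2)
```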
- Learning Robust State Abstractions for Hidden-Parameter Block MDPs [55.31018404591743]
We leverage ideas of common structure from the HiP-MDP setting to enable robust state abstractions inspired by Block MDPs.
We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings.
arXiv Detail & Related papers (2020-07-14T17:25:27Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete-action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance-control techniques; a sketch of a critic-based baseline for variance reduction follows this entry.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
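A hedged sketch of the general variance-control device this line of work builds on: a critic over all discrete actions supplies a policy-weighted baseline, giving an expectation-form (lower-variance) policy gradient. This is a standard estimator in the paper's spirit, not its exact method.

```python
import torch
import torch.nn.functional as F

def expected_policy_gradient_loss(logits, q_values):
    """All-actions policy-gradient loss with a critic-based baseline.

    `q_values` holds critic estimates Q(s, a) for every discrete action.
    Subtracting the policy-weighted mean V(s) and summing over all actions
    yields an expectation-form, lower-variance gradient estimate.
    """
    probs = F.softmax(logits, dim=-1)
    baseline = (probs * q_values).sum(dim=-1, keepdim=True)  # V(s) = E_a Q(s,a)
    advantage = (q_values - baseline).detach()
    # The loss value is ~0 by construction, but its gradient is the expected
    # policy gradient, averaged exactly over actions instead of sampled.
    return -(probs * advantage).sum(dim=-1).mean()

logits = torch.randn(32, 6, requires_grad=True)  # batch of 32 states, 6 actions
q_values = torch.randn(32, 6)                    # critic outputs (no gradient)
expected_policy_gradient_loss(logits, q_values).backward()
```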