Action Pick-up in Dynamic Action Space Reinforcement Learning
- URL: http://arxiv.org/abs/2304.00873v1
- Date: Mon, 3 Apr 2023 10:55:16 GMT
- Title: Action Pick-up in Dynamic Action Space Reinforcement Learning
- Authors: Jiaqi Ye, Xiaodong Li, Pangjing Wu, Feng Wang
- Abstract summary: We propose an intelligent Action Pick-up (AP) algorithm that autonomously chooses, from a set of new actions, those most likely to boost performance.
In this paper, we first show theoretically that a prior optimal policy plays an important role in action pick-up by providing useful knowledge and experience.
We then design two AP methods based on the prior optimal policy: a frequency-based global method and a state clustering-based local method.
- Score: 6.15205100319133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most reinforcement learning algorithms are based on a key assumption that
Markov decision processes (MDPs) are stationary. However, non-stationary MDPs
with dynamic action spaces are omnipresent in real-world scenarios. Although
dynamic action space reinforcement learning has been studied in many previous
works, how to choose valuable actions from new and unseen actions to improve
learning efficiency remains unaddressed. To tackle this problem, we
propose an intelligent Action Pick-up (AP) algorithm to autonomously choose
valuable actions that are most likely to boost performance from a set of new
actions. In this paper, we first show theoretically that a prior optimal
policy plays an important role in action pick-up by providing useful knowledge
and experience. Then, we design two AP methods based on the prior optimal
policy: a frequency-based global method and a state clustering-based local
method. Finally, we evaluate AP in two simulated but
challenging environments where action spaces vary over time. Experimental
results demonstrate that our proposed AP has advantages over baselines in
learning efficiency.
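To make the proposed methods concrete, the sketch below shows one way a frequency-based global pick-up could work: actions used frequently by the prior optimal policy define a preference signal, and each new action is ranked by its feature similarity to those actions. The function name, the feature representation, and the cosine scoring are illustrative assumptions, not the paper's exact algorithm.
```python
import numpy as np

def pick_up_actions(prior_trajectory, action_features, new_actions, k=1):
    """Rank new actions by similarity to actions that a prior optimal
    policy used frequently (illustrative sketch, not the paper's exact
    AP algorithm).

    prior_trajectory: sequence of action ids taken by the prior optimal policy
    action_features:  dict of action id -> feature vector (np.ndarray),
                      an assumed representation of actions
    new_actions:      list of previously unseen action ids
    """
    # Empirical frequency of each known action under the prior policy.
    counts = {}
    for a in prior_trajectory:
        counts[a] = counts.get(a, 0) + 1
    total = sum(counts.values())
    freq = {a: c / total for a, c in counts.items()}

    def cosine(u, v):
        return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)

    # A new action scores highly if it resembles frequently used actions.
    scores = {
        a_new: sum(f * cosine(action_features[a_new], action_features[a_old])
                   for a_old, f in freq.items())
        for a_new in new_actions
    }
    return sorted(new_actions, key=scores.get, reverse=True)[:k]
```
The state clustering-based local variant described above would apply the same similarity idea per cluster of states rather than over whole trajectories.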
Related papers
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982] (2024-08-30)
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
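The summary above is terse, so here is a heavily simplified sketch of the closed-loop resampling idea: draw several candidate chunks from the generative policy and keep the one whose opening steps agree best with the not-yet-executed part of the previous plan. The sampler interface and the squared-error criterion are assumptions for illustration; BID's actual selection criteria are richer.
```python
import numpy as np

def resample_chunk(sample_chunk, prev_remainder, num_candidates=16):
    """Closed-loop chunk selection (simplified sketch).

    sample_chunk():  hypothetical callable drawing one action chunk of
                     shape (horizon, action_dim) from a generative policy.
    prev_remainder:  the not-yet-executed part of the previous plan.
    Keeps the candidate whose opening steps agree best with what the
    previous plan intended, so frequent replanning stays coherent.
    """
    candidates = [sample_chunk() for _ in range(num_candidates)]
    n = len(prev_remainder)

    def backward_gap(chunk):
        return float(np.sum((chunk[:n] - prev_remainder) ** 2))

    return min(candidates, key=backward_gap)
```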
- Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking [7.590209768166108] (2024-06-06)
We introduce three continuous action masking methods for mapping the action space to the state-dependent set of relevant actions.
Our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications.
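As a minimal illustration of the general idea (not necessarily one of the paper's three methods), a state-dependent mask for a box-shaped relevant set can be applied by projection; `relevant_bounds` is a hypothetical function supplying the per-state bounds.
```python
import numpy as np

def mask_action(raw_action, state, relevant_bounds):
    """Map a raw continuous action into the state-dependent set of
    relevant actions (here an axis-aligned box, the simplest case).

    relevant_bounds(state) -> (low, high) arrays is a hypothetical
    function defining which actions are relevant in this state.
    """
    low, high = relevant_bounds(state)
    # Clipping is the simplest of several possible mappings; other
    # choices (e.g. rescaling the whole action range into the box)
    # change the gradient signal the policy receives.
    return np.clip(raw_action, low, high)
```
How the mapping is defined matters, since it determines what the policy learns about actions that fall outside the relevant set.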
- Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task [50.72283841720014] (2022-12-07)
We propose a novel learning strategy that can improve reasoning about the effects of actions.
We demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
arXiv Detail & Related papers (2022-12-07T05:41:58Z) - Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
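As a generic illustration of information-directed action selection (the paper's acquisition function, which targets information about the optimal trajectory specifically, is more sophisticated), ensemble disagreement is a common proxy for how informative an action would be; the model interface below is an assumption.
```python
import numpy as np

def choose_exploratory_action(state, candidate_actions, ensemble):
    """Pick the action where an ensemble of learned dynamics models
    disagrees most, a common proxy for expected information gain
    (not necessarily the paper's acquisition function).

    ensemble: list of models, each with a hypothetical
              .predict(state, action) -> predicted next state
    """
    def disagreement(action):
        preds = np.stack([m.predict(state, action) for m in ensemble])
        return float(preds.var(axis=0).sum())  # total predictive variance
    return max(candidate_actions, key=disagreement)
```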
arXiv Detail & Related papers (2022-10-06T20:28:55Z) - Learning Routines for Effective Off-Policy Reinforcement Learning [0.0]
We propose a novel framework for reinforcement learning that lifts the constraint of acting through a single primitive action at each step.
Within our framework, agents learn effective behavior over a routine space.
We show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode.
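For intuition, a routine can be thought of as a short sequence of primitive actions executed as one macro decision, as in the sketch below (a classic Gym-style `env.step` interface is assumed; in the paper the routine space is learned, not hand-specified).
```python
def run_routine(env, routine):
    """Execute a routine, i.e. a short sequence of primitive actions, as
    one macro decision, accumulating reward along the way. Assumes the
    classic Gym-style `env.step(action) -> (obs, reward, done, info)` API.
    """
    total_reward, done, obs = 0.0, False, None
    for action in routine:
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break  # the routine is cut short when the episode ends
    return obs, total_reward, done
```
Deciding once per routine rather than once per primitive action is what reduces the number of agent-environment interactions per episode.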
arXiv Detail & Related papers (2021-06-05T18:41:57Z) - Reinforcement Learning With Sparse-Executing Actions via Sparsity Regularization [15.945378631406024]
Reinforcement learning (RL) has demonstrated impressive performance in decision-making tasks like embodied control, autonomous driving and financial trading.
In many decision-making tasks, the agents often encounter the problem of executing actions under limited budgets.
This paper formalizes the problem as a Sparse Action Markov Decision Process (SA-MDP), in which specific actions in the action space can only be executed for a limited time.
We propose a policy optimization algorithm, Action Sparsity REgularization (ASRE), which adaptively handles each action with a distinct preference.
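A schematic rendering of the sparsity-regularization idea is shown below: a standard policy-gradient loss plus a penalty on the probability mass the policy puts on budget-limited actions. ASRE's actual per-action preference weighting is more refined; all names and tensor shapes here are assumptions.
```python
import torch

def sparsity_regularized_loss(policy_logits, actions, advantages,
                              sparse_action_ids, reg_weight=0.01):
    """Policy-gradient loss with a sparsity penalty on budget-limited
    actions (schematic sketch, not ASRE's exact objective).

    policy_logits: (batch, num_actions); actions: (batch,) long;
    advantages: (batch,); sparse_action_ids: list of budgeted action ids.
    """
    log_probs = torch.log_softmax(policy_logits, dim=-1)
    chosen = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    pg_loss = -(chosen * advantages).mean()
    # Penalize the probability the policy assigns to sparse actions,
    # discouraging their overuse under a limited execution budget.
    probs = torch.softmax(policy_logits, dim=-1)
    sparsity_penalty = probs[:, sparse_action_ids].sum(dim=-1).mean()
    return pg_loss + reg_weight * sparsity_penalty
```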
arXiv Detail & Related papers (2021-05-18T16:50:42Z) - RODE: Learning Roles to Decompose Multi-Agent Tasks [69.56458960841165]
Role-based learning holds the promise of achieving scalable multi-agent learning by decomposing complex tasks using roles.
We propose to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents.
By virtue of these advances, our method outperforms the current state-of-the-art MARL algorithms on 10 of the 14 scenarios that comprise the challenging StarCraft II micromanagement benchmark.
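The action-clustering step lends itself to a compact sketch: given effect embeddings for each action (learned in RODE; assumed given here), k-means yields the restricted per-role action spaces.
```python
import numpy as np
from sklearn.cluster import KMeans

def build_role_action_spaces(action_effect_embeddings, num_roles):
    """Cluster actions by their effects on the environment to obtain
    restricted per-role action spaces, in the spirit of RODE's
    decomposition (effect embeddings are assumed precomputed).

    action_effect_embeddings: array of shape (num_actions, embed_dim)
    Returns a list of action-id arrays, one per role.
    """
    labels = KMeans(n_clusters=num_roles, n_init=10).fit_predict(
        action_effect_embeddings)
    return [np.where(labels == r)[0] for r in range(num_roles)]
```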
arXiv Detail & Related papers (2020-10-04T09:20:59Z) - SOAC: The Soft Option Actor-Critic Architecture [25.198302636265286]
Methods have been proposed for concurrently learning low-level intra-option policies and a high-level option-selection policy.
Existing methods typically suffer from two major challenges: ineffective exploration and unstable updates.
We present a novel and stable off-policy approach that builds on the maximum entropy model to address these challenges.
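A minimal sketch of the maximum-entropy ingredient for the high-level selector is given below: the option-selection policy maximizes expected option value plus an entropy bonus, which is what discourages premature collapse onto a single option. SOAC's full actor-critic machinery, including the intra-option policies, is omitted; names and shapes are assumptions.
```python
import torch

def soft_option_selection_loss(option_logits, option_values, temperature=0.2):
    """Entropy-regularized objective for the option-selection policy
    (schematic sketch, not SOAC's exact update).

    option_logits: unnormalized selection scores, shape (batch, num_options)
    option_values: critic estimates of per-option value, same shape
    """
    probs = torch.softmax(option_logits, dim=-1)
    log_probs = torch.log_softmax(option_logits, dim=-1)
    # Soft objective: expected option value plus policy entropy, so the
    # selector keeps exploring options instead of collapsing early.
    objective = (probs * (option_values - temperature * log_probs)).sum(-1)
    return -objective.mean()
```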
arXiv Detail & Related papers (2020-06-25T13:06:59Z) - Zeroth-Order Supervised Policy Improvement [94.0748002906652]
Policy gradient (PG) algorithms have been widely used in reinforcement learning (RL).
We propose Zeroth-Order Supervised Policy Improvement (ZOSPI).
ZOSPI exploits the estimated value function $Q$ globally while preserving the local exploitation of the PG methods.
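The core loop can be sketched as follows: sample candidate actions, let the learned $Q$ pick the best per state, and take a supervised regression step of the policy toward those actions; this is the sense in which $Q$ is exploited globally without differentiating through it. The one-dimensional action box, the uniform sampler, and the network interfaces are simplifying assumptions.
```python
import torch

def zospi_style_update(policy, q_net, states, num_samples=32,
                       action_low=-1.0, action_high=1.0):
    """Simplified ZOSPI-style policy step: label each state with the
    sampled candidate action the critic scores highest, then regress
    the policy onto those labels (supervised, no backprop through Q).
    Assumes 1-D actions in [action_low, action_high] for brevity.
    """
    batch = states.shape[0]
    with torch.no_grad():
        # Global exploitation of Q: uniform candidates over the action box.
        cands = torch.empty(batch, num_samples, 1).uniform_(action_low,
                                                            action_high)
        s_rep = states.unsqueeze(1).expand(-1, num_samples, -1)
        q = q_net(s_rep.reshape(-1, states.shape[1]),
                  cands.reshape(-1, 1)).reshape(batch, num_samples)
        targets = cands[torch.arange(batch), q.argmax(dim=1)]  # (batch, 1)
    # Supervised step: mean squared error between policy output and labels.
    return ((policy(states) - targets) ** 2).mean()
```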
arXiv Detail & Related papers (2020-06-11T16:49:23Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of the gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
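One standard way to realize this kind of variance control is the all-action ("expected") policy gradient with the policy's own expected value as baseline, sketched below; the paper's estimator over correlated actions is more elaborate, and the shapes here are assumptions.
```python
import torch

def all_action_pg_loss(policy_logits, q_values):
    """All-action policy-gradient loss for discrete actions with the
    critic's expected value as baseline (a generic variance-control
    construction, not the paper's exact estimator).

    policy_logits, q_values: tensors of shape (batch, num_actions)
    """
    probs = torch.softmax(policy_logits, dim=-1)
    baseline = (probs * q_values).sum(dim=-1, keepdim=True)  # E_pi[Q(s, .)]
    advantages = (q_values - baseline).detach()
    # Summing over all actions instead of sampling one removes the
    # Monte-Carlo variance of the action choice entirely.
    return -(probs * advantages).sum(dim=-1).mean()
```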