No Prior Mask: Eliminate Redundant Action for Deep Reinforcement
Learning
- URL: http://arxiv.org/abs/2312.06258v1
- Date: Mon, 11 Dec 2023 09:56:02 GMT
- Title: No Prior Mask: Eliminate Redundant Action for Deep Reinforcement
Learning
- Authors: Dianyu Zhong, Yiqin Yang, Qianchuan Zhao
- Abstract summary: The large action space is a fundamental obstacle to deploying Reinforcement Learning methods in the real world.
We propose a novel redundant action filtering mechanism named No Prior Mask (NPM).
- Score: 13.341525656639583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The large action space is one fundamental obstacle to deploying Reinforcement
Learning methods in the real world. Numerous redundant actions cause agents to
make repeated or invalid attempts, sometimes even leading to task failure.
Although current algorithms make some initial explorations of this issue, they
either rely on rule-based systems or depend on expert demonstrations, which
significantly limits their applicability in many real-world settings. In this
work, we present a theoretical analysis of which actions can be eliminated in
policy optimization and propose a novel redundant action filtering mechanism.
Unlike other works, our method constructs the similarity factor by estimating
the distance between state distributions, which requires no prior knowledge. In
addition, we combine it with a modified inverse model to avoid extensive
computation in high-dimensional state spaces. We reveal the underlying
structure of action spaces and, based on the above techniques, propose a simple
yet efficient redundant action filtering mechanism named No Prior Mask (NPM). We
show the superior performance of our method through extensive experiments on
high-dimensional, pixel-input, and stochastic problems with varying degrees of
action redundancy. Our code is publicly available at https://github.com/zhongdy15/npm.
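The core idea of filtering redundant actions by comparing the state distributions they induce can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the function name `build_action_mask`, the use of the distance between sample means as a crude stand-in for a proper distributional distance, and the `threshold` parameter are all assumptions for illustration. NPM itself builds the similarity factor from estimated distances between state distributions and pairs it with a modified inverse model for high-dimensional inputs.

```python
import numpy as np

def build_action_mask(next_state_samples, threshold=0.1):
    """Sketch of a redundant-action mask in the spirit of NPM.

    next_state_samples: dict mapping each discrete action to an array of
    sampled next states (shape [n_samples, state_dim]) observed after
    taking that action from the same state. Actions whose empirical
    next-state distributions lie closer than `threshold` (measured here
    by the distance between sample means, a simple proxy for a real
    distributional distance) are treated as redundant; only the
    lowest-indexed action in each redundancy group is kept.
    Returns a dict mapping action -> bool (True = action is kept).
    """
    actions = sorted(next_state_samples)
    means = {a: np.asarray(next_state_samples[a]).mean(axis=0) for a in actions}
    mask = {a: True for a in actions}
    for i, a in enumerate(actions):
        if not mask[a]:
            continue  # already masked as redundant to an earlier action
        for b in actions[i + 1:]:
            if mask[b] and np.linalg.norm(means[a] - means[b]) < threshold:
                mask[b] = False  # b induces (nearly) the same next-state distribution as a
    return mask
```

For example, if two actions always lead to the same next states while a third leads somewhere else, the second action is masked out and the policy only ever needs to consider the remaining two.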
Related papers
- Context Enhancement with Reconstruction as Sequence for Unified Unsupervised Anomaly Detection [68.74469657656822]
Unsupervised anomaly detection (AD) aims to train robust detection models using only normal samples.
Recent research focuses on a unified unsupervised AD setting in which only one model is trained for all classes.
We introduce a novel Reconstruction as Sequence (RAS) method, which enhances the contextual correspondence during feature reconstruction.
arXiv Detail & Related papers (2024-09-10T07:37:58Z)
- Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.
Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction.
We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z)
- Accelerating Search-Based Planning for Multi-Robot Manipulation by Leveraging Online-Generated Experiences [20.879194337982803]
Multi-Agent Path-Finding (MAPF) algorithms have shown promise in discrete 2D domains, providing rigorous guarantees.
We propose an approach for accelerating conflict-based search algorithms by leveraging their repetitive and incremental nature.
arXiv Detail & Related papers (2024-03-29T20:31:07Z)
- Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning [25.342811509665097]
Many problems in Reinforcement Learning (RL) seek an optimal policy with large discrete multidimensional yet unordered action spaces.
A challenge in this setting is that the underlying action space is categorical (discrete and unordered) and large.
In this work, we address these challenges by applying a (state) conditional normalizing flow to compactly represent the policy.
arXiv Detail & Related papers (2023-11-26T15:57:20Z)
- Continuous Control with Action Quantization from Demonstrations [35.44893918778709]
In Reinforcement Learning (RL), discrete actions, as opposed to continuous actions, result in less complex exploration problems.
We propose a novel method: Action Quantization from Demonstrations (AQuaDem) to learn a discretization of continuous action spaces.
We evaluate the proposed method on three different setups: RL with demonstrations, RL with play data (demonstrations of a human playing in an environment but not solving any specific task), and Imitation Learning.
arXiv Detail & Related papers (2021-10-19T17:59:04Z)
- Robust Predictable Control [149.71263296079388]
We show that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck.
We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.
arXiv Detail & Related papers (2021-09-07T17:29:34Z)
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z)
- Regressive Domain Adaptation for Unsupervised Keypoint Detection [67.2950306888855]
Domain adaptation (DA) aims at transferring knowledge from a labeled source domain to an unlabeled target domain.
We present a method of regressive domain adaptation (RegDA) for unsupervised keypoint detection.
Our method brings large improvements of 8% to 11% in terms of PCK on different datasets.
arXiv Detail & Related papers (2021-03-10T16:45:22Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems [1.6939372704265414]
We consider infinite horizon discounted dynamic programming problems with finite state and control spaces, partial state observations, and a multiagent structure.
Our methods specifically address the computational challenges of partially observable multiagent problems.
arXiv Detail & Related papers (2020-11-09T06:51:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.