No Prior Mask: Eliminate Redundant Action for Deep Reinforcement
Learning
- URL: http://arxiv.org/abs/2312.06258v1
- Date: Mon, 11 Dec 2023 09:56:02 GMT
- Title: No Prior Mask: Eliminate Redundant Action for Deep Reinforcement
Learning
- Authors: Dianyu Zhong, Yiqin Yang, Qianchuan Zhao
- Abstract summary: The large action space is a fundamental obstacle to deploying Reinforcement Learning methods in the real world.
We propose a novel redundant action filtering mechanism named No Prior Mask (NPM).
- Score: 13.341525656639583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The large action space is one fundamental obstacle to deploying Reinforcement
Learning methods in the real world. Numerous redundant actions cause agents to
make repeated or invalid attempts, sometimes even leading to task failure.
Although current algorithms make some initial explorations of this issue, they
either rely on rule-based systems or depend on expert demonstrations, which
significantly limits their applicability in many real-world settings. In this
work, we present a theoretical analysis of which actions can be eliminated in
policy optimization and propose a novel redundant action filtering mechanism.
Unlike other works, our method constructs the similarity factor by estimating
the distance between state distributions, which requires no prior knowledge. In
addition, we combine it with a modified inverse model to avoid extensive
computation in high-dimensional state spaces. We reveal the underlying
structure of action spaces and, based on the above techniques, propose a simple
yet efficient redundant action filtering mechanism named No Prior Mask (NPM). We
show the superior performance of our method through extensive experiments on
high-dimensional, pixel-input, and stochastic problems with varying degrees of
action redundancy. Our code is publicly available at https://github.com/zhongdy15/npm.
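The core idea of filtering redundant actions by comparing the state distributions they induce can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the function name `build_action_mask`, the use of the distance between sample means as a crude stand-in for a proper distributional distance, and the `threshold` parameter are all assumptions for illustration. NPM itself builds the similarity factor from estimated distances between state distributions and pairs it with a modified inverse model for high-dimensional inputs.

```python
import numpy as np

def build_action_mask(next_state_samples, threshold=0.1):
    """Sketch of a redundant-action mask in the spirit of NPM.

    next_state_samples: dict mapping each discrete action to an array of
    sampled next states (shape [n_samples, state_dim]) observed after
    taking that action from the same state. Actions whose empirical
    next-state distributions lie closer than `threshold` (measured here
    by the distance between sample means, a simple proxy for a real
    distributional distance) are treated as redundant; only the
    lowest-indexed action in each redundancy group is kept.
    Returns a dict mapping action -> bool (True = action is kept).
    """
    actions = sorted(next_state_samples)
    means = {a: np.asarray(next_state_samples[a]).mean(axis=0) for a in actions}
    mask = {a: True for a in actions}
    for i, a in enumerate(actions):
        if not mask[a]:
            continue  # already masked as redundant to an earlier action
        for b in actions[i + 1:]:
            if mask[b] and np.linalg.norm(means[a] - means[b]) < threshold:
                mask[b] = False  # b induces (nearly) the same next-state distribution as a
    return mask
```

For example, if two actions always lead to the same next states while a third leads somewhere else, the second action is masked out and the policy only ever needs to consider the remaining two.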
Related papers
- Context Enhancement with Reconstruction as Sequence for Unified Unsupervised Anomaly Detection [68.74469657656822]
Unsupervised anomaly detection (AD) aims to train robust detection models using only normal samples.
Recent research focuses on a unified unsupervised AD setting in which only one model is trained for all classes.
We introduce a novel Reconstruction as Sequence (RAS) method, which enhances the contextual correspondence during feature reconstruction.
arXiv Detail & Related papers (2024-09-10T07:37:58Z)
- Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.
Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction.
We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z)
- Accelerating Search-Based Planning for Multi-Robot Manipulation by Leveraging Online-Generated Experiences [20.879194337982803]
Multi-Agent Path-Finding (MAPF) algorithms have shown promise in discrete 2D domains, providing rigorous guarantees.
We propose an approach for accelerating conflict-based search algorithms by leveraging their repetitive and incremental nature.
arXiv Detail & Related papers (2024-03-29T20:31:07Z)
- Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning [25.342811509665097]
Many problems in Reinforcement Learning (RL) seek an optimal policy with large discrete multidimensional yet unordered action spaces.
A challenge in this setting is that the underlying action space is categorical (discrete and unordered) and large.
In this work, we address these challenges by applying a (state) conditional normalizing flow to compactly represent the policy.
arXiv Detail & Related papers (2023-11-26T15:57:20Z)
- Continuous Control with Action Quantization from Demonstrations [35.44893918778709]
In Reinforcement Learning (RL), discrete actions, as opposed to continuous actions, result in less complex exploration problems.
We propose a novel method: Action Quantization from Demonstrations (AQuaDem) to learn a discretization of continuous action spaces.
We evaluate the proposed method on three different setups: RL with demonstrations, RL with play data (demonstrations of a human playing in an environment but not solving any specific task), and Imitation Learning.
arXiv Detail & Related papers (2021-10-19T17:59:04Z)
- Robust Predictable Control [149.71263296079388]
We show that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck.
We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.
arXiv Detail & Related papers (2021-09-07T17:29:34Z)
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z)
- Regressive Domain Adaptation for Unsupervised Keypoint Detection [67.2950306888855]
Domain adaptation (DA) aims at transferring knowledge from a labeled source domain to an unlabeled target domain.
We present a method of regressive domain adaptation (RegDA) for unsupervised keypoint detection.
Our method brings large improvements of 8% to 11% in terms of PCK on different datasets.
arXiv Detail & Related papers (2021-03-10T16:45:22Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems [1.6939372704265414]
We consider infinite horizon discounted dynamic programming problems with finite state and control spaces, partial state observations, and a multiagent structure.
Our methods specifically address the computational challenges of partially observable multiagent problems.
arXiv Detail & Related papers (2020-11-09T06:51:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.