XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search
- URL: http://arxiv.org/abs/2504.20969v1
- Date: Tue, 29 Apr 2025 17:37:45 GMT
- Title: XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search
- Authors: Yiting Zhang, Shichen Li, Elena Shrestha,
- Abstract summary: We introduce XPG-RL, a reinforcement learning framework that enables agents to efficiently perform mechanical search tasks. XPG-RL integrates a task-driven action prioritization mechanism with a learned context-aware switching strategy. Experiments in both simulation and real-world settings demonstrate that XPG-RL consistently outperforms baseline methods in task success rates and motion efficiency.
- Score: 0.10241134756773229
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mechanical search (MS) in cluttered environments remains a significant challenge for autonomous manipulators, requiring long-horizon planning and robust state estimation under occlusions and partial observability. In this work, we introduce XPG-RL, a reinforcement learning framework that enables agents to efficiently perform MS tasks through explainable, priority-guided decision-making based on raw sensory inputs. XPG-RL integrates a task-driven action prioritization mechanism with a learned context-aware switching strategy that dynamically selects from a discrete set of action primitives such as target grasping, occlusion removal, and viewpoint adjustment. Within this strategy, a policy is optimized to output adaptive threshold values that govern the discrete selection among action primitives. The perception module fuses RGB-D inputs with semantic and geometric features to produce a structured scene representation for downstream decision-making. Extensive experiments in both simulation and real-world settings demonstrate that XPG-RL consistently outperforms baseline methods in task success rates and motion efficiency, achieving up to 4.5× higher efficiency in long-horizon tasks. These results underscore the benefits of integrating domain knowledge with learnable decision-making policies for robust and efficient robotic manipulation.
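The abstract describes a fixed task-driven priority over action primitives combined with a learned policy that outputs adaptive thresholds for choosing among them. The sketch below illustrates one plausible reading of that idea and is not the authors' implementation: the priority order, the per-primitive perception scores (`SceneScores`), and the toy linear-sigmoid `threshold_policy` are all hypothetical stand-ins. Primitives are checked in priority order, and the first one whose score clears its threshold is chosen; viewpoint adjustment serves as the fallback.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Primitive(Enum):
    GRASP_TARGET = "grasp_target"
    REMOVE_OCCLUSION = "remove_occlusion"
    ADJUST_VIEWPOINT = "adjust_viewpoint"


# Hypothetical priority order for the gated primitives: finish the task
# first, then clear clutter. ADJUST_VIEWPOINT is the ungated fallback.
PRIORITY: tuple = (Primitive.GRASP_TARGET, Primitive.REMOVE_OCCLUSION)


@dataclass
class SceneScores:
    """Hypothetical per-primitive scores from a perception module, in [0, 1]."""
    grasp_target: float      # e.g. confidence that the target can be grasped now
    remove_occlusion: float  # e.g. confidence that an occluder can be removed


def threshold_policy(context: Sequence[float],
                     weights: Sequence[Sequence[float]],
                     biases: Sequence[float]) -> list:
    """Toy stand-in for the learned policy: a linear layer followed by a
    sigmoid, so each output threshold lies in (0, 1)."""
    thresholds = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, context)) + b
        thresholds.append(1.0 / (1.0 + math.exp(-z)))
    return thresholds


def select_primitive(scores: SceneScores, thresholds: Sequence[float]) -> Primitive:
    """Return the highest-priority primitive whose score clears its threshold."""
    if len(thresholds) != len(PRIORITY):
        raise ValueError("expected one threshold per gated primitive")
    values = {
        Primitive.GRASP_TARGET: scores.grasp_target,
        Primitive.REMOVE_OCCLUSION: scores.remove_occlusion,
    }
    for primitive, tau in zip(PRIORITY, thresholds):
        if values[primitive] >= tau:
            return primitive
    # No gated primitive is confident enough, so gather more information.
    return Primitive.ADJUST_VIEWPOINT


if __name__ == "__main__":
    # Hypothetical context features and weights, for illustration only.
    taus = threshold_policy([0.2, -0.1],
                            weights=[[1.0, 0.5], [0.3, -0.2]],
                            biases=[0.5, 0.0])
    action = select_primitive(SceneScores(grasp_target=0.8, remove_occlusion=0.4), taus)
    print(taus, action)
```

Because every decision comes down to a score being compared against a threshold in a fixed order, each chosen action can be traced back to a specific comparison. That traceability is one way to read the "explainable" part of the abstract, while the learning happens only in how the thresholds adapt to context.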
Related papers
- Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments [78.15330971155778]
Posterior and Diversity Synergized Task Sampling (PDTS) is an easy-to-implement method to accommodate fast and robust sequential decision-making. PDTS unlocks the potential of robust active task sampling, significantly improves the zero-shot and few-shot adaptation robustness in challenging tasks, and even accelerates the learning process under certain scenarios.
Date: 2025-04-27T07:27:17Z
- A Local Information Aggregation based Multi-Agent Reinforcement Learning for Robot Swarm Dynamic Task Allocation [4.144893164317513]
We introduce a novel framework using a decentralized partially observable Markov decision process (Dec_POMDP). At the core of our methodology is the Local Information Aggregation Multi-Agent Deep Deterministic Policy Gradient (LIA_MADDPG) algorithm. Our empirical evaluations show that the LIA module can be seamlessly integrated into various CTDE-based MARL methods.
Date: 2024-11-29T07:53:05Z
- Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity [10.402855891273346]
DIVA is an evolutionary approach for generating diverse training tasks in complex, open-ended simulators.
Our empirical results showcase DIVA's unique ability to overcome complex parameterizations and successfully train adaptive agent behavior.
Date: 2024-11-07T06:27:12Z
- Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration [13.053013407015628]
This paper addresses the problem of learning optimal control policies for systems with uncertain dynamics.
We propose an accelerated RL algorithm that can learn control policies significantly faster than competitive approaches.
Date: 2024-10-16T00:53:41Z
- Autonomous Goal Detection and Cessation in Reinforcement Learning: A Case Study on Source Term Estimation [24.984938229619075]
Reinforcement Learning has revolutionized decision-making processes in dynamic environments. The lack of precise environmental information makes it challenging to provide clear feedback signals. We develop a self-feedback mechanism for autonomous goal detection and cessation upon task completion.
Date: 2024-09-14T21:42:17Z
- Spatio-temporal Value Semantics-based Abstraction for Dense Deep Reinforcement Learning [1.4542411354617986]
Intelligent Cyber-Physical Systems (ICPS) represent a specialized form of Cyber-Physical System (CPS).
CNNs and Deep Reinforcement Learning (DRL) undertake multifaceted tasks encompassing perception, decision-making, and control.
DRL faces challenges in efficiency, generalization, and data scarcity during the decision-making process.
We propose an innovative abstract modeling approach grounded in spatial-temporal value semantics.
Date: 2024-05-24T02:21:10Z
- ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
Date: 2024-02-22T13:22:06Z
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
Date: 2024-02-09T07:45:26Z
- Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
Date: 2023-03-16T20:09:47Z
- Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives [89.34229413345541]
We propose a conditioning scheme which avoids pitfalls by learning the controller and its conditioning in an end-to-end manner.
Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion.
We report significant improvements in task success over representative MPC and IL baselines.
Date: 2020-03-19T15:04:37Z
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
Date: 2020-02-10T04:23:09Z