Subequivariant Graph Reinforcement Learning in 3D Environments
- URL: http://arxiv.org/abs/2305.18951v1
- Date: Tue, 30 May 2023 11:34:57 GMT
- Title: Subequivariant Graph Reinforcement Learning in 3D Environments
- Authors: Runfa Chen, Jiaqi Han, Fuchun Sun, Wenbing Huang
- Abstract summary: We propose a novel setup for morphology-agnostic RL, dubbed Subequivariant Graph RL in 3D environments.
Specifically, we first introduce a new set of more practical yet challenging benchmarks in 3D space.
To optimize the policy over the enlarged state-action space, we propose to inject geometric symmetry.
- Score: 34.875774768800966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning a shared policy that guides the locomotion of different agents is of
core interest in Reinforcement Learning (RL), which leads to the study of
morphology-agnostic RL. However, existing benchmarks are highly restrictive in
the choice of starting point and target point, constraining the movement of the
agents within 2D space. In this work, we propose a novel setup for
morphology-agnostic RL, dubbed Subequivariant Graph RL in 3D environments
(3D-SGRL). Specifically, we first introduce a new set of more practical yet
challenging benchmarks in 3D space that allow the agent full degrees of
freedom to explore in arbitrary directions starting from arbitrary
configurations. Moreover, to optimize the policy over the enlarged state-action
space, we propose to inject geometric symmetry, i.e., subequivariance, into the
modeling of the policy and Q-function such that the policy can generalize to
all directions, improving exploration efficiency. This goal is achieved by a
novel SubEquivariant Transformer (SET) that permits expressive message
exchange. Finally, we evaluate the proposed method on the proposed benchmarks,
where our method consistently and significantly outperforms existing approaches
on single-task, multi-task, and zero-shot generalization scenarios. Extensive
ablations are also conducted to verify our design. Code and videos are
available on our project page: https://alpc91.github.io/SGRL/.
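The core idea of subequivariance can be illustrated numerically: because gravity breaks full 3D rotation symmetry, the policy only needs to commute with rotations about the gravity axis. The toy function below (a hypothetical construction for illustration, not the paper's SET architecture) builds its output from quantities invariant under such rotations, which makes it subequivariant by construction:

```python
import numpy as np

G = np.array([0.0, 0.0, -1.0])  # gravity direction (assumed along -z)

def subequivariant_fn(x):
    """Toy subequivariant map: mix x and g using invariant coefficients."""
    a = np.tanh(np.linalg.norm(x))  # invariant under all rotations
    b = np.dot(x, G)                # invariant under rotations about g
    return a * x + b * G

def rot_z(theta):
    """Rotation matrix about the gravity (z) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

x = np.array([0.3, -1.2, 0.7])
R = rot_z(0.9)
# Subequivariance check: rotating the input about g rotates the output.
assert np.allclose(subequivariant_fn(R @ x), R @ subequivariant_fn(x))
```

A policy with this property produces consistently rotated actions for rotated states, so experience gathered while heading in one direction transfers to all headings about the gravity axis, which is the exploration-efficiency argument made in the abstract.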
Related papers
- Rethinking Decision Transformer via Hierarchical Reinforcement Learning [54.3596066989024]
Decision Transformer (DT) is an innovative algorithm leveraging recent advances in the transformer architecture for reinforcement learning (RL).
We introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL.
We show DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices.
arXiv Detail & Related papers (2023-11-01T03:32:13Z) - Multi-Objective Decision Transformers for Offline Reinforcement Learning [7.386356540208436]
Offline RL is structured to derive policies from static trajectory data without requiring real-time environment interaction.
We reformulate offline RL as a multi-objective optimization problem, where prediction is extended to states and returns.
Our experiments on D4RL benchmark locomotion tasks reveal that our propositions allow for more effective utilization of the attention mechanism in the transformer model.
arXiv Detail & Related papers (2023-08-31T00:47:58Z) - Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative Convolution Network [80.19054069988559]
We find that self-supervised monocular depth estimation shows a direction sensitivity and environmental dependency.
We propose a new Direction-aware Cumulative Convolution Network (DaCCN), which improves the depth representation in two aspects.
Experiments show that our method achieves significant improvements on three widely used benchmarks.
arXiv Detail & Related papers (2023-08-10T14:32:18Z) - Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z) - Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z) - A Game-Theoretic Perspective of Generalization in Reinforcement Learning [9.402272029807316]
Generalization in reinforcement learning (RL) is important for the real-world deployment of RL algorithms.
We propose a game-theoretic framework for the generalization in reinforcement learning, named GiRL.
arXiv Detail & Related papers (2022-08-07T06:17:15Z) - Learning Task-Agnostic Action Spaces for Movement Optimization [18.37812596641983]
We propose a novel method for exploring the dynamics of physically based animated characters.
We parameterize actions as target states, and learn a short-horizon goal-conditioned low-level control policy that drives the agent's state towards the targets.
arXiv Detail & Related papers (2020-09-22T06:18:56Z) - Learning Off-Policy with Online Planning [18.63424441772675]
We investigate a novel instantiation of H-step lookahead with a learned model and a terminal value function.
We show the flexibility of LOOP to incorporate safety constraints during deployment with a set of navigation environments.
arXiv Detail & Related papers (2020-08-23T16:18:44Z) - Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.