Reinforcement Learning with Generalizable Gaussian Splatting
- URL: http://arxiv.org/abs/2404.07950v1
- Date: Mon, 18 Mar 2024 16:50:23 GMT
- Title: Reinforcement Learning with Generalizable Gaussian Splatting
- Authors: Jiaxu Wang, Qiang Zhang, Jingkai Sun, Jiahang Cao, Yecheng Shao, Renjing Xu
- Abstract summary: An excellent representation is crucial for reinforcement learning (RL) performance.
We propose GSRL, a novel Generalizable Gaussian Splatting framework that serves as the representation for RL tasks.
Our method achieves better results than the baselines on multiple tasks, improving performance by 10%, 44%, and 15% over the baselines on the hardest task.
- Score: 7.634466554585955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An excellent representation is crucial for reinforcement learning (RL) performance, especially in vision-based RL tasks. The quality of the environment representation directly influences how well the learning task can be achieved. Previous vision-based RL typically represents environments explicitly or implicitly, e.g., as images, points, voxels, or neural radiance fields. However, these representations have several drawbacks: they either cannot describe complex local geometries, fail to generalize to unseen scenes, or require precise foreground masks. Moreover, implicit neural representations act like a "black box", significantly hindering interpretability. 3D Gaussian Splatting (3DGS), with its explicit scene representation and differentiable rendering, is considered a revolutionary change for reconstruction and representation methods. In this paper, we propose GSRL, a novel Generalizable Gaussian Splatting framework that serves as the representation for RL tasks. Validated in the RoboMimic environment, our method achieves better results than other baselines on multiple tasks, improving performance by 10%, 44%, and 15% over the baselines on the hardest task. This work is the first attempt to leverage generalizable 3DGS as a representation for RL.
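To make the 3DGS representation concrete, the sketch below (our own illustration, not the paper's code; all names are hypothetical) shows the explicit state a single Gaussian primitive carries and the differentiable front-to-back alpha compositing rule used when depth-sorted splats are rendered into a pixel:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Gaussian3D:
    """One explicit 3DGS primitive (illustrative field layout)."""
    mean: np.ndarray      # (3,) center in world space
    scale: np.ndarray     # (3,) per-axis standard deviations of the ellipsoid
    rotation: np.ndarray  # (4,) unit quaternion orienting the ellipsoid
    opacity: float        # base opacity in [0, 1]
    color: np.ndarray     # (3,) RGB (full 3DGS stores spherical harmonics)

def composite(colors, alphas):
    """Front-to-back alpha compositing for one pixel:
    C = sum_i c_i * a_i * prod_{j<i} (1 - a_j),
    the differentiable blending rule at the heart of splatting."""
    pixel = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * c
        transmittance *= 1.0 - a
    return pixel

# Two depth-sorted splats covering the same pixel, nearest first:
print(composite([np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])],
                [0.6, 0.8]))
```

Because every quantity above is an explicit, named attribute, the representation avoids the "black box" character of implicit neural fields.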
Related papers
- GS-Phong: Meta-Learned 3D Gaussians for Relightable Novel View Synthesis [63.5925701087252]
We propose a novel method for representing a scene illuminated by a point light using a set of relightable 3D Gaussian points.
Inspired by the Blinn-Phong model, our approach decomposes the scene into ambient, diffuse, and specular components.
To facilitate the decomposition of geometric information independent of lighting conditions, we introduce a novel bilevel optimization-based meta-learning framework.
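For reference, the Blinn-Phong decomposition mentioned above has the standard textbook form (not necessarily the paper's exact parameterization):

```latex
% Blinn-Phong shading: ambient + diffuse + specular terms.
% n = surface normal, l = light direction, h = half-vector between
% light and view directions, p = shininess exponent,
% k_* = material coefficients, i_* = light intensities.
L = k_a i_a
  + k_d (\mathbf{n} \cdot \mathbf{l}) \, i_d
  + k_s (\mathbf{n} \cdot \mathbf{h})^{p} \, i_s
```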
arXiv Detail & Related papers (2024-05-31T13:48:54Z)
- GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction [20.232177350064735]
We introduce a novel dual-branch architecture that combines the benefits of a flexible and efficient 3D Gaussian Splatting representation with neural Signed Distance Fields (SDF).
We show on diverse scenes that our design unlocks the potential for more accurate and detailed surface reconstructions.
arXiv Detail & Related papers (2024-03-25T17:22:11Z)
- Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering [71.44349029439944]
The recent 3D Gaussian Splatting method has achieved state-of-the-art rendering quality and speed.
We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians.
We show that our method effectively reduces redundant Gaussians while delivering high-quality rendering.
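Roughly, the anchor idea can be sketched as below; this is our illustrative guess at a minimal structure (all names and decoders are hypothetical), not Scaffold-GS's actual code:

```python
import numpy as np

def spawn_gaussians(anchor_pos, anchor_feat, k, decoders):
    """One anchor yields k local Gaussians: positions are the anchor plus
    learned offsets, attributes are decoded from the shared anchor feature.
    Sharing one feature across k splats is what removes redundancy."""
    offsets = decoders["offsets"](anchor_feat).reshape(k, 3)
    means = anchor_pos[None, :] + offsets          # (k, 3) Gaussian centers
    opacities = decoders["opacity"](anchor_feat)   # (k,) values in [0, 1]
    return means, opacities

# Toy stand-ins for the learned decoder MLPs:
decoders = {
    "offsets": lambda f: np.tanh(f[:6]),                 # 2 Gaussians x 3 dims
    "opacity": lambda f: 1.0 / (1.0 + np.exp(-f[:2])),   # sigmoid squashing
}
rng = np.random.default_rng(0)
means, opac = spawn_gaussians(np.zeros(3), rng.normal(size=8), 2, decoders)
print(means.shape, opac)  # (2, 3) and two opacities
```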
arXiv Detail & Related papers (2023-11-30T17:58:57Z)
- GS-IR: 3D Gaussian Splatting for Inverse Rendering [71.14234327414086]
We propose GS-IR, a novel inverse rendering approach based on 3D Gaussian Splatting (GS).
We extend GS, a top-performance representation for novel view synthesis, to estimate scene geometry, surface material, and environment illumination from multi-view images captured under unknown lighting conditions.
The flexible and expressive GS representation allows us to achieve fast and compact geometry reconstruction, photorealistic novel view synthesis, and effective physically-based rendering.
arXiv Detail & Related papers (2023-11-26T02:35:09Z)
- Neural Radiance Field Codebooks [53.01356339021285]
We introduce Neural Radiance Field Codebooks (NRC), a scalable method for learning object-centric representations.
NRC learns to reconstruct scenes from novel views using a dictionary of object codes which are decoded through a volumetric reconstruction.
We show that NRC representations transfer well to object navigation in THOR, outperforming 2D and 3D representation learning methods by 3.1% success rate.
arXiv Detail & Related papers (2023-01-10T18:03:48Z)
- Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning [27.304282924423095]
We propose Pre-trained Image Encoder for Generalizable visual reinforcement learning (PIE-G).
PIE-G is a simple yet effective framework that can generalize to the unseen visual scenarios in a zero-shot manner.
Empirical evidence suggests PIE-G improves sample efficiency and significantly outperforms previous state-of-the-art methods in terms of generalization performance.
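The core recipe is easy to sketch; the following is a minimal illustration assuming a frozen, ImageNet-pre-trained torchvision backbone (the action dimension and head sizes are our placeholders, not the paper's configuration):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Frozen pre-trained encoder: the visual representation is fixed,
# so it cannot overfit to the training environments.
encoder = resnet18(weights="IMAGENET1K_V1")
encoder.fc = nn.Identity()            # expose the 512-d feature vector
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()

# Only this small head is trained by the RL algorithm.
policy_head = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 6),                # 6 = assumed action dimension
)

obs = torch.rand(1, 3, 224, 224)      # dummy image observation
with torch.no_grad():
    feat = encoder(obs)
print(policy_head(feat).shape)        # torch.Size([1, 6])
```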
arXiv Detail & Related papers (2022-12-17T12:45:08Z)
- Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning [7.972204774778987]
In real-world robotics applications, Reinforcement Learning (RL) agents are often unable to generalise to environment variations that were not observed during training.
We introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled representations using the sequential nature of RL observations.
We find empirically that RL algorithms with TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods.
arXiv Detail & Related papers (2022-07-12T11:46:49Z)
- Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
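A minimal sketch of the objective as we read it: an InfoNCE-style critic scores state-action embeddings against embeddings of the goals actually reached, with other goals in the batch as negatives (names and dimensions are ours):

```python
import torch
import torch.nn.functional as F

def contrastive_rl_loss(sa_embed, goal_embed):
    """InfoNCE over a batch: the i-th state-action embedding should score
    highest against the i-th (actually reached) goal embedding; the learned
    critic then doubles as a goal-conditioned value function."""
    logits = sa_embed @ goal_embed.T          # (B, B) similarity matrix
    labels = torch.arange(logits.size(0))     # positives on the diagonal
    return F.cross_entropy(logits, labels)

B, d = 32, 64
loss = contrastive_rl_loss(torch.randn(B, d), torch.randn(B, d))
print(loss.item())
```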
arXiv Detail & Related papers (2022-06-15T14:34:15Z)
- Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions [63.773813221460614]
Generalization across different environments with the same tasks is critical for successful applications of visual reinforcement learning.
We propose a novel approach, namely Characteristic Reward Sequence Prediction (CRESP), to extract the task-relevant information.
Experiments demonstrate that CRESP significantly improves the performance of generalization on unseen environments.
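For context, the characteristic function of a reward sequence R at a sampled frequency vector u has the standard form below; a representation is task-relevant in this sense when it suffices to predict these values (our paraphrase of the idea):

```latex
% Characteristic function of the reward-sequence distribution:
\psi_R(u) \;=\; \mathbb{E}\left[ e^{\, i\, u^{\top} R} \right]
\;=\; \mathbb{E}\left[ \cos\!\left(u^{\top} R\right) \right]
   \;+\; i\,\mathbb{E}\left[ \sin\!\left(u^{\top} R\right) \right]
```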
arXiv Detail & Related papers (2022-05-20T14:52:03Z)