Point Cloud Models Improve Visual Robustness in Robotic Learners
- URL: http://arxiv.org/abs/2404.18926v1
- Date: Mon, 29 Apr 2024 17:59:11 GMT
- Title: Point Cloud Models Improve Visual Robustness in Robotic Learners
- Authors: Skand Peri, Iain Lee, Chanho Kim, Li Fuxin, Tucker Hermans, Stefan Lee
- Abstract summary: We introduce a novel Point Cloud World Model (PCWM) and point cloud based control policies.
Our experiments show that policies that explicitly encode point clouds are significantly more robust than their RGB-D counterparts.
Taken together, these results suggest reasoning about the 3D scene through point clouds can improve performance, reduce learning time, and increase robustness for robotic learners.
- Score: 18.23824531384375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual control policies can encounter significant performance degradation when visual conditions like lighting or camera position differ from those seen during training -- often exhibiting sharp declines in capability even for minor differences. In this work, we examine robustness to a suite of these types of visual changes for RGB-D and point cloud based visual control policies. To perform these experiments on both model-free and model-based reinforcement learners, we introduce a novel Point Cloud World Model (PCWM) and point cloud based control policies. Our experiments show that policies that explicitly encode point clouds are significantly more robust than their RGB-D counterparts. Further, we find our proposed PCWM significantly outperforms prior works in terms of sample efficiency during training. Taken together, these results suggest reasoning about the 3D scene through point clouds can improve performance, reduce learning time, and increase robustness for robotic learners. Project Webpage: https://pvskand.github.io/projects/PCWM
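Since the abstract describes the architectures only at a high level, here is a minimal PyTorch sketch of the kind of point cloud control policy it refers to: a PointNet-style per-point MLP with order-invariant max pooling feeding a small action head. All names and sizes (PointCloudPolicy, feat_dim, the layer widths) are illustrative assumptions, not the authors' PCWM or policy implementation.

```python
# Hedged sketch: a PointNet-style point cloud encoder feeding a policy head.
# Illustrative only; the paper's PCWM and policy architectures are not
# reproduced here, and all class/parameter names are invented for this example.
import torch
import torch.nn as nn

class PointCloudPolicy(nn.Module):
    def __init__(self, action_dim: int, feat_dim: int = 256):
        super().__init__()
        # Per-point MLP applied to the (x, y, z) coordinates of each point.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )
        # Policy head mapping the pooled scene feature to actions.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, 3)
        feats = self.point_mlp(points)      # (B, N, feat_dim)
        pooled = feats.max(dim=1).values    # order-invariant max pooling
        return self.head(pooled)            # (B, action_dim)

policy = PointCloudPolicy(action_dim=7)
actions = policy(torch.randn(4, 1024, 3))  # e.g. batch of 1024-point clouds
```

The max pooling step makes the scene feature invariant to point ordering, one property that plausibly underlies the robustness advantage over pixel-grid RGB-D encoders reported above.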
Related papers
- P2P-Bridge: Diffusion Bridges for 3D Point Cloud Denoising [81.92854168911704]
We tackle the task of point cloud denoising through a novel framework that adapts Diffusion Schrödinger bridges to point clouds.
Experiments on object datasets show that P2P-Bridge achieves significant improvements over existing methods.
arXiv Detail & Related papers (2024-08-29T08:00:07Z)
- Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning [58.69297999175239]
In robot learning, the observation space is crucial due to the distinct characteristics of different modalities.
In this study, we explore the influence of various observation spaces on robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud.
arXiv Detail & Related papers (2024-02-04T14:18:45Z)
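As background for the observation-space comparison above: the point cloud modality is usually derived from RGB-D by back-projecting the depth map through the pinhole camera model. A minimal NumPy sketch, assuming known camera intrinsics (fx, fy, cx, cy); the function name is hypothetical.

```python
# Hedged sketch: converting an RGB-D depth map into a point cloud with the
# pinhole camera model, the common preprocessing behind the point cloud
# observation space. Intrinsics (fx, fy, cx, cy) are assumed known.
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """depth: (H, W) array of metric depths; returns (H*W, 3) XYZ points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx   # back-project pixel column to camera X
    y = (v - cy) * z / fy   # back-project pixel row to camera Y
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

cloud = depth_to_point_cloud(np.random.rand(480, 640),
                             fx=600, fy=600, cx=320, cy=240)
```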
- Adaptive Point Transformer [88.28498667506165]
Adaptive Point Cloud Transformer (AdaPT) is a standard PT model augmented by an adaptive token selection mechanism.
AdaPT dynamically reduces the number of tokens during inference, enabling efficient processing of large point clouds.
arXiv Detail & Related papers (2024-01-26T13:24:45Z)
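A minimal sketch of the adaptive token selection idea behind AdaPT described above: score tokens with a small learned head and keep only the top-k for later transformer layers. TokenSelector and keep_ratio are invented names, and AdaPT's actual mechanism (including how the selection is trained) is more involved than this.

```python
# Hedged sketch of adaptive token selection: a small learned head scores
# tokens and only the top-k survive to later layers. Illustrative of the
# general idea, not AdaPT's exact mechanism.
import torch
import torch.nn as nn

class TokenSelector(nn.Module):
    def __init__(self, dim: int, keep_ratio: float = 0.5):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # per-token importance score
        self.keep_ratio = keep_ratio

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim)
        b, n, d = tokens.shape
        k = max(1, int(n * self.keep_ratio))
        scores = self.score(tokens).squeeze(-1)    # (B, N)
        idx = scores.topk(k, dim=1).indices        # indices of kept tokens
        idx = idx.unsqueeze(-1).expand(-1, -1, d)  # (B, k, dim)
        return tokens.gather(1, idx)               # pruned token set

selector = TokenSelector(dim=256, keep_ratio=0.25)
pruned = selector(torch.randn(2, 1024, 256))  # -> (2, 256, 256)
```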
- Test-Time Augmentation for 3D Point Cloud Classification and Segmentation [40.62640761825697]
Data augmentation is a powerful technique to enhance the performance of a deep learning task.
This work explores test-time augmentation (TTA) for 3D point clouds.
arXiv Detail & Related papers (2023-11-22T04:31:09Z)
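A minimal sketch of test-time augmentation for point cloud classification as in the entry above: run the model on several randomly perturbed copies of the input cloud and average the predicted probabilities. The particular augmentations (z-axis rotation, coordinate jitter) and the count n_aug are common choices assumed here, not necessarily the paper's recipe.

```python
# Hedged sketch of test-time augmentation (TTA) for point cloud
# classification: average predictions over randomly perturbed copies
# of the input. The augmentation set is an assumption.
import math
import random
import torch

def rotate_z(points: torch.Tensor, angle: float) -> torch.Tensor:
    c, s = math.cos(angle), math.sin(angle)
    rot = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T

@torch.no_grad()
def tta_predict(model, points: torch.Tensor, n_aug: int = 8) -> torch.Tensor:
    """points: (N, 3) cloud; returns class probabilities averaged over augmentations."""
    probs = []
    for _ in range(n_aug):
        aug = rotate_z(points, random.uniform(0, 2 * math.pi))
        aug = aug + 0.01 * torch.randn_like(aug)  # small coordinate jitter
        logits = model(aug.unsqueeze(0))          # model: cloud -> (1, num_classes)
        probs.append(logits.softmax(dim=-1))
    return torch.stack(probs).mean(dim=0).squeeze(0)
```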
- Point2Vec for Self-Supervised Representation Learning on Point Clouds [66.53955515020053]
We extend data2vec to the point cloud domain and report encouraging results on several downstream tasks.
We propose point2vec, which unleashes the full potential of data2vec-like pre-training on point clouds.
arXiv Detail & Related papers (2023-03-29T10:08:29Z)
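For the point2vec entry above, a compact sketch of the data2vec-style recipe it builds on: a student network regresses latent targets produced by an exponential-moving-average (EMA) teacher copy of itself on masked inputs. Patchifying the point cloud and the exact target construction are simplified away, and every name here is illustrative.

```python
# Hedged sketch of the data2vec-style recipe point2vec builds on: a student
# regresses the latent targets an EMA teacher produces on the unmasked input.
# Heavily simplified; not the paper's implementation.
import copy
import torch
import torch.nn.functional as F

def ema_update(teacher, student, tau: float = 0.999):
    # Teacher weights track the student as an exponential moving average.
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(tau).add_(ps, alpha=1 - tau)

def data2vec_step(student, teacher, patches, mask):
    """patches: (B, N, D) patch embeddings; mask: (B, N) bool, True = masked."""
    with torch.no_grad():
        targets = teacher(patches)  # latent targets from the unmasked input
    masked = patches.clone()
    masked[mask] = 0.0              # blank out the masked patch embeddings
    preds = student(masked)
    return F.smooth_l1_loss(preds[mask], targets[mask])

student = torch.nn.Linear(64, 64)   # stand-in for a transformer encoder
teacher = copy.deepcopy(student)
loss = data2vec_step(student, teacher, torch.randn(2, 32, 64),
                     torch.rand(2, 32) > 0.5)
loss.backward()
ema_update(teacher, student)
```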
- ViPFormer: Efficient Vision-and-Pointcloud Transformer for Unsupervised Pointcloud Understanding [3.7966094046587786]
We propose a lightweight Vision-and-Pointcloud Transformer (ViPFormer) to unify image and point cloud processing in a single architecture.
ViPFormer learns in an unsupervised manner by optimizing intra-modal and cross-modal contrastive objectives.
Experiments on different datasets show ViPFormer surpasses previous state-of-the-art unsupervised methods with higher accuracy, lower model complexity, and lower runtime latency.
arXiv Detail & Related papers (2023-03-25T06:47:12Z)
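A minimal sketch of a symmetric cross-modal contrastive (InfoNCE) objective of the kind ViPFormer's cross-modal term optimizes: matched image/point-cloud embeddings are pulled together and mismatched pairs pushed apart. The intra-modal term (not shown) applies the same idea to two augmented views of one modality; the temperature value is a common default, not the paper's setting.

```python
# Hedged sketch of a symmetric cross-modal InfoNCE loss over paired image
# and point cloud embeddings. Hyperparameters are common defaults.
import torch
import torch.nn.functional as F

def cross_modal_infonce(img_emb: torch.Tensor, pc_emb: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """img_emb, pc_emb: (B, D) embeddings of paired images and point clouds."""
    img = F.normalize(img_emb, dim=-1)
    pc = F.normalize(pc_emb, dim=-1)
    logits = img @ pc.T / temperature    # (B, B) similarity matrix
    labels = torch.arange(img.size(0))   # i-th image matches i-th cloud
    # Symmetric loss: image->cloud and cloud->image retrieval directions.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))

loss = cross_modal_infonce(torch.randn(8, 256), torch.randn(8, 256))
```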
- EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder [60.52613206271329]
This paper introduces Efficient Point Cloud Learning (EPCL) for training high-quality point cloud models with a frozen CLIP transformer.
Our EPCL connects the 2D and 3D modalities by semantically aligning the image features and point cloud features without paired 2D-3D data.
arXiv Detail & Related papers (2022-12-08T06:27:11Z)
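A minimal sketch of the EPCL idea from the entry above: a small trainable tokenizer embeds point cloud patches, a frozen pretrained transformer (CLIP's, in the paper) processes the tokens, and only a light task head is trained. The tokenizer, the stand-in transformer trunk, and the head below are placeholders, not the paper's modules.

```python
# Hedged sketch of the frozen-backbone recipe: trainable tokenizer, frozen
# pretrained transformer trunk, trainable task head. Placeholder modules.
import torch
import torch.nn as nn

class FrozenBackboneEncoder(nn.Module):
    def __init__(self, backbone: nn.Module, in_dim: int,
                 embed_dim: int, num_classes: int):
        super().__init__()
        self.tokenizer = nn.Linear(in_dim, embed_dim)  # trainable
        self.backbone = backbone
        for p in self.backbone.parameters():           # freeze pretrained weights
            p.requires_grad = False
        self.head = nn.Linear(embed_dim, num_classes)  # trainable

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (B, num_patches, in_dim) per-patch point features
        tokens = self.tokenizer(patch_feats)
        tokens = self.backbone(tokens)                 # frozen transformer
        return self.head(tokens.mean(dim=1))           # pooled classification

# Stand-in for a pretrained CLIP transformer trunk:
trunk = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=2)
model = FrozenBackboneEncoder(trunk, in_dim=128, embed_dim=256, num_classes=40)
logits = model(torch.randn(2, 64, 128))
```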
- Lateral Ego-Vehicle Control without Supervision using Point Clouds [50.40632021583213]
Existing vision-based supervised approaches to lateral vehicle control are capable of directly mapping RGB images to the appropriate steering commands.
This paper proposes a framework for training a more robust and scalable model for lateral vehicle control.
Online experiments show that the performance of our method is superior to that of the supervised model.
arXiv Detail & Related papers (2022-03-20T21:57:32Z)
- Self-supervised Learning of Point Clouds via Orientation Estimation [19.31778462735251]
We leverage 3D self-supervision for learning downstream tasks on point clouds with fewer labels.
A point cloud can be rotated in infinitely many ways, which provides a rich label-free source for self-supervision.
arXiv Detail & Related papers (2020-08-01T17:49:45Z)
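A minimal sketch of rotation-based self-supervision as in the last entry: rotate each cloud by one of K canonical angles and train the encoder to classify which rotation was applied, so the labels come for free. Discretizing to K rotations about the z-axis is a simplifying assumption; the paper's formulation of orientation estimation may differ.

```python
# Hedged sketch of rotation-prediction self-supervision on point clouds:
# the pseudo-label is which of K canonical z-axis rotations was applied.
import math
import torch
import torch.nn.functional as F

K = 4  # number of discrete z-axis rotations used as pseudo-labels

def make_rotation_task(points: torch.Tensor):
    """points: (B, N, 3). Returns rotated clouds and their rotation labels."""
    labels = torch.randint(0, K, (points.size(0),))
    rotated = []
    for cloud, k in zip(points, labels):
        a = 2 * math.pi * k.item() / K
        c, s = math.cos(a), math.sin(a)
        rot = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        rotated.append(cloud @ rot.T)
    return torch.stack(rotated), labels

def ssl_loss(encoder, points):
    # encoder must output K logits, one per candidate rotation.
    rotated, labels = make_rotation_task(points)
    return F.cross_entropy(encoder(rotated), labels)
```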
This list is automatically generated from the titles and abstracts of the papers in this site.