Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection
- URL: http://arxiv.org/abs/2404.11737v1
- Date: Wed, 17 Apr 2024 20:41:49 GMT
- Title: Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection
- Authors: Deepti Hegde, Suhas Lohit, Kuan-Chuan Peng, Michael J. Jones, Vishal M. Patel
- Abstract summary: We propose a spatio-temporal equivariant learning framework by considering both spatial and temporal augmentations jointly.
We show that our pre-training method for 3D object detection outperforms existing equivariant and invariant approaches in many settings.
- Score: 37.142470149311904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Popular representation learning methods encourage feature invariance under transformations applied at the input. However, in 3D perception tasks like object localization and segmentation, outputs are naturally equivariant to some transformations, such as rotation. Using pre-training loss functions that encourage equivariance of features under certain transformations provides a strong self-supervision signal while also retaining information of geometric relationships between transformed feature representations. This can enable improved performance in downstream tasks that are equivariant to such transformations. In this paper, we propose a spatio-temporal equivariant learning framework by considering both spatial and temporal augmentations jointly. Our experiments show that the best performance arises with a pre-training approach that encourages equivariance to translation, scaling, flip, rotation, and scene flow. For spatial augmentations, we find that depending on the transformation, either a contrastive objective or an equivariance-by-classification objective yields the best results. To leverage real-world object deformations and motion, we consider sequential LiDAR scene pairs and develop a novel 3D scene flow-based equivariance objective that leads to improved performance overall. We show that our pre-training method for 3D object detection outperforms existing equivariant and invariant approaches in many settings.
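As a toy illustration of the equivariance property the abstract describes (not the paper's actual training objective or architecture), the sketch below checks the discrepancy || f(g · x) − g · f(x) || for a point-cloud feature map f under a rotation g; an equivariant pre-training loss penalizes exactly this discrepancy. The `encode` function here is a hypothetical stand-in (the centroid, which is rotation-equivariant by construction), not the paper's encoder.

```python
import numpy as np

def rotate_z(points, theta):
    """Rotate an (N, 3) point cloud about the z-axis by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ R.T

def encode(points):
    """Hypothetical stand-in for a learned encoder: the centroid of the
    cloud. The centroid rotates exactly as the input does, so this toy
    feature map is rotation-equivariant by construction."""
    return points.mean(axis=0)

def equivariance_gap(points, theta):
    """|| f(g . x) - g . f(x) ||: zero for a perfectly equivariant
    feature map, positive otherwise. An equivariant pre-training loss
    drives this gap toward zero across sampled transformations."""
    feat_of_rotated = encode(rotate_z(points, theta))
    rotated_feat = rotate_z(encode(points)[None, :], theta)[0]
    return float(np.linalg.norm(feat_of_rotated - rotated_feat))

rng = np.random.default_rng(0)
cloud = rng.normal(size=(128, 3))
gap = equivariance_gap(cloud, np.pi / 4)
print(gap)  # near zero: the centroid is rotation-equivariant
```

For a learned encoder the gap would generally be nonzero, and minimizing it (or a contrastive variant of it) over sampled transformations is the self-supervision signal; the paper additionally replaces synthetic rotations with real scene flow between sequential LiDAR scans.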
Related papers
- FRED: Towards a Full Rotation-Equivariance in Aerial Image Object Detection [28.47314201641291]
We introduce a Fully Rotation-Equivariant Oriented Object Detector (FRED)
Our proposed method delivers comparable performance on DOTA-v1.0 and outperforms by 1.5 mAP on DOTA-v1.5, all while significantly reducing the model parameters to 16%.
arXiv Detail & Related papers (2023-12-22T09:31:43Z)
- Structuring Representation Geometry with Rotationally Equivariant Contrastive Learning [42.20218717636608]
Self-supervised learning converts raw perceptual data such as images to a compact space where simple Euclidean distances measure meaningful variations in data.
We extend this formulation by adding additional geometric structure to the embedding space by enforcing transformations of input space to correspond to simple transformations of embedding space.
We show that merely combining our equivariant loss with a non-collapse term results in non-trivial representations.
arXiv Detail & Related papers (2023-06-24T10:07:52Z)
- Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation [49.56131393810713]
We present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner.
Our method excels in both model performance and computational efficiency, with only 0.25M parameters and 0.92G FLOPs.
arXiv Detail & Related papers (2023-06-08T22:55:32Z)
- Self-supervised learning of Split Invariant Equivariant representations [0.0]
We introduce 3DIEBench, consisting of renderings from 3D models over 55 classes and more than 2.5 million images where we have full control over the transformations applied to the objects.
We introduce a predictor architecture based on hypernetworks to learn equivariant representations with no possible collapse to invariance.
We introduce SIE (Split Invariant-Equivariant) which combines the hypernetwork-based predictor with representations split in two parts, one invariant, the other equivariant, to learn richer representations.
arXiv Detail & Related papers (2023-02-14T07:53:18Z)
- 3D Equivariant Graph Implicit Functions [51.5559264447605]
We introduce a novel family of graph implicit functions with equivariant layers that facilitates modeling fine local details.
Our method improves over the existing rotation-equivariant implicit function from 0.69 to 0.89 on the ShapeNet reconstruction task.
arXiv Detail & Related papers (2022-03-31T16:51:25Z)
- Self-Supervised 3D Hand Pose Estimation from monocular RGB via Contrastive Learning [50.007445752513625]
We propose a new self-supervised method for the structured regression task of 3D hand pose estimation.
We experimentally investigate the impact of invariant and equivariant contrastive objectives.
We show that a standard ResNet-152, trained on additional unlabeled data, attains an improvement of 7.6% in PA-EPE on FreiHAND.
arXiv Detail & Related papers (2021-06-10T17:48:57Z)
- Equivariant Point Network for 3D Point Cloud Analysis [17.689949017410836]
We propose an effective and practical SE(3) (3D translation and rotation) equivariant network for point cloud analysis.
First, we present SE(3) separable point convolution, a novel framework that breaks down the 6D convolution into two separable convolutional operators.
Second, we introduce an attention layer to effectively harness the expressiveness of the equivariant features.
arXiv Detail & Related papers (2021-03-25T21:57:10Z)
- Rotation-Invariant Point Convolution With Multiple Equivariant Alignments [1.0152838128195467]
We show that using rotation-equivariant alignments, it is possible to make any convolutional layer rotation-invariant.
With this core layer, we design rotation-invariant architectures which improve state-of-the-art results in both object classification and semantic segmentation.
arXiv Detail & Related papers (2020-12-07T20:47:46Z)
- Spherical Feature Transform for Deep Metric Learning [58.35971328774927]
This work proposes a novel spherical feature transform approach.
It relaxes the assumption of identical covariance between classes to an assumption of similar covariances of different classes on a hypersphere.
We provide a simple and effective training method, and an in-depth analysis of the relation between the two different transforms.
arXiv Detail & Related papers (2020-08-04T11:32:23Z)
- SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks [71.55002934935473]
We introduce the SE(3)-Transformer, a variant of the self-attention module for 3D point clouds and graphs, which is equivariant under continuous 3D roto-translations.
We evaluate our model on a toy N-body particle simulation dataset, showcasing the robustness of the predictions under rotations of the input.
arXiv Detail & Related papers (2020-06-18T13:23:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.