Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and
Motion Estimation
- URL: http://arxiv.org/abs/2306.05584v2
- Date: Tue, 31 Oct 2023 13:46:52 GMT
- Title: Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and
Motion Estimation
- Authors: Jia-Xing Zhong, Ta-Ying Cheng, Yuhang He, Kai Lu, Kaichen Zhou, Andrew
Markham, Niki Trigoni
- Abstract summary: We present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner.
Our method excels in both model performance and computational efficiency, with only 0.25M parameters and 0.92G FLOPs.
- Score: 49.56131393810713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A truly generalizable approach to rigid segmentation and motion estimation is
fundamental to 3D understanding of articulated objects and moving scenes. In
view of the closely intertwined relationship between segmentation and motion
estimates, we present an SE(3) equivariant architecture and a training strategy
to tackle this task in an unsupervised manner. Our architecture is composed of
two interconnected, lightweight heads. These heads predict segmentation masks
using point-level invariant features and estimate motion from SE(3) equivariant
features, all without the need for category information. Our training strategy
is unified and can be implemented online, which jointly optimizes the predicted
segmentation and motion by leveraging the interrelationships among scene flow,
segmentation mask, and rigid transformations. We conduct experiments on four
datasets to demonstrate the superiority of our method. The results show that
our method excels in both model performance and computational efficiency, with
only 0.25M parameters and 0.92G FLOPs. To the best of our knowledge, this is
the first work designed for category-agnostic part-level SE(3) equivariance in
dynamic point clouds.
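The abstract's training strategy hinges on the interrelationship among scene flow, segmentation masks, and per-part rigid transformations. As an illustration only (the function name and the soft mask-blending form are assumptions, not the paper's code), a flow field consistent with K rigid-body hypotheses can be composed from soft masks and per-part SE(3) transforms like this:

```python
import numpy as np

def flow_from_masks_and_transforms(points, masks, rotations, translations):
    """Compose per-point scene flow from K rigid-body hypotheses.

    points:       (N, 3) point cloud at time t
    masks:        (N, K) soft segmentation weights (rows sum to 1)
    rotations:    (K, 3, 3) per-part rotation matrices
    translations: (K, 3) per-part translation vectors
    """
    # Rigidly transform all points under each of the K hypotheses: (K, N, 3)
    moved = np.einsum('kij,nj->kni', rotations, points) + translations[:, None, :]
    # Blend the K candidate displacements with the soft segmentation masks
    flow = np.einsum('nk,kni->ni', masks, moved - points[None])
    return flow
```

In an unsupervised setting of this kind, the mismatch between such a composed flow and an independently estimated scene flow can serve as a joint training signal for both the masks and the transforms.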
Related papers
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes such as pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- Semi-Weakly Supervised Object Kinematic Motion Prediction [56.282759127180306]
Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters.
We propose a graph neural network to learn the map between hierarchical part-level segmentation and mobile parts parameters.
The network predictions yield a large set of 3D objects with pseudo-labeled mobility information.
arXiv Detail & Related papers (2023-03-31T02:37:36Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- PointInst3D: Segmenting 3D Instances by Points [136.7261709896713]
We propose a fully-convolutional 3D point cloud instance segmentation method that works in a per-point prediction fashion.
We find the key to its success is assigning a suitable target to each sampled point.
Our approach achieves promising results on both ScanNet and S3DIS benchmarks.
arXiv Detail & Related papers (2022-04-25T02:41:46Z)
- Learning to Segment Rigid Motions from Two Frames [72.14906744113125]
We propose a modular network, motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field.
It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations.
Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
arXiv Detail & Related papers (2021-01-11T04:20:30Z)
- 3DCFS: Fast and Robust Joint 3D Semantic-Instance Segmentation via Coupled Feature Selection [46.922236354885]
We propose a novel 3D point clouds segmentation framework, named 3DCFS, that jointly performs semantic and instance segmentation.
Inspired by the human scene perception process, we design a novel coupled feature selection module, named CFSM, that adaptively selects and fuses the reciprocal semantic and instance features.
Our 3DCFS outperforms state-of-the-art methods on benchmark datasets in terms of accuracy, speed and computational cost.
arXiv Detail & Related papers (2020-03-01T17:48:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.