MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
- URL: http://arxiv.org/abs/2101.06605v1
- Date: Sun, 17 Jan 2021 06:36:28 GMT
- Title: MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
- Authors: Jiahui Huang, He Wang, Tolga Birdal, Minhyuk Sung, Federica Arrigoni, Shi-Min Hu, Leonidas Guibas
- Abstract summary: We present a novel, end-to-end trainable multi-body motion segmentation and rigid registration framework for 3D point clouds.
The two non-trivial challenges posed by this multi-scan multibody setting are:
(i) guaranteeing correspondence and segmentation consistency across multiple input point clouds, and
(ii) obtaining robust motion-based rigid body segmentation applicable to novel object categories.
- Score: 61.015704878681795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present MultiBodySync, a novel, end-to-end trainable multi-body motion
segmentation and rigid registration framework for multiple input 3D point
clouds. The two non-trivial challenges posed by this multi-scan multibody
setting that we investigate are: (i) guaranteeing correspondence and
segmentation consistency across multiple input point clouds capturing different
spatial arrangements of bodies or body parts; and (ii) obtaining robust
motion-based rigid body segmentation applicable to novel object categories. We
propose an approach to address these issues that incorporates spectral
synchronization into an iterative deep declarative network, so as to
simultaneously recover consistent correspondences as well as motion
segmentation. At the same time, by explicitly disentangling the correspondence
and motion segmentation estimation modules, we achieve strong generalizability
across different object categories. Our extensive evaluations demonstrate that
our method is effective on various datasets ranging from rigid parts in
articulated objects to individually moving objects in a 3D scene, be it
single-view or full point clouds.
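The abstract names two classical ingredients that the pipeline builds on: synchronizing noisy pairwise correspondences across many scans via a spectral step, and fitting rigid motions to matched points. The sketch below is a minimal NumPy illustration of both operations under stated assumptions; it is not the authors' implementation, and all names, shapes, and the universe size N are hypothetical.

```python
# Minimal NumPy sketch (not the authors' code) of two classical steps the
# abstract refers to: (a) spectral synchronization of noisy pairwise
# correspondences across K scans, and (b) rigid motion fitting
# (Kabsch/Procrustes) once points are matched. Names and shapes are assumptions.
import numpy as np

def spectral_sync(P_pair, K, N):
    """P_pair: dict (i, j) -> (N, N) pairwise correspondence matrix, i < j.
    Returns cycle-consistent pairwise maps P_ij ~ U_i @ U_j.T (up to a global
    scale coming from the eigenvector normalization)."""
    A = np.zeros((K * N, K * N))
    for i in range(K):                                   # identity diagonal blocks
        A[i * N:(i + 1) * N, i * N:(i + 1) * N] = np.eye(N)
    for (i, j), Pij in P_pair.items():                   # symmetric off-diagonal blocks
        A[i * N:(i + 1) * N, j * N:(j + 1) * N] = Pij
        A[j * N:(j + 1) * N, i * N:(i + 1) * N] = Pij.T
    _, vecs = np.linalg.eigh(A)                          # ascending eigenvalues
    U = vecs[:, -N:]                                     # top-N eigenvectors
    blocks = [U[i * N:(i + 1) * N] for i in range(K)]    # per-scan maps to a common universe
    return {(i, j): blocks[i] @ blocks[j].T for (i, j) in P_pair}

def rigid_fit(src, dst):
    """Least-squares R, t with dst ~ src @ R.T + t for matched (M, 3) points."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    Uc, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ Uc.T))              # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ Uc.T
    t = dst.mean(axis=0) - src.mean(axis=0) @ R.T
    return R, t

# Toy check for the rigid fit: recover a known rotation and translation.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
R_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1.0                                 # make it a proper rotation
Y = X @ R_true.T + np.array([0.1, -0.2, 0.3])
R_est, t_est = rigid_fit(X, Y)                           # ~R_true and ~(0.1, -0.2, 0.3)
```

Per the abstract, MultiBodySync embeds such synchronization as a differentiable step inside an iterative deep declarative network rather than running it as a stand-alone post-process; the sketch above is only meant to make the underlying operations concrete.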
Related papers
- 3D-Aware Instance Segmentation and Tracking in Egocentric Videos [107.10661490652822]
Egocentric videos present unique challenges for 3D scene understanding.
This paper introduces a novel approach to instance segmentation and tracking in first-person video.
By incorporating spatial and temporal cues, we achieve superior performance compared to state-of-the-art 2D approaches.
arXiv Detail & Related papers (2024-08-19T10:08:25Z)
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
- OGC: Unsupervised 3D Object Segmentation from Rigid Dynamics of Point Clouds [4.709764624933227]
We propose the first unsupervised method, called OGC, to simultaneously identify multiple 3D objects in a single forward pass.
We extensively evaluate our method on five datasets, demonstrating superior performance on object part instance segmentation.
arXiv Detail & Related papers (2022-10-10T07:01:08Z)
- A Dual-Masked Auto-Encoder for Robust Motion Capture with Spatial-Temporal Skeletal Token Completion [13.88656793940129]
We propose an adaptive, identity-aware triangulation module to reconstruct 3D joints and identify the missing joints for each identity.
We then propose a Dual-Masked Auto-Encoder (D-MAE) which encodes the joint status with both skeletal-structural and temporal position encoding for trajectory completion.
In order to demonstrate the proposed model's capability in dealing with severe data loss scenarios, we contribute a high-accuracy and challenging motion capture dataset.
arXiv Detail & Related papers (2022-07-15T10:00:43Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
Most previous works focus on learning frame-level features of each whole frame in the entire video, and directly match them with the textual information.
We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
arXiv Detail & Related papers (2022-03-06T13:57:09Z)
- RPM-Net: Recurrent Prediction of Motion and Parts from Point Cloud [19.46077164219437]
RPM-Net simultaneously infers movable parts and hallucinates their motions from a single, un-segmented, and possibly partial, 3D point cloud shape.
We show results of simultaneous motion and part predictions from synthetic and real scans of 3D objects exhibiting a variety of part mobilities.
arXiv Detail & Related papers (2020-06-26T08:51:11Z)
- Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply interleaved, allowing for closely hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)