RPM-Net: Recurrent Prediction of Motion and Parts from Point Cloud
- URL: http://arxiv.org/abs/2006.14865v1
- Date: Fri, 26 Jun 2020 08:51:11 GMT
- Title: RPM-Net: Recurrent Prediction of Motion and Parts from Point Cloud
- Authors: Zihao Yan, Ruizhen Hu, Xingguang Yan, Luanmin Chen, Oliver van Kaick,
Hao Zhang, Hui Huang
- Abstract summary: RPM-Net simultaneously infers movable parts and hallucinates their motions from a single, un-segmented, and possibly partial, 3D point cloud shape.
We show results of simultaneous motion and part predictions from synthetic and real scans of 3D objects exhibiting a variety of part mobilities.
- Score: 19.46077164219437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce RPM-Net, a deep learning-based approach which simultaneously
infers movable parts and hallucinates their motions from a single,
un-segmented, and possibly partial, 3D point cloud shape. RPM-Net is a novel
Recurrent Neural Network (RNN), composed of an encoder-decoder pair with
interleaved Long Short-Term Memory (LSTM) components, which together predict a
temporal sequence of pointwise displacements for the input point cloud. At the
same time, the displacements allow the network to learn movable parts,
resulting in a motion-based shape segmentation. Recursive applications of
RPM-Net on the obtained parts can predict finer-level part motions, resulting
in a hierarchical object segmentation. Furthermore, we develop a separate
network to estimate part mobilities, e.g., per-part motion parameters, from the
segmented motion sequence. Both networks learn deep predictive models from a
training set that exemplifies a variety of mobilities for diverse objects. We
show results of simultaneous motion and part predictions from synthetic and
real scans of 3D objects exhibiting a variety of part mobilities, possibly
involving multiple movable parts.
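For intuition, below is a minimal PyTorch sketch of the kind of recurrent displacement predictor the abstract describes. It is an illustration, not the authors' implementation: the class name RPMNetSketch, the PointNet-style encoder, the feature sizes, and the fixed number of unrolled steps are all assumptions. A shared per-point encoder feeds a pooled global feature to an LSTM cell; the recurrent state is broadcast back to every point to decode a displacement at each step, the points are advanced by that displacement, and accumulated per-point motion gives a crude movable/static split.

```python
# Illustrative sketch only -- not the authors' RPM-Net implementation.
import torch
import torch.nn as nn

class RPMNetSketch(nn.Module):
    def __init__(self, feat_dim=128, num_steps=8):
        super().__init__()
        self.feat_dim, self.num_steps = feat_dim, num_steps
        # Shared per-point encoder (PointNet-style MLP over xyz).
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, feat_dim, 1), nn.ReLU(),
        )
        # Recurrent core over a max-pooled global shape feature.
        self.lstm = nn.LSTMCell(feat_dim, feat_dim)
        # Per-point decoder: point feature + recurrent state -> 3D displacement.
        self.decoder = nn.Sequential(
            nn.Conv1d(2 * feat_dim, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 3, 1),
        )

    def forward(self, pts):                           # pts: (B, N, 3)
        B, N, _ = pts.shape
        h = pts.new_zeros(B, self.feat_dim)
        c = pts.new_zeros(B, self.feat_dim)
        cur, disps = pts, []
        for _ in range(self.num_steps):
            f = self.encoder(cur.transpose(1, 2))     # (B, F, N)
            h, c = self.lstm(f.max(dim=2).values, (h, c))
            ctx = h.unsqueeze(2).expand(-1, -1, N)    # broadcast recurrent state
            d = self.decoder(torch.cat([f, ctx], 1)).transpose(1, 2)  # (B, N, 3)
            cur = cur + d                             # advance the points
            disps.append(d)
        return torch.stack(disps, dim=1)              # (B, T, N, 3)

if __name__ == "__main__":
    disp = RPMNetSketch()(torch.randn(2, 1024, 3))        # (2, 8, 1024, 3)
    motion = disp.norm(dim=-1).sum(dim=1)                 # total per-point motion
    movable = motion > motion.mean(dim=1, keepdim=True)   # crude movable/static split
```

In the paper, the segmentation and the per-part mobility parameters (e.g., a rotation axis and angle) are produced by dedicated components; the thresholding above only illustrates how displacement magnitudes can separate movable parts from static ones.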
Related papers
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Semi-Weakly Supervised Object Kinematic Motion Prediction [56.282759127180306]
Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters.
We propose a graph neural network to learn the map between hierarchical part-level segmentation and mobile-part parameters.
The network predictions yield a large-scale set of 3D objects with pseudo-labeled mobility information.
arXiv Detail & Related papers (2023-03-31T02:37:36Z)
- SWTF: Sparse Weighted Temporal Fusion for Drone-Based Activity Recognition [2.7677069267434873]
Drone-camera based human activity recognition (HAR) has received significant attention from the computer vision research community.
We propose a novel Sparse Weighted Temporal Fusion (SWTF) module to utilize sparsely sampled video frames.
The proposed model achieves accuracies of 72.76%, 92.56%, and 78.86% on the respective datasets.
arXiv Detail & Related papers (2022-11-10T12:45:43Z)
- Efficient Unsupervised Video Object Segmentation Network Based on Motion Guidance [1.5736899098702974]
This paper proposes a video object segmentation network based on motion guidance.
The model comprises a dual-stream network, motion guidance module, and multi-scale progressive fusion module.
Experimental results demonstrate the superior performance of the proposed method.
arXiv Detail & Related papers (2022-11-10T06:13:23Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
Most previous works focus on learning frame-level features of each whole frame in the entire video and directly matching them with the textual information.
We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
arXiv Detail & Related papers (2022-03-06T13:57:09Z)
- MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization [61.015704878681795]
We present a novel, end-to-end trainable multi-body motion segmentation and rigid registration framework for 3D point clouds.
The two non-trivial challenges posed by this multi-scan, multi-body setting are: (i) guaranteeing correspondence and segmentation consistency across multiple input point clouds, and (ii) obtaining robust motion-based rigid body segmentation applicable to novel object categories (a generic rigid-fit sketch follows this entry).
arXiv Detail & Related papers (2021-01-17T06:36:28Z)
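Motion-based rigid segmentation of the kind MultiBodySync targets is commonly built on fitting rigid transforms to candidate point groups and inspecting residuals. Below is a self-contained sketch of the classical Kabsch/Procrustes fit, a standard building block rather than the paper's synchronization algorithm; the function names are illustrative.

```python
# Generic Kabsch/Procrustes rigid fit (not MultiBodySync's algorithm):
# given corresponding points P -> Q, recover rotation R and translation t
# minimizing ||R P + t - Q||. Residuals under a fitted (R, t) are a
# common signal for grouping points into rigid bodies.
import numpy as np

def kabsch(P, Q):
    """P, Q: (N, 3) corresponding points. Returns R (3x3), t (3,)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

def rigid_residuals(P, Q, R, t):
    # Points moving with a different rigid body show large residuals.
    return np.linalg.norm((R @ P.T).T + t - Q, axis=1)
```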
- Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply interleaved, allowing for close hierarchical interactions between object motion and appearance (a generic attention sketch follows this entry).
arXiv Detail & Related papers (2020-03-09T16:58:42Z)
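The MAT block is only named in the summary above; as a rough illustration of the general idea of asymmetric motion-to-appearance attention, here is a generic cross-stream attention layer in PyTorch. The shapes, names, and residual design are assumptions, not MATNet's actual block.

```python
# Generic motion -> appearance cross-attention sketch (assumed design,
# not MATNet's MAT block): motion features form queries that attend over
# appearance features, biasing the encoder toward moving regions.
import torch
import torch.nn as nn

class CrossStreamAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, app, mot):
        # app, mot: (B, HW, C) flattened spatial feature maps.
        out, _ = self.attn(query=mot, key=app, value=app)
        return self.norm(app + out)  # appearance refined by motion cues
```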