Learning Compositional Representation for 4D Captures with Neural ODE
- URL: http://arxiv.org/abs/2103.08271v1
- Date: Mon, 15 Mar 2021 10:55:55 GMT
- Title: Learning Compositional Representation for 4D Captures with Neural ODE
- Authors: Boyan Jiang, Yinda Zhang, Xingkui Wei, Xiangyang Xue, Yanwei Fu
- Abstract summary: We introduce a compositional representation for 4D captures that disentangles shape, initial state, and motion.
To model the motion, a neural Ordinary Differential Equation (ODE) is trained to update the initial state conditioned on the learned motion code.
A decoder takes the shape code and the updated pose code to reconstruct 4D captures at each time stamp.
- Score: 72.56606274691033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning based representation has become the key to the success of many
computer vision systems. While many 3D representations have been proposed, how
to represent a dynamically changing 3D object remains an open problem. In this
paper, we introduce a compositional representation for 4D captures, i.e. a
deforming 3D object over a temporal span, that disentangles shape, initial
state, and motion. Each component is represented by
a latent code via a trained encoder. To model the motion, a neural Ordinary
Differential Equation (ODE) is trained to update the initial state conditioned
on the learned motion code, and a decoder takes the shape code and the updated
pose code to reconstruct 4D captures at each time stamp. To this end, we
propose an Identity Exchange Training (IET) strategy that encourages the
network to effectively decouple each component. Extensive experiments
demonstrate that the proposed method outperforms existing state-of-the-art deep
learning based methods on 4D reconstruction, and significantly improves on
various tasks, including motion transfer and completion.
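The pipeline described in the abstract (encode shape, initial pose, and motion as latent codes; integrate the pose code forward in time with a neural ODE conditioned on the motion code; decode shape plus updated pose into geometry) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the dynamics function, decoder, latent dimensions, and fixed-step Euler integration are all stand-in assumptions, whereas the paper trains these components end to end and uses a learned neural ODE.

```python
import numpy as np

def pose_dynamics(pose, motion_code, t, W):
    # Hypothetical dynamics network f(p, m, t): a single linear map with
    # tanh, standing in for the paper's learned ODE function.
    x = np.concatenate([pose, motion_code, [t]])
    return np.tanh(W @ x)

def integrate_pose(initial_pose, motion_code, t_end, W, steps=20):
    # Fixed-step Euler integration of dp/dt = f(p, m, t); a trained
    # neural ODE solver would replace this crude scheme.
    p, dt = initial_pose.copy(), t_end / steps
    for i in range(steps):
        p = p + dt * pose_dynamics(p, motion_code, i * dt, W)
    return p

def decode(shape_code, pose_code, W_dec):
    # Stand-in decoder mapping (shape code, updated pose code) to a
    # fixed-size output; the real decoder reconstructs a 3D surface.
    return np.tanh(W_dec @ np.concatenate([shape_code, pose_code]))

rng = np.random.default_rng(0)
D = 8                                    # latent dimensionality (illustrative)
W = rng.normal(0, 0.1, (D, 2 * D + 1))   # dynamics weights
W_dec = rng.normal(0, 0.1, (16, 2 * D))  # decoder weights
shape, pose0, motion = rng.normal(size=(3, D))

pose_t = integrate_pose(pose0, motion, t_end=1.0, W=W)  # pose at time t
recon = decode(shape, pose_t, W_dec)
print(recon.shape)  # (16,)
```

Because the shape code is held fixed while only the pose code is integrated, swapping motion codes between sequences transfers motion between identities, which is what the IET strategy exploits during training.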
Related papers
- Multiview Compressive Coding for 3D Reconstruction [77.95706553743626]
We introduce a simple framework that operates on 3D points of single objects or whole scenes.
Our model, Multiview Compressive Coding, learns to compress the input appearance and geometry to predict the 3D structure.
arXiv Detail & Related papers (2023-01-19T18:59:52Z) - LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human
Modeling [69.56581851211841]
We propose a novel Local 4D implicit Representation for Dynamic clothed human, named LoRD.
Our key insight is to encourage the network to learn the latent codes of local part-level representation.
LoRD has a strong capability for representing 4D humans, and outperforms state-of-the-art methods on practical applications.
arXiv Detail & Related papers (2022-08-18T03:49:44Z) - H4D: Human 4D Modeling by Learning Neural Compositional Representation [75.34798886466311]
This work presents a novel framework that can effectively learn a compact and compositional representation for dynamic humans.
A simple yet effective linear motion model is proposed to provide a rough and regularized motion estimation.
Experiments demonstrate that our method is not only effective in recovering dynamic humans with accurate motion and detailed geometry, but also amenable to various 4D human-related tasks.
arXiv Detail & Related papers (2022-03-02T17:10:49Z) - 4DContrast: Contrastive Learning with Dynamic Correspondences for 3D
Scene Understanding [22.896937940702642]
We present a new approach to instill 4D dynamic object priors into learned 3D representations by unsupervised pre-training.
We propose a new data augmentation scheme leveraging synthetic 3D shapes moving in static 3D environments.
Experiments demonstrate that our unsupervised representation learning results in improvement in downstream 3D semantic segmentation, object detection, and instance segmentation tasks.
arXiv Detail & Related papers (2021-12-06T13:09:07Z) - Learning Parallel Dense Correspondence from Spatio-Temporal Descriptors
for Efficient and Robust 4D Reconstruction [43.60322886598972]
This paper focuses on the task of 4D shape reconstruction from a sequence of point clouds.
We present a novel pipeline to learn a temporal evolution of the 3D human shape through capturing continuous transformation functions among cross-frame occupancy fields.
arXiv Detail & Related papers (2021-03-30T13:36:03Z) - Depth-Aware Action Recognition: Pose-Motion Encoding through Temporal
Heatmaps [2.2079886535603084]
We propose a depth-aware descriptor that encodes pose and motion information in a unified representation for action classification in-the-wild.
The key component of our method is the Depth-Aware Pose Motion representation (DA-PoTion), a new video descriptor that encodes the 3D movement of semantic keypoints of the human body.
arXiv Detail & Related papers (2020-11-26T17:26:42Z) - Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using
Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z) - V4D:4D Convolutional Neural Networks for Video-level Representation
Learning [58.548331848942865]
Most 3D CNNs for video representation learning are clip-based, and thus do not consider the video-level temporal evolution of features.
We propose Video-level 4D Convolutional Neural Networks, or V4D, to model long-range representations with 4D convolutions.
V4D achieves excellent results, surpassing recent 3D CNNs by a large margin.
arXiv Detail & Related papers (2020-02-18T09:27:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.