Learning Implicit 3D Representations of Dressed Humans from Sparse Views
- URL: http://arxiv.org/abs/2104.08013v1
- Date: Fri, 16 Apr 2021 10:20:26 GMT
- Title: Learning Implicit 3D Representations of Dressed Humans from Sparse Views
- Authors: Pierre Zins, Yuanlu Xu, Edmond Boyer, Stefanie Wuhrer, Tony Tung
- Abstract summary: We propose an end-to-end approach that learns an implicit 3D representation of dressed humans from sparse camera views.
In the experiments, we show the proposed approach outperforms the state of the art on standard data both quantitatively and qualitatively.
- Score: 31.584157304372425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, data-driven single-view reconstruction methods have shown great
progress in modeling 3D dressed humans. However, such methods suffer heavily
from depth ambiguities and occlusions inherent to single view inputs. In this
paper, we address such issues by lifting the single-view input with additional
views and investigate the best strategy to suitably exploit information from
multiple views. We propose an end-to-end approach that learns an implicit 3D
representation of dressed humans from sparse camera views. Specifically, we
introduce two key components: first an attention-based fusion layer that learns
to aggregate visual information from several viewpoints; second a mechanism
that encodes local 3D patterns under the multi-view context. In the
experiments, we show the proposed approach outperforms the state of the art on
standard data both quantitatively and qualitatively. Additionally, we apply our
method on real data acquired with a multi-camera platform and demonstrate our
approach can obtain results comparable to multi-view stereo with dramatically
fewer views.
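The abstract names an attention-based fusion layer but does not describe its internals. As a rough illustration only, the sketch below shows one generic way to aggregate per-view features for a single 3D query point: softmax-weighted pooling across views. The function name `attention_fuse`, the input layout, and the externally supplied `scores` are all assumptions made for this example; in a learned model the scores would typically come from a small network, and this is not the authors' implementation.

```python
import math


def attention_fuse(view_features, scores):
    """Fuse per-view feature vectors with softmax attention weights.

    view_features: list of V feature vectors (each a list of D floats),
                   one per camera view, for a single 3D query point.
    scores:        list of V unnormalized relevance scores (hypothetical
                   stand-in for the output of a learned scoring network).
    Returns (fused_vector, weights).
    """
    if len(view_features) != len(scores) or not scores:
        raise ValueError("need one score per view and at least one view")

    # Numerically stable softmax over the view axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the per-view features, one dimension at a time.
    dim = len(view_features[0])
    fused = [
        sum(w * feat[d] for w, feat in zip(weights, view_features))
        for d in range(dim)
    ]
    return fused, weights


# Three views with 2-D features; equal scores reduce to plain averaging.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = attention_fuse(features, [0.0, 0.0, 0.0])
```

With equal scores the layer falls back to mean pooling; a much larger score for one view makes the fused feature follow that view, which is the behavior one would want when other views are occluded.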
Related papers
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Neural Volumetric Object Selection [126.04480613166194]
We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF).
Our approach takes a set of foreground and background 2D user scribbles in one view and automatically estimates a 3D segmentation of the desired object, which can be rendered into novel views.
arXiv Detail & Related papers (2022-05-30T08:55:20Z)
- Enhancing Egocentric 3D Pose Estimation with Third Person Views [37.9683439632693]
We propose a novel approach to enhance the 3D body pose estimation of a person computed from videos captured from a single wearable camera.
We introduce First2Third-Pose, a new paired synchronized dataset of nearly 2,000 videos depicting human activities captured from both first- and third-view perspectives.
Experimental results demonstrate that the joint multi-view embedded space learned with our dataset is useful to extract discriminatory features from arbitrary single-view egocentric videos.
arXiv Detail & Related papers (2022-01-06T11:42:01Z)
- Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views [9.556376932449187]
Multi-View and Multi-Object Network (MulMON) is a method for learning accurate, object-centric representations of multi-object scenes by leveraging multiple views.
We show that MulMON resolves spatial ambiguities better than single-view methods.
arXiv Detail & Related papers (2021-11-13T13:54:28Z)
- Self-Supervised Multi-View Learning via Auto-Encoding 3D Transformations [61.870882736758624]
We propose a novel self-supervised paradigm to learn Multi-View Transformation Equivariant Representations (MV-TER).
Specifically, we perform a 3D transformation on a 3D object, and obtain multiple views before and after the transformation via projection.
Then, we self-train a representation to capture the intrinsic 3D object representation by decoding 3D transformation parameters from the fused feature representations of multiple views before and after the transformation.
arXiv Detail & Related papers (2021-03-01T06:24:17Z)
- Generalized Multi-view Shared Subspace Learning using View Bootstrapping [43.027427742165095]
A key objective in multi-view learning is to model the information common to multiple parallel views of a class of objects/events to improve downstream learning tasks.
We present a neural method based on multi-view correlation to capture the information shared across a large number of views by subsampling them in a view-agnostic manner during training.
Experiments on spoken word recognition, 3D object classification and pose-invariant face recognition demonstrate the robustness of view bootstrapping to model a large number of views.
arXiv Detail & Related papers (2020-05-12T20:35:14Z)
- Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z)