FOF: Learning Fourier Occupancy Field for Monocular Real-time Human Reconstruction
- URL: http://arxiv.org/abs/2206.02194v2
- Date: Mon, 4 Sep 2023 14:14:21 GMT
- Title: FOF: Learning Fourier Occupancy Field for Monocular Real-time Human Reconstruction
- Authors: Qiao Feng, Yebin Liu, Yu-Kun Lai, Jingyu Yang, Kun Li
- Abstract summary: Existing representations, such as parametric models, voxel grids, meshes and implicit neural representations, have difficulties achieving high-quality results and real-time speed at the same time.
We propose Fourier Occupancy Field (FOF), a novel, powerful, efficient and flexible 3D representation for monocular real-time and accurate human reconstruction.
A FOF can be stored as a multi-channel image, which is compatible with 2D convolutional neural networks and can bridge the gap between 3D geometries and 2D images.
- Score: 73.85709132666626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of deep learning has led to significant progress in monocular
human reconstruction. However, existing representations, such as parametric
models, voxel grids, meshes and implicit neural representations, have
difficulties achieving high-quality results and real-time speed at the same
time. In this paper, we propose Fourier Occupancy Field (FOF), a novel,
powerful, efficient and flexible 3D representation for monocular real-time and
accurate human reconstruction. The FOF represents a 3D object with a 2D field
orthogonal to the view direction: at each 2D position, the occupancy of the
object along the view direction is compactly represented by the first few
terms of its Fourier series, which retains the topology and neighborhood
relations of the 2D domain. A FOF can be stored as a multi-channel image, which
is compatible with 2D convolutional neural networks and can bridge the gap
between 3D geometries and 2D images. The FOF is very flexible and extensible,
e.g., parametric models can be easily integrated into a FOF as a prior to
generate more robust results. Based on FOF, we design the first 30+ FPS
high-fidelity real-time monocular human reconstruction framework. We
demonstrate the potential of FOF on both public datasets and real captured data.
The code will be released for research purposes.
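The representation described above amounts to approximating the per-pixel occupancy along depth z in [-1, 1] as f(x, y, z) ≈ a_0(x, y)/2 + Σ_{n=1..N} [a_n(x, y) cos(nπz) + b_n(x, y) sin(nπz)], so each pixel stores 2N + 1 coefficients as image channels. Below is a minimal NumPy sketch of this encode/decode cycle. It is not the authors' released code; the function names and the choices of N = 8 terms and 256 z-samples are illustrative assumptions.

```python
# Minimal sketch of the Fourier Occupancy Field idea, assuming occupancy is
# sampled along the view direction z in [-1, 1] at every pixel. This is NOT
# the authors' released implementation; names and constants are illustrative.
import numpy as np

N_TERMS = 8      # Fourier terms kept per pixel (assumed; the paper says "first few")
N_SAMPLES = 256  # z-samples used to estimate the coefficients (assumed)
Z = np.linspace(-1.0, 1.0, N_SAMPLES)
DZ = Z[1] - Z[0]

def encode_fof(occupancy):
    """occupancy: (H, W, N_SAMPLES) array, 1 inside the object and 0 outside.
    Returns a (H, W, 2 * N_TERMS + 1) multi-channel coefficient image."""
    channels = [np.sum(occupancy, axis=-1) * DZ / 2.0]  # constant term a_0 / 2
    for n in range(1, N_TERMS + 1):
        channels.append(np.sum(occupancy * np.cos(n * np.pi * Z), axis=-1) * DZ)
        channels.append(np.sum(occupancy * np.sin(n * np.pi * Z), axis=-1) * DZ)
    return np.stack(channels, axis=-1)

def decode_fof(coeffs, z):
    """Evaluate the truncated series at a single depth z for every pixel."""
    value = coeffs[..., 0].copy()
    for n in range(1, N_TERMS + 1):
        value += coeffs[..., 2 * n - 1] * np.cos(n * np.pi * z)
        value += coeffs[..., 2 * n] * np.sin(n * np.pi * z)
    return value  # > 0.5 is read as "occupied"

# Toy round trip: a slab occupied for |z| < 0.25 at every pixel.
occ = np.zeros((4, 4, N_SAMPLES))
occ[..., np.abs(Z) < 0.25] = 1.0
fof = encode_fof(occ)              # (4, 4, 17) image, CNN-friendly
print(decode_fof(fof, 0.0) > 0.5)  # True everywhere: z = 0 is inside
print(decode_fof(fof, 0.9) > 0.5)  # False everywhere: z = 0.9 is outside
```

In the framework the abstract describes, a 2D CNN would regress such a coefficient image directly from an RGB input; a mesh could then be recovered by evaluating the series on a dense z-grid and extracting the 0.5 level set (e.g., with marching cubes). This 2D-image view of 3D geometry is what makes the representation compatible with real-time convolutional networks.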
Related papers
- Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion [13.938406073551844]
This paper introduces the Dual Transformer Fusion (DTF) algorithm, a novel approach to holistic 3D pose estimation.
To enable precise 3D human pose estimation, our approach leverages the DTF architecture, which first generates a pair of intermediate views.
Our approach outperforms existing state-of-the-art methods on both datasets, yielding substantial improvements.
arXiv Detail & Related papers (2024-10-06T18:15:27Z)
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- Hybrid Fourier Score Distillation for Efficient One Image to 3D Object Generation [42.83810819513537]
Single image-to-3D generation is pivotal for crafting controllable 3D assets.
We propose a 2D-3D hybrid Fourier Score Distillation objective function, hy-FSD.
hy-FSD can be integrated into existing 3D generation methods and produce significant performance gains.
arXiv Detail & Related papers (2024-05-31T08:11:25Z)
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z)
- UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion [51.31220416754788]
We present UDiFF, a 3D diffusion model for unsigned distance fields (UDFs) that can generate textured 3D shapes with open surfaces from text conditions or unconditionally.
Our key idea is to generate UDFs in the spatial-frequency domain with an optimal wavelet transformation, which produces a compact representation space for UDF generation.
arXiv Detail & Related papers (2024-04-10T09:24:54Z)
- 2S-UDF: A Novel Two-stage UDF Learning Method for Robust Non-watertight Model Reconstruction from Multi-view Images [12.076881343401329]
We present a novel two-stage algorithm, 2S-UDF, for learning a high-quality UDF from multi-view images.
The results indicate superior performance over other UDF learning techniques in both quantitative metrics and visual quality.
arXiv Detail & Related papers (2023-03-27T16:35:28Z)
- RAFaRe: Learning Robust and Accurate Non-parametric 3D Face Reconstruction from Pseudo 2D&3D Pairs [13.11105614044699]
We propose a robust and accurate non-parametric method for single-view 3D face reconstruction (SVFR).
A large-scale pseudo 2D&3D dataset is created by first rendering detailed 3D faces and then swapping the faces in in-the-wild images with the rendered ones.
Our model outperforms previous methods on FaceScape-wild/lab and MICC benchmarks.
arXiv Detail & Related papers (2023-02-10T19:40:26Z)
- DiffusionSDF: Conditional Generative Modeling of Signed Distance Functions [42.015077094731815]
DiffusionSDF is a generative model for shape completion, single-view reconstruction, and reconstruction of real-scanned point clouds.
We use neural signed distance functions (SDFs) as our 3D representation to parameterize the geometry of various signals (e.g., point clouds, 2D images) through neural networks.
arXiv Detail & Related papers (2022-11-24T18:59:01Z)
- Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion [54.151979979158085]
We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available.
We leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme where a model produces a first guess of the solution.
Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios.
arXiv Detail & Related papers (2022-11-21T17:42:42Z)
- Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes [77.6741486264257]
We introduce an efficient neural representation that, for the first time, enables real-time rendering of high-fidelity neural SDFs.
We show that our representation is 2-3 orders of magnitude more efficient in terms of rendering speed compared to previous works.
arXiv Detail & Related papers (2021-01-26T18:50:22Z)