Graphics Capsule: Learning Hierarchical 3D Face Representations from 2D Images
- URL: http://arxiv.org/abs/2303.10896v1
- Date: Mon, 20 Mar 2023 06:32:55 GMT
- Title: Graphics Capsule: Learning Hierarchical 3D Face Representations from 2D Images
- Authors: Chang Yu, Xiangyu Zhu, Xiaomei Zhang, Zhaoxiang Zhang, Zhen Lei
- Abstract summary: We propose an Inverse Graphics Capsule Network (IGC-Net) to learn the hierarchical 3D face representations from large-scale unlabeled images.
IGC-Net first decomposes the objects into a set of semantic-consistent part-level descriptions and then assembles them into object-level descriptions to build the hierarchy.
- Score: 82.5266467869448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Constructing a hierarchy of objects is an important function of the
human visual system. Previous studies have successfully adopted capsule
networks to decompose digits and faces into parts in an unsupervised manner,
investigating whether neural networks exhibit a similar perception mechanism.
However, their descriptions are restricted to 2D space, which limits their
capacity to imitate the intrinsic 3D perception ability of humans. In
this paper, we propose an Inverse Graphics Capsule Network (IGC-Net) to learn
the hierarchical 3D face representations from large-scale unlabeled images. The
core of IGC-Net is a new type of capsule, named graphics capsule, which
represents 3D primitives with interpretable parameters in computer graphics
(CG), including depth, albedo, and 3D pose. Specifically, IGC-Net first
decomposes the objects into a set of semantic-consistent part-level
descriptions and then assembles them into object-level descriptions to build
the hierarchy. The learned graphics capsules reveal how the neural networks,
oriented at visual perception, understand faces as a hierarchy of 3D models.
Besides, the discovered parts can be deployed to the unsupervised face
segmentation task to evaluate the semantic consistency of our method. Moreover,
the part-level descriptions, with their explicit physical meanings, provide
insight into face analysis that previously operated as a black box, such as the
importance of shape and texture for face recognition. Experiments on CelebA,
BP4D, and Multi-PIE demonstrate the characteristics of our IGC-Net.
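To make the graphics-capsule description above concrete, the following is a minimal sketch, assuming PyTorch, of how a part-level capsule holding interpretable CG parameters (depth, albedo, 3D pose) might be stored and assembled into an object-level description. The names (`PartCapsule`, `assemble_object`), the tensor shapes, and the nearest-depth compositing rule are illustrative assumptions rather than the authors' implementation.

```python
# A minimal sketch of the "graphics capsule" idea, assuming PyTorch.
# Names (PartCapsule, assemble_object), tensor shapes, and the nearest-depth
# compositing rule are illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass
import torch


@dataclass
class PartCapsule:
    """A part-level 3D description with interpretable CG parameters."""
    depth: torch.Tensor     # (H, W)    per-pixel depth of the part
    albedo: torch.Tensor    # (3, H, W) per-pixel RGB albedo of the part
    pose: torch.Tensor      # (6,)      3D pose, e.g. rotation (3) + translation (3)
    presence: torch.Tensor  # ()        scalar in [0, 1], whether the part is active


def assemble_object(parts: list[PartCapsule]) -> tuple[torch.Tensor, torch.Tensor]:
    """Assemble part-level capsules into an object-level depth/albedo map by
    keeping, at every pixel, the part closest to the camera."""
    depths = torch.stack([p.depth for p in parts])    # (P, H, W)
    albedos = torch.stack([p.albedo for p in parts])  # (P, 3, H, W)
    # Inactive parts are pushed far away so they never win the depth test.
    far = depths.max() + 1.0
    gates = torch.stack([p.presence for p in parts]).view(-1, 1, 1)
    depths = gates * depths + (1.0 - gates) * far
    nearest = depths.argmin(dim=0)                    # (H, W) winning part index
    object_depth = depths.gather(0, nearest.unsqueeze(0)).squeeze(0)
    idx = nearest.unsqueeze(0).unsqueeze(0).expand(1, 3, -1, -1)
    object_albedo = albedos.gather(0, idx).squeeze(0)
    return object_depth, object_albedo


# Toy usage: two random "parts" composed into one object-level description.
H, W = 64, 64
parts = [
    PartCapsule(torch.rand(H, W), torch.rand(3, H, W),
                torch.zeros(6), torch.tensor(1.0))
    for _ in range(2)
]
object_depth, object_albedo = assemble_object(parts)
print(object_depth.shape, object_albedo.shape)  # (64, 64) and (3, 64, 64)
```

In this toy setup, the per-pixel index of the winning part (`nearest`) doubles as an unsupervised part segmentation map, loosely mirroring how the discovered parts are evaluated on face segmentation in the abstract.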
Related papers
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- Learning 3D object-centric representation through prediction [12.008668555280668]
We develop a novel network architecture that learns to 1) segment objects from discrete images, 2) infer their 3D locations, and 3) perceive depth.
The core idea is treating objects as latent causes of visual input which the brain uses to make efficient predictions of future scenes.
arXiv Detail & Related papers (2024-03-06T14:19:11Z)
- Multiview Compressive Coding for 3D Reconstruction [77.95706553743626]
We introduce a simple framework that operates on 3D points of single objects or whole scenes.
Our model, Multiview Compressive Coding, learns to compress the input appearance and geometry to predict the 3D structure.
arXiv Detail & Related papers (2023-01-19T18:59:52Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- 3D Concept Grounding on Neural Fields [99.33215488324238]
Existing visual reasoning approaches typically utilize supervised methods to extract 2D segmentation masks on which concepts are grounded.
Humans are capable of grounding concepts on the underlying 3D representation of images.
We propose to leverage the continuous, differentiable nature of neural fields to segment and learn concepts.
arXiv Detail & Related papers (2022-07-13T17:59:33Z)
- Disentangling 3D Prototypical Networks For Few-Shot Concept Learning [29.02523358573336]
We present neural architectures that disentangle RGB-D images into objects' shapes and styles and a map of the background scene.
Our networks incorporate architectural biases that reflect the image formation process, 3D geometry of the world scene, and shape-style interplay.
arXiv Detail & Related papers (2020-11-06T14:08:27Z)
- Learning to Reconstruct and Segment 3D Objects [4.709764624933227]
We aim to understand scenes and the objects within them by learning general and robust representations using deep neural networks.
This thesis makes three core contributions, ranging from object-level 3D shape estimation from single or multiple views to scene-level semantic understanding.
arXiv Detail & Related papers (2020-10-19T15:09:04Z)
- GRF: Learning a General Radiance Field for 3D Representation and Rendering [4.709764624933227]
We present a simple yet powerful neural network that implicitly represents and renders 3D objects and scenes only from 2D observations.
The network models 3D geometries as a general radiance field, which takes a set of 2D images with camera poses and intrinsics as input.
Our method can generate high-quality and realistic novel views for novel objects, unseen categories and challenging real-world scenes.
arXiv Detail & Related papers (2020-10-09T14:21:43Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.