RoSI: Recovering 3D Shape Interiors from Few Articulation Images
- URL: http://arxiv.org/abs/2304.06342v1
- Date: Thu, 13 Apr 2023 08:45:26 GMT
- Title: RoSI: Recovering 3D Shape Interiors from Few Articulation Images
- Authors: Akshay Gadi Patil, Yiming Qian, Shan Yang, Brian Jackson, Eric Bennett, Hao Zhang
- Abstract summary: We present a learning framework to recover the shape interiors of existing 3D models with only their exteriors from multi-view and multi-articulation images.
Our neural architecture is trained in a category-agnostic manner and it consists of a motion-aware multi-view analysis phase.
In addition, our method also predicts part articulations and is able to realize and even extrapolate the captured motions on the target 3D object.
- Score: 20.430308190444737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dominant majority of 3D models that appear in gaming, VR/AR, and those we
use to train geometric deep learning algorithms are incomplete, since they are
modeled as surface meshes and missing their interior structures. We present a
learning framework to recover the shape interiors (RoSI) of existing 3D models
with only their exteriors from multi-view and multi-articulation images. Given
a set of RGB images that capture a target 3D object in different articulated
poses, possibly from only a few views, our method infers the interior planes that
are observable in the input images. Our neural architecture is trained in a
category-agnostic manner and it consists of a motion-aware multi-view analysis
phase including pose, depth, and motion estimations, followed by interior plane
detection in images and 3D space, and finally multi-view plane fusion. In
addition, our method also predicts part articulations and is able to realize
and even extrapolate the captured motions on the target 3D object. We evaluate
our method by quantitative and qualitative comparisons to baselines and
alternative solutions, as well as testing on untrained object categories and
real image inputs to assess its generalization capabilities.
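The abstract names multi-view plane fusion as the final stage but does not describe how it works. The following is only a minimal sketch of one generic way such a step could be done, not the authors' method: it assumes each view has already produced planes as (normal, offset) pairs in a shared world frame, and greedily merges planes whose normals and offsets agree. The function names and the thresholds `max_angle_deg` and `max_offset` are hypothetical.

```python
import math


def unit_plane(normal, offset):
    """Rescale a plane dot(normal, x) = offset so that its normal has unit length."""
    norm = math.sqrt(sum(c * c for c in normal))
    return tuple(c / norm for c in normal), offset / norm


def fuse_planes(planes, max_angle_deg=10.0, max_offset=0.05):
    """Greedily merge near-duplicate planes detected in different views.

    planes: iterable of (normal, offset) pairs in one world frame.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    cos_thresh = math.cos(math.radians(max_angle_deg))
    clusters = []  # each cluster: [sum of unit normals, sum of offsets, count]
    for normal, offset in planes:
        n, d = unit_plane(normal, offset)
        for cluster in clusters:
            mean_n, _ = unit_plane(cluster[0], 1.0)
            mean_d = cluster[1] / cluster[2]
            cos = sum(a * b for a, b in zip(n, mean_n))
            # (n, d) and (-n, -d) describe the same plane, so flip to match.
            if cos < 0:
                cand_n, cand_d, cos = tuple(-c for c in n), -d, -cos
            else:
                cand_n, cand_d = n, d
            if cos >= cos_thresh and abs(cand_d - mean_d) <= max_offset:
                cluster[0] = tuple(a + b for a, b in zip(cluster[0], cand_n))
                cluster[1] += cand_d
                cluster[2] += 1
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([n, d, 1])
    # Report each cluster as its mean unit normal and mean offset.
    return [(unit_plane(s, 1.0)[0], total / k) for s, total, k in clusters]


# Two views see the same shelf plane (one with a flipped normal); a third plane differs.
observations = [((0, 0, 1), 1.0), ((0, 0, -1), -1.02), ((1, 0, 0), 0.5)]
fused = fuse_planes(observations)
```

Here `fused` holds two planes: the two shelf observations collapse into one averaged plane, and the side plane stays separate. A greedy pass is order-dependent; it is used here only because it keeps the sketch short.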
Related papers
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that combines a series of visual-language models with the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, uses the Segment-Anything model to extract objects of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
- 3D Surface Reconstruction in the Wild by Deforming Shape Priors from Synthetic Data [24.97027425606138]
Reconstructing the underlying 3D surface of an object from a single image is a challenging problem.
We present a new method for joint category-specific 3D reconstruction and object pose estimation from a single image.
Our approach achieves state-of-the-art reconstruction performance across several real-world datasets.
arXiv Detail & Related papers (2023-02-24T20:37:27Z)
- Structured 3D Features for Reconstructing Controllable Avatars [43.36074729431982]
We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface.
We show that our S3F model surpasses the previous state-of-the-art on various tasks, including monocular 3D reconstruction, as well as albedo and shading estimation.
arXiv Detail & Related papers (2022-12-13T18:57:33Z)
- Towards High-Fidelity Single-view Holistic Reconstruction of Indoor Scenes [50.317223783035075]
We present a new framework to reconstruct holistic 3D indoor scenes from single-view images.
We propose an instance-aligned implicit function (InstPIFu) for detailed object reconstruction.
Our code and model will be made publicly available.
arXiv Detail & Related papers (2022-07-18T14:54:57Z)
- Learning Ego 3D Representation as Ray Tracing [42.400505280851114]
We present a novel end-to-end architecture for ego 3D representation learning from unconstrained camera views.
Inspired by the ray tracing principle, we design a polarized grid of "imaginary eyes" as the learnable ego 3D representation.
We show that our model outperforms all state-of-the-art alternatives significantly.
arXiv Detail & Related papers (2022-06-08T17:55:50Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
- Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve [54.054575408582565]
We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image.
We present Mask2CAD, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose.
This produces a clean, lightweight representation of the objects in an image.
arXiv Detail & Related papers (2020-07-26T00:08:37Z)
- Shape and Viewpoint without Keypoints [63.26977130704171]
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image.
We trained on an image collection without any ground truth 3D shape, multi-view, camera viewpoints or keypoint supervision.
We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects.
arXiv Detail & Related papers (2020-07-21T17:58:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.