Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery from
Sparse Image Ensemble
- URL: http://arxiv.org/abs/2212.11042v4
- Date: Sat, 25 Mar 2023 17:59:59 GMT
- Title: Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery from
Sparse Image Ensemble
- Authors: Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein,
Ming-Hsuan Yang, Varun Jampani
- Abstract summary: Hi-LASSIE performs 3D articulated reconstruction from only 20-30 online images in the wild without any user-defined shape or skeleton templates.
First, instead of relying on a manually annotated 3D skeleton, we automatically estimate a class-specific skeleton from the selected reference image.
Second, we improve the shape reconstructions with novel instance-specific optimization strategies that allow reconstructions to faithful fit on each instance.
- Score: 72.3681707384754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatically estimating 3D skeleton, shape, camera viewpoints, and part
articulation from sparse in-the-wild image ensembles is a severely
under-constrained and challenging problem. Most prior methods rely on
large-scale image datasets, dense temporal correspondence, or human annotations
like camera pose, 2D keypoints, and shape templates. We propose Hi-LASSIE,
which performs 3D articulated reconstruction from only 20-30 online images in
the wild without any user-defined shape or skeleton templates. We follow the
recent work of LASSIE that tackles a similar problem setting and make two
significant advances. First, instead of relying on a manually annotated 3D
skeleton, we automatically estimate a class-specific skeleton from the selected
reference image. Second, we improve the shape reconstructions with novel
instance-specific optimization strategies that allow reconstructions to
faithful fit on each instance while preserving the class-specific priors
learned across all images. Experiments on in-the-wild image ensembles show that
Hi-LASSIE obtains higher fidelity state-of-the-art 3D reconstructions despite
requiring minimum user input.
Related papers
- The More You See in 2D, the More You Perceive in 3D [32.578628729549145]
SAP3D is a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images.
We show that as the number of input images increases, the performance of our approach improves.
arXiv Detail & Related papers (2024-04-04T17:59:40Z) - 3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets [34.610546020800236]
3DMiner is a pipeline for mining 3D shapes from challenging datasets.
Our method is capable of producing significantly better results than state-of-the-art unsupervised 3D reconstruction techniques.
We show how 3DMiner can be applied to in-the-wild data by reconstructing shapes present in images from the LAION-5B dataset.
arXiv Detail & Related papers (2023-10-29T23:08:19Z) - FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z) - One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape
Optimization [30.951405623906258]
Single image 3D reconstruction is an important but challenging task that requires extensive knowledge of our natural world.
We propose a novel method that takes a single image of any object as input and generates a full 360-degree 3D textured mesh in a single feed-forward pass.
arXiv Detail & Related papers (2023-06-29T13:28:16Z) - ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image
Collections [71.46546520120162]
Estimating 3D articulated shapes like animal bodies from monocular images is inherently challenging.
We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild.
We produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations.
arXiv Detail & Related papers (2023-06-07T17:47:50Z) - SAOR: Single-View Articulated Object Reconstruction [17.2716639564414]
We introduce SAOR, a novel approach for estimating the 3D shape, texture, and viewpoint of an articulated object from a single image captured in the wild.
Unlike prior approaches that rely on pre-defined category-specific 3D templates or tailored 3D skeletons, SAOR learns to articulate shapes from single-view image collections with a skeleton-free part-based model without requiring any 3D object shape priors.
arXiv Detail & Related papers (2023-03-23T17:59:35Z) - Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance
Consistency [59.427074701985795]
Single-view reconstruction typically rely on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry.
We avoid all of these supervisions and hypotheses by leveraging explicitly the consistency between images of different object instances.
Our main contributions are two approaches to leverage cross-instance consistency: (i) progressive conditioning, a training strategy to gradually specialize the model from category to instances in a curriculum learning fashion; (ii) swap reconstruction, a loss enforcing consistency between instances having similar shape or texture.
arXiv Detail & Related papers (2022-04-21T17:47:35Z) - Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We adopt to further improve the shape quality by leveraging cross-view information with a graph convolution network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable function for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z) - Deep3DPose: Realtime Reconstruction of Arbitrarily Posed Human Bodies
from Single RGB Images [5.775625085664381]
We introduce an approach that accurately reconstructs 3D human poses and detailed 3D full-body geometric models from single images in realtime.
Key idea of our approach is a novel end-to-end multi-task deep learning framework that uses single images to predict five outputs simultaneously.
We show the system advances the frontier of 3D human body and pose reconstruction from single images by quantitative evaluations and comparisons with state-of-the-art methods.
arXiv Detail & Related papers (2021-06-22T04:26:11Z) - SparseFusion: Dynamic Human Avatar Modeling from Sparse RGBD Images [49.52782544649703]
We propose a novel approach to reconstruct 3D human body shapes based on a sparse set of RGBD frames.
The main challenge is how to robustly fuse these sparse frames into a canonical 3D model.
Our framework is flexible, with potential applications going beyond shape reconstruction.
arXiv Detail & Related papers (2020-06-05T18:53:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.