Related papers: SAOR: Single-View Articulated Object Reconstruction

URL: http://arxiv.org/abs/2303.13514v3
Date: Mon, 8 Apr 2024 11:22:05 GMT
Title: SAOR: Single-View Articulated Object Reconstruction
Authors: Mehmet Aygün, Oisin Mac Aodha
Score: 17.2716639564414
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce SAOR, a novel approach for estimating the 3D shape, texture, and viewpoint of an articulated object from a single image captured in the wild. Unlike prior approaches that rely on pre-defined category-specific 3D templates or tailored 3D skeletons, SAOR learns to articulate shapes from single-view image collections with a skeleton-free part-based model without requiring any 3D object shape priors. To prevent ill-posed solutions, we propose a cross-instance consistency loss that exploits disentangled object shape deformation and articulation. This is helped by a new silhouette-based sampling mechanism to enhance viewpoint diversity during training. Our method only requires estimated object silhouettes and relative depth maps from off-the-shelf pre-trained networks during training. At inference time, given a single-view image, it efficiently outputs an explicit mesh representation. We obtain improved qualitative and quantitative results on challenging quadruped animals compared to relevant existing work.

Related papers

Canonical Pose Reconstruction from Single Depth Image for 3D Non-rigid Pose Recovery on Limited Datasets [55.84702107871358]
3D reconstruction from 2D inputs, especially for non-rigid objects like humans, presents unique challenges. Traditional methods often struggle with non-rigid shapes, which require extensive training data to cover the entire deformation space. This study proposes a canonical pose reconstruction model that transforms single-view depth images of deformable shapes into a canonical form.
arXiv Detail & Related papers (2025-05-23T14:58:34Z)
A Fusion of Variational Distribution Priors and Saliency Map Replay for Continual 3D Reconstruction [1.2289361708127877]
Single-image 3D reconstruction is a research challenge focused on predicting 3D object shapes from single-view images. This task requires significant data acquisition to predict both visible and occluded portions of the shape. We propose a continual learning-based 3D reconstruction method where our goal is to design a model using Variational Priors that can still reconstruct the previously seen classes reasonably even after training on new classes.
arXiv Detail & Related papers (2023-08-17T06:48:55Z)
3D Surface Reconstruction in the Wild by Deforming Shape Priors from Synthetic Data [24.97027425606138]
Reconstructing the underlying 3D surface of an object from a single image is a challenging problem. We present a new method for joint category-specific 3D reconstruction and object pose estimation from a single image. Our approach achieves state-of-the-art reconstruction performance across several real-world datasets.
arXiv Detail & Related papers (2023-02-24T20:37:27Z)
MagicPony: Learning Articulated 3D Animals in the Wild [81.63322697335228]
We present a new method, dubbed MagicPony, that learns this predictor purely from in-the-wild single-view images of the object category. At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes.
arXiv Detail & Related papers (2022-11-22T18:59:31Z)
Multi-Category Mesh Reconstruction From Image Collections [90.24365811344987]
We present an alternative approach that infers the textured mesh of objects combining a series of deformable 3D models and a set of instance-specific deformation, pose, and texture. Our method is trained with images of multiple object categories using only foreground masks and rough camera poses as supervision. Experiments show that the proposed framework can distinguish between different object categories and learn category-specific shape priors in an unsupervised manner.
arXiv Detail & Related papers (2021-10-21T16:32:31Z)
A Divide et Impera Approach for 3D Shape Reconstruction from Multiple Views [49.03830902235915]
Estimating the 3D shape of an object from a single or multiple images has gained popularity thanks to the recent breakthroughs powered by deep learning. This paper proposes to rely on viewpoint variant reconstructions by merging the visible information from the given views. To validate the proposed method, we perform a comprehensive evaluation on the ShapeNet reference benchmark in terms of relative pose estimation and 3D shape reconstruction.
arXiv Detail & Related papers (2020-11-17T09:59:32Z)
Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image. We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior.
arXiv Detail & Related papers (2020-07-16T16:45:05Z)
Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames. Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
Self-supervised Single-view 3D Reconstruction via Semantic Consistency [142.71430568330172]
We learn a self-supervised, single-view 3D reconstruction model that predicts the shape, texture and camera pose of a target object. The proposed method does not necessitate 3D supervision, manually annotated keypoints, multi-view images of an object or a prior 3D template.
arXiv Detail & Related papers (2020-03-13T20:29:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.