Betsu-Betsu: Multi-View Separable 3D Reconstruction of Two Interacting Objects
- URL: http://arxiv.org/abs/2502.13968v1
- Date: Wed, 19 Feb 2025 18:59:56 GMT
- Title: Betsu-Betsu: Multi-View Separable 3D Reconstruction of Two Interacting Objects
- Authors: Suhas Gopal, Rishabh Dabral, Vladislav Golyanik, Christian Theobalt
- Abstract summary: This paper introduces a new neuro-implicit method that can reconstruct the geometry and appearance of two objects undergoing close interactions while disjoining both in 3D.
The framework is end-to-end trainable and supervised using a novel alpha-blending regularisation.
We introduce a new dataset consisting of close interactions between a human and an object and also evaluate on two scenes of humans performing martial arts.
- Score: 67.96148051569993
- Abstract: Separable 3D reconstruction of multiple objects from multi-view RGB images -- resulting in two different 3D shapes for the two objects with a clear separation between them -- remains a sparsely researched problem. It is challenging due to severe mutual occlusions and ambiguities along the objects' interaction boundaries. This paper investigates the setting and introduces a new neuro-implicit method that can reconstruct the geometry and appearance of two objects undergoing close interactions while disjoining both in 3D, avoiding surface inter-penetrations and enabling novel-view synthesis of the observed scene. The framework is end-to-end trainable and supervised using a novel alpha-blending regularisation that ensures that the two geometries are well separated even under extreme occlusions. Our reconstruction method is markerless and can be applied to rigid as well as articulated objects. We introduce a new dataset consisting of close interactions between a human and an object and also evaluate on two scenes of humans performing martial arts. The experiments confirm the effectiveness of our framework and substantial improvements using 3D and novel view synthesis metrics compared to several existing approaches applicable in our setting.
Related papers
- Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics [31.819336585007104]
We propose to leverage superquadrics as an alternative 3D object representation to bounding boxes.
We demonstrate their effectiveness on both template-free object reconstruction and action recognition tasks.
We also study the compositionality of actions by considering a more challenging task where the training combinations of verbs and nouns do not overlap with the testing split.
arXiv Detail & Related papers (2025-01-13T07:26:05Z) - GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding [53.42728468191711]
Open-Vocabulary 3D object affordance grounding aims to anticipate "action possibilities" regions on 3D objects with arbitrary instructions.
We propose GREAT (GeometRy-intEntion collAboraTive inference) for Open-Vocabulary 3D Object Affordance Grounding.
arXiv Detail & Related papers (2024-11-29T11:23:15Z) - Single-image coherent reconstruction of objects and humans [16.836684199314938]
Existing methods for reconstructing objects and humans from a monocular image suffer from severe mesh collisions and performance limitations.
This paper introduces a method to obtain a globally consistent 3D reconstruction of interacting objects and people from a single image.
arXiv Detail & Related papers (2024-08-15T11:27:18Z) - SM$^3$: Self-Supervised Multi-task Modeling with Multi-view 2D Images for Articulated Objects [24.737865259695006]
We propose a self-supervised interaction perception method, referred to as SM$^3$, to model articulated objects.
By constructing 3D geometries and textures from the captured 2D images, SM$^3$ achieves integrated optimization of movable part and joint parameters.
Evaluations demonstrate that SM$^3$ surpasses existing benchmarks across various categories and objects, while its adaptability in real-world scenarios has been thoroughly validated.
arXiv Detail & Related papers (2024-01-17T11:15:09Z) - JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery [84.67823511418334]
This paper presents JOTR, a 3D JOint contrastive learning with TRansformers framework for handling occluded 3D human mesh recovery.
Our method includes an encoder-decoder transformer architecture to fuse 2D and 3D representations for achieving 2D&3D aligned results.
arXiv Detail & Related papers (2023-07-31T02:58:58Z) - Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles single-view 3D mesh reconstruction to study model generalization on unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z) - DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only.
Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species.
We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
arXiv Detail & Related papers (2021-08-31T18:33:55Z) - Multi-person Implicit Reconstruction from a Single Image [37.6877421030774]
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image.
Existing multi-person methods suffer from two main drawbacks: they are often model-based and cannot capture accurate 3D models of people with loose clothing and hair.
arXiv Detail & Related papers (2021-04-19T13:21:55Z) - Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image [102.44347847154867]
We propose a novel formulation that jointly recovers the geometry of a 3D object as a set of primitives.
Our model recovers the higher level structural decomposition of various objects in the form of a binary tree of primitives.
Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.
arXiv Detail & Related papers (2020-04-02T17:58:05Z)