TeHOR: Text-Guided 3D Human and Object Reconstruction with Textures
- URL: http://arxiv.org/abs/2602.19679v1
- Date: Mon, 23 Feb 2026 10:22:52 GMT
- Title: TeHOR: Text-Guided 3D Human and Object Reconstruction with Textures
- Authors: Hyeongjin Nam, Daniel Sungho Jung, Kyoung Mu Lee
- Abstract summary: Joint reconstruction of 3D human and object from a single image is an active research area, with pivotal applications in robotics and digital content creation. Existing approaches rely heavily on physical contact information, which inherently cannot capture non-contact human-object interactions. We introduce TeHOR, a framework built upon two core designs. First, beyond contact information, our framework leverages text descriptions of human-object interactions to enforce semantic alignment. Second, we incorporate appearance cues of the 3D human and object into the alignment process to capture holistic contextual information.
- Score: 53.21603129469796
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Joint reconstruction of 3D human and object from a single image is an active research area, with pivotal applications in robotics and digital content creation. Despite recent advances, existing approaches suffer from two fundamental limitations. First, their reconstructions rely heavily on physical contact information, which inherently cannot capture non-contact human-object interactions, such as gazing at or pointing toward an object. Second, the reconstruction process is primarily driven by local geometric proximity, neglecting the human and object appearances that provide global context crucial for understanding holistic interactions. To address these issues, we introduce TeHOR, a framework built upon two core designs. First, beyond contact information, our framework leverages text descriptions of human-object interactions to enforce semantic alignment between the 3D reconstruction and its textual cues, enabling reasoning over a wider spectrum of interactions, including non-contact cases. Second, we incorporate appearance cues of the 3D human and object into the alignment process to capture holistic contextual information, thereby ensuring visually plausible reconstructions. As a result, our framework produces accurate and semantically coherent reconstructions, achieving state-of-the-art performance.
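As an illustrative aside, the abstract's text-driven alignment can be sketched with off-the-shelf components. The snippet below is a minimal sketch, not TeHOR's implementation: it assumes a differentiable renderer that produces an image of the current 3D human-object estimate (stood in here by a random tensor) and uses a pretrained CLIP model to score how well that render matches the interaction description. The model name and normalisation constants are standard CLIP values; everything else is hypothetical.

```python
# Minimal sketch of a text-guided semantic alignment loss (assumed
# design, not TeHOR's actual code). Requires: torch, transformers.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPTokenizer

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
for p in clip.parameters():        # the encoder stays frozen
    p.requires_grad_(False)
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

def semantic_alignment_loss(rendered, interaction_text):
    """rendered: (B, 3, 224, 224) differentiable render in [0, 1]."""
    # CLIP's pixel normalisation, applied differentiably to the render.
    mean = torch.tensor([0.4815, 0.4578, 0.4082]).view(1, 3, 1, 1)
    std = torch.tensor([0.2686, 0.2613, 0.2758]).view(1, 3, 1, 1)
    img_emb = clip.get_image_features(pixel_values=(rendered - mean) / std)
    tokens = tokenizer([interaction_text], return_tensors="pt")
    txt_emb = clip.get_text_features(**tokens)  # fixed target embedding
    # Maximise cosine similarity between the render and the description.
    return 1.0 - F.cosine_similarity(img_emb, txt_emb).mean()

# Toy usage: a random tensor stands in for the differentiable render.
render = torch.rand(1, 3, 224, 224, requires_grad=True)
loss = semantic_alignment_loss(render, "a person pointing at a kettle")
loss.backward()  # gradients reach whatever produced the render
```

In the method itself, such a loss would be backpropagated through the renderer into human and object pose, shape, and texture parameters rather than into raw pixels; rendering the textured reconstruction is what lets appearance cues enter the alignment.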
Related papers
- ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors [51.06020148149403]
We introduce ArtHOI, the first zero-shot framework for articulated human-object interaction synthesis via 4D reconstruction from video priors. ArtHOI bridges video-based generation and geometry-aware reconstruction, producing interactions that are both semantically aligned and physically grounded.
arXiv Detail & Related papers (2026-03-04T17:58:04Z)
- Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints [20.702086497025494]
In this paper, we leverage two extra sources of information to reduce the ambiguity of vision signals. First, generative models learn priors of the shapes of commonly seen objects, allowing us to make reasonable guesses about the unseen parts of the geometry. Second, contact information, which can be obtained from videos and physical interactions, provides sparse constraints on the boundary of the geometry.
arXiv Detail & Related papers (2025-12-04T18:45:14Z)
- Realistic Clothed Human and Object Joint Reconstruction from a Single Image [26.57698106821237]
We introduce a novel implicit approach for jointly reconstructing realistic 3D clothed humans and objects from a monocular view. For the first time, we model both the human and the object with an implicit representation, allowing us to capture more realistic details such as clothing.
arXiv Detail & Related papers (2025-02-25T12:26:36Z)
- Betsu-Betsu: Multi-View Separable 3D Reconstruction of Two Interacting Objects [67.96148051569993]
This paper introduces a new neuro-implicit method that can reconstruct the geometry and appearance of two objects undergoing close interactions while disjoining both in 3D. The framework is end-to-end trainable and supervised using a novel alpha-blending regularisation (a generic compositing sketch follows this entry). We introduce a new dataset consisting of close interactions between a human and an object and also evaluate on two scenes of humans performing martial arts.
arXiv Detail & Related papers (2025-02-19T18:59:56Z)
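Betsu-Betsu's alpha-blending regularisation is only named above, not described, so the following sketch shows just the generic operation such a loss builds on: compositing two per-object opacity fields sampled along one camera ray. These are standard volume-rendering identities, not the paper's regulariser; all names are made up.

```python
# Generic alpha compositing of two per-object density fields along a
# ray (illustrative background, not Betsu-Betsu's actual regulariser).
import torch

def composite_two_fields(alpha_a, alpha_b, rgb_a, rgb_b):
    """alpha_a, alpha_b: (S,) per-sample opacities of the two objects
    along one ray; rgb_a, rgb_b: (S, 3) per-sample colours."""
    # Union opacity of two co-located semi-transparent media.
    alpha = alpha_a + alpha_b - alpha_a * alpha_b
    # Opacity-weighted colour mix at each sample.
    mix = (alpha_a[:, None] * rgb_a + alpha_b[:, None] * rgb_b) \
          / (alpha_a + alpha_b + 1e-8)[:, None]
    # Standard front-to-back transmittance accumulation.
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)
    weights = trans * alpha
    return (weights[:, None] * mix).sum(dim=0)  # (3,) ray colour

# Toy usage: 64 samples along one ray.
s = 64
ray_rgb = composite_two_fields(
    torch.rand(s) * 0.1, torch.rand(s) * 0.1,
    torch.rand(s, 3), torch.rand(s, 3))
```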
- Single-image coherent reconstruction of objects and humans [16.836684199314938]
Existing methods for reconstructing objects and humans from a monocular image suffer from severe mesh collisions and performance limitations.
This paper introduces a method to obtain a globally consistent 3D reconstruction of interacting objects and people from a single image.
arXiv Detail & Related papers (2024-08-15T11:27:18Z)
- Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer [58.98785899556135]
We present a novel joint 3D human-object reconstruction method (CONTHO) that effectively exploits contact information between humans and objects.
There are two core designs in our system: 1) 3D-guided contact estimation and 2) contact-based 3D human and object refinement (a toy optimisation sketch of the latter follows this entry).
arXiv Detail & Related papers (2024-04-07T06:01:49Z)
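To make the entry's second design concrete in spirit, below is a toy sketch of contact-based refinement posed as test-time optimisation: paired human and object vertices are pulled together while both meshes stay close to their initial estimates. Note the hedge: CONTHO performs refinement with a learned Transformer, not a loop like this, and the pairing tensor, weights, and vertex counts are invented for illustration.

```python
# Toy contact-based refinement as test-time optimisation (illustrative
# only; CONTHO itself uses a learned refinement Transformer).
import torch

def contact_refinement(human_verts, object_verts, contact_pairs,
                       steps=100, lr=1e-2, reg_weight=0.1):
    """human_verts: (Nh, 3); object_verts: (No, 3);
    contact_pairs: (K, 2) long tensor of (human_idx, object_idx)."""
    h = human_verts.clone().requires_grad_(True)
    o = object_verts.clone().requires_grad_(True)
    opt = torch.optim.Adam([h, o], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Contact term: estimated contact pairs should coincide.
        contact = (h[contact_pairs[:, 0]]
                   - o[contact_pairs[:, 1]]).norm(dim=-1).mean()
        # Regulariser: stay near the initial network estimates.
        reg = ((h - human_verts).norm(dim=-1).mean()
               + (o - object_verts).norm(dim=-1).mean())
        loss = contact + reg_weight * reg
        loss.backward()
        opt.step()
    return h.detach(), o.detach()

# Toy usage with random geometry and three invented correspondences.
h0, o0 = torch.randn(100, 3), torch.randn(50, 3)
pairs = torch.tensor([[0, 0], [1, 1], [2, 2]])
h_ref, o_ref = contact_refinement(h0, o0, pairs)
```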
- Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI (full-body articulated human-object interaction) dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
- Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model Alignments [81.38641691636847]
We rethink the problem of scene reconstruction from an embodied agent's perspective.
We reconstruct an interactive scene from an RGB-D data stream.
The reconstruction replaces the object meshes in the dense panoptic map with part-based articulated CAD models.
arXiv Detail & Related papers (2021-03-30T05:56:58Z)