RobustFusion: Robust Volumetric Performance Reconstruction under
Human-object Interactions from Monocular RGBD Stream
- URL: http://arxiv.org/abs/2104.14837v1
- Date: Fri, 30 Apr 2021 08:41:45 GMT
- Title: RobustFusion: Robust Volumetric Performance Reconstruction under
Human-object Interactions from Monocular RGBD Stream
- Authors: Zhuo Su, Lan Xu, Dawei Zhong, Zhong Li, Fan Deng, Shuxue Quan and Lu
Fang
- Abstract summary: High-quality 4D reconstruction of human performance with complex interactions with various objects is essential in real-world scenarios.
Recent advances still fail to provide reliable performance reconstruction.
We propose RobustFusion, a robust volumetric performance reconstruction system for human-object interaction scenarios.
- Score: 27.600873320989276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-quality 4D reconstruction of human performance with complex interactions
with various objects is essential in real-world scenarios, enabling numerous
immersive VR/AR applications. However, recent advances still fail to provide
reliable performance reconstruction, suffering from challenging interaction
patterns and severe occlusions, especially for the monocular setting. To fill
this gap, in this paper, we propose RobustFusion, a robust volumetric
performance reconstruction system for human-object interaction scenarios using
only a single RGBD sensor, which combines various data-driven visual and
interaction cues to handle the complex interaction patterns and severe
occlusions. We propose a semantic-aware scene decoupling scheme to model the
occlusions explicitly, with a segmentation refinement and robust object
tracking to prevent disentanglement uncertainty and maintain temporal
consistency. We further introduce a robust performance capture scheme with the
aid of various data-driven cues, which not only provides re-initialization
capability but also models the complex human-object interaction patterns in a
data-driven manner. To this end, we introduce a spatial relation prior to
prevent implausible intersections, as well as data-driven interaction cues to
maintain natural motions, especially for those regions under severe
human-object occlusions. We also adopt an adaptive fusion scheme for temporally
coherent human-object reconstruction with occlusion analysis and human parsing
cues. Extensive experiments demonstrate the effectiveness of our approach in
achieving high-quality 4D human performance reconstruction under complex
human-object interactions whilst still maintaining the lightweight monocular
setting.
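The abstract does not spell out the individual energy terms. As a rough, hypothetical illustration of how a spatial relation prior against implausible intersections is commonly realized (the function names, the SDF-based formulation, and the margin value below are assumptions for illustration, not the authors' actual implementation), the sketch penalizes tracked human surface vertices that penetrate the object volume:

import numpy as np

def intersection_penalty(human_verts, object_sdf, margin=0.005):
    # Signed distance of each tracked human vertex to the object surface
    # (negative values mean the vertex lies inside the object).
    d = object_sdf(human_verts)
    # Only vertices closer than the contact margin (or penetrating) are penalized.
    violation = np.maximum(margin - d, 0.0)
    # Quadratic penalty that could be added to the non-rigid tracking energy.
    return float(np.sum(violation ** 2))

# Toy usage: the "object" is a unit sphere at the origin; one vertex penetrates it.
unit_sphere_sdf = lambda p: np.linalg.norm(p, axis=1) - 1.0
verts = np.array([[0.0, 0.0, 0.5],    # inside the sphere -> penalized
                  [0.0, 0.0, 1.5]])   # outside -> no penalty
print(intersection_penalty(verts, unit_sphere_sdf))

Such a soft penalty keeps contact plausible without hard constraints; how RobustFusion actually combines it with the data-driven interaction cues is described in the paper itself.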
Related papers
- Ask, Pose, Unite: Scaling Data Acquisition for Close Interactions with Vision Language Models [5.541130887628606]
Social dynamics in close human interactions pose significant challenges for Human Mesh Estimation (HME)
We introduce a novel data generation method that utilizes Large Vision Language Models (LVLMs) to annotate contact maps which guide test-time optimization to produce paired image and pseudo-ground truth meshes.
This methodology not only alleviates the annotation burden but also enables the assembly of a comprehensive dataset specifically tailored for close interactions in HME.
arXiv Detail & Related papers (2024-10-01T01:14:24Z) - THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR)
In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion.
We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
arXiv Detail & Related papers (2024-03-17T13:17:25Z) - Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured HSI dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z) - Ins-HOI: Instance Aware Human-Object Interactions Recovery [44.02128629239429]
We propose an end-to-end Instance-aware Human-Object Interactions recovery (Ins-HOI) framework.
Ins-HOI supports instance-level reconstruction and provides reasonable and realistic invisible contact surfaces.
We collect a large-scale, high-fidelity 3D scan dataset, including 5.2k high-quality scans with real-world human-chair and hand-object interactions.
arXiv Detail & Related papers (2023-12-15T09:30:47Z) - Exploiting Spatial-Temporal Context for Interacting Hand Reconstruction
on Monocular RGB Video [104.69686024776396]
Reconstructing interacting hands from monocular RGB data is a challenging task, as it involves many interfering factors.
Previous works only leverage information from a single RGB image without modeling their physically plausible relation.
In this work, we are dedicated to explicitly exploiting spatial-temporal information to achieve better interacting hand reconstruction.
arXiv Detail & Related papers (2023-08-08T06:16:37Z) - Instant-NVR: Instant Neural Volumetric Rendering for Human-object
Interactions from Monocular RGBD Stream [14.844982083586306]
We propose Instant-NVR, a neural approach for instant volumetric human-object tracking and rendering using a single RGBD camera.
In the tracking front-end, we adopt a robust human-object capture scheme to provide sufficient motion priors.
We also provide an on-the-fly reconstruction scheme of the dynamic/static radiance fields via efficient motion-prior searching.
arXiv Detail & Related papers (2023-04-06T16:09:51Z) - Rearrange Indoor Scenes for Human-Robot Co-Activity [82.22847163761969]
We present an optimization-based framework for rearranging indoor furniture to accommodate human-robot co-activities better.
Our algorithm preserves the functional relations among furniture by integrating spatial and semantic co-occurrence extracted from SUNCG and ConceptNet.
Our experiments show that rearranged scenes provide an average of 14% more accessible space and 30% more objects to interact with.
arXiv Detail & Related papers (2023-03-10T03:03:32Z) - NeuralFusion: Neural Volumetric Rendering under Human-object
Interactions [46.70371238621842]
We propose a neural approach for volumetric human-object capture and rendering using sparse consumer RGBD sensors.
For geometry modeling, we propose a neural implicit inference scheme with non-rigid key-volume fusion.
We also introduce a layer-wise human-object texture rendering scheme, which combines volumetric and image-based rendering in both spatial and temporal domains.
arXiv Detail & Related papers (2022-02-25T17:10:07Z) - Neural Free-Viewpoint Performance Rendering under Complex Human-object
Interactions [35.41116017268475]
4D reconstruction of human-object interaction is critical for immersive VR/AR experience and human activity understanding.
Recent advances still fail to recover fine geometry and texture results from sparse RGB inputs, especially under challenging human-object interactions scenarios.
We propose a neural human performance capture and rendering system to generate both high-quality geometry and photo-realistic texture of both human and objects.
arXiv Detail & Related papers (2021-08-01T04:53:54Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.