Related papers: CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

URL: http://arxiv.org/abs/2406.19353v1
Date: Thu, 27 Jun 2024 17:32:18 GMT
Title: CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement
Authors: Chengwen Zhang, Yun Liu, Ruofan Xing, Bingda Tang, Li Yi
Abstract summary: We present CORE4D, a novel large-scale 4D human-object-human interaction dataset focusing on collaborative object rearrangement. With 1K human-object-human motion sequences captured in the real world, we enrich CORE4D by contributing an iterative collaboration retargeting strategy to augment motions to a variety of novel objects. Benefiting from extensive motion patterns provided by CORE4D, we benchmark two tasks aiming at generating human-object interaction: human-object motion forecasting and interaction synthesis.
Score: 20.520938266527438
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding how humans cooperatively rearrange household objects is critical for VR/AR and human-robot interaction. However, in-depth studies on modeling these behaviors are under-researched due to the lack of relevant datasets. We fill this gap by presenting CORE4D, a novel large-scale 4D human-object-human interaction dataset focusing on collaborative object rearrangement, which encompasses diverse compositions of various object geometries, collaboration modes, and 3D scenes. With 1K human-object-human motion sequences captured in the real world, we enrich CORE4D by contributing an iterative collaboration retargeting strategy to augment motions to a variety of novel objects. Leveraging this approach, CORE4D comprises a total of 11K collaboration sequences spanning 3K real and virtual object shapes. Benefiting from extensive motion patterns provided by CORE4D, we benchmark two tasks aiming at generating human-object interaction: human-object motion forecasting and interaction synthesis. Extensive experiments demonstrate the effectiveness of our collaboration retargeting strategy and indicate that CORE4D has posed new challenges to existing human-object interaction generation methodologies. Our dataset and code are available at https://github.com/leolyliu/CORE4D-Instructions.

Related papers

CARI4D: Category Agnostic 4D Reconstruction of Human-Object Interaction [40.557276644446475]
We present CARI4D, the first category-agnostic method that reconstructs spatially and temporally consistent 4D human-object interaction at metric scale from monocular RGB videos. Our model generalizes beyond the training categories and thus can be applied zero-shot to in-the-wild internet videos.
arXiv Detail & Related papers (2025-12-12T19:11:11Z)
Efficient and Scalable Monocular Human-Object Interaction Motion Reconstruction [19.16200327159635]
Generalized robots must learn from diverse, large-scale human-object interactions (HOI) to robustly operate in the real world. We introduce 4DHOISolver, a novel and efficient optimization framework that constrains the ill-posed 4D HOI reconstruction problem. We introduce Open4DHOI, a new large-scale 4D HOI dataset featuring a diverse catalog of 144 object types and 103 actions.
arXiv Detail & Related papers (2025-11-30T16:21:47Z)
HumanCrafter: Synergizing Generalizable Human Reconstruction and Semantic 3D Segmentation [51.27178551863772]
We propose a unified framework that enables the joint modeling of appearance and human-part semantics from a single image. HumanCrafter surpasses existing state-of-the-art methods in both 3D human-part segmentation and 3D human reconstruction from a single image.
arXiv Detail & Related papers (2025-11-01T09:29:36Z)
MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions [20.96140289964853]
We present MMHOI -- a large-scale, Multi-human Multi-object Interaction dataset consisting of images from 12 everyday scenarios. MMHOI offers complete 3D shape and pose annotations for every person and object, along with labels for 78 action categories and 14 interaction-specific body parts. We present MMHOI-Net, an end-to-end transformer-based neural network for jointly estimating human-object 3D geometries, their interactions, and associated actions.
arXiv Detail & Related papers (2025-10-09T06:18:12Z)
CoopDiff: Anticipating 3D Human-object Interactions via Contact-consistent Decoupled Diffusion [62.93198247045824]
3D human-object interaction (HOI) anticipation aims to predict the future motion of humans and their manipulated objects, conditioned on the historical context. We propose a novel contact-consistent decoupled diffusion framework CoopDiff, which employs two distinct branches to decouple human and object motion modeling.
arXiv Detail & Related papers (2025-08-10T03:29:17Z)
GenHOI: Generalizing Text-driven 4D Human-Object Interaction Synthesis for Unseen Objects [13.830968058014546]
GenHOI is a two-stage framework aimed at achieving two key objectives: 1) generalization to unseen objects and 2) the synthesis of high-fidelity 4D HOI sequences. We introduce a Contact-Aware Diffusion Model (ContactDM) in the second stage to seamlessly interpolate 3D HOIs into densely temporally coherent 4D HOI sequences. Experimental results show that we achieve state-of-the-art results on the publicly available OMOMO and 3D-FUTURE datasets.
arXiv Detail & Related papers (2025-06-18T14:17:53Z)
DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models [9.103840202072336]
We present a novel framework for learning Dynamic Affordance across various target object categories. To address the scarcity of 4D HOI datasets, our method learns the 3D dynamic affordance from synthetically generated 4D HOI samples. We demonstrate that DAViD, our generative 4D human-object interaction model, outperforms baselines in HOI motion.
arXiv Detail & Related papers (2025-01-14T18:59:59Z)
Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues. Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions [11.32229757116179]
We introduce the ParaHome system, designed to capture dynamic 3D movements of humans and objects within a common home environment. By leveraging the ParaHome system, we collect a novel large-scale dataset of human-object interaction.
arXiv Detail & Related papers (2024-01-18T18:59:58Z)
HODN: Disentangling Human-Object Feature for HOI Detection [51.48164941412871]
We propose a Human and Object Disentangling Network (HODN) to model the Human-Object Interaction (HOI) relationships explicitly. Considering that human features are more contributive to interaction, we propose a Human-Guide Linking method to make sure the interaction decoder focuses on the human-centric regions. Our proposed method achieves competitive performance on both the V-COCO and the HICO-Det datasets.
arXiv Detail & Related papers (2023-08-20T04:12:50Z)
Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations. Our method generates continuous motions that are parameterized only by the temporal coordinate. This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions. CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process. By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors [42.17542596399014]
We present a method for inferring diverse 3D models of human-object interactions from images. Our method extracts high-level commonsense knowledge from large language models. We quantitatively evaluate the inferred 3D models on a large human-object interaction dataset.
arXiv Detail & Related papers (2022-09-06T13:32:55Z)
Compositional Human-Scene Interaction Synthesis with Semantic Control [16.93177243590465]
We aim to synthesize humans interacting with a given 3D scene controlled by high-level semantic specifications. We design a novel transformer-based generative model, in which the articulated 3D human body surface points and 3D objects are jointly encoded. Inspired by the compositional nature of interactions that humans can simultaneously interact with multiple objects, we define interaction semantics as the composition of varying numbers of atomic action-object pairs.
arXiv Detail & Related papers (2022-07-26T11:37:44Z)
BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits along with the annotated contacts between them. We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z)
H4D: Human 4D Modeling by Learning Neural Compositional Representation [75.34798886466311]
This work presents a novel framework that can effectively learn a compact and compositional representation for dynamic humans. A simple yet effective linear motion model is proposed to provide a rough and regularized motion estimation. Experiments demonstrate that our method is not only effective in recovering dynamic humans with accurate motion and detailed geometry, but also amenable to various 4D human-related tasks.
arXiv Detail & Related papers (2022-03-02T17:10:49Z)
Estimating 3D Motion and Forces of Human-Object Interactions from Internet Videos [49.52070710518688]
We introduce a method to reconstruct the 3D motion of a person interacting with an object from a single RGB video. Our method estimates the 3D poses of the person together with the object pose, the contact positions and the contact forces on the human body.
arXiv Detail & Related papers (2021-11-02T13:40:18Z)
Neural Free-Viewpoint Performance Rendering under Complex Human-object Interactions [35.41116017268475]
4D reconstruction of human-object interaction is critical for immersive VR/AR experience and human activity understanding. Recent advances still fail to recover fine geometry and texture results from sparse RGB inputs, especially under challenging human-object interaction scenarios. We propose a neural human performance capture and rendering system to generate both high-quality geometry and photo-realistic texture of both humans and objects.
arXiv Detail & Related papers (2021-08-01T04:53:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.