HOI-M3:Capture Multiple Humans and Objects Interaction within Contextual Environment
- URL: http://arxiv.org/abs/2404.00299v2
- Date: Tue, 2 Apr 2024 12:34:09 GMT
- Title: HOI-M3:Capture Multiple Humans and Objects Interaction within Contextual Environment
- Authors: Juze Zhang, Jingyan Zhang, Zining Song, Zhanhe Shi, Chengfeng Zhao, Ye Shi, Jingyi Yu, Lan Xu, Jingya Wang
- Abstract summary: HOI-M3 is a novel large-scale dataset for modeling the interactions of Multiple huMans and Multiple objects.
It provides accurate 3D tracking for both humans and objects from dense RGB and object-mounted IMU inputs.
- Score: 43.6454394625555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans naturally interact with both others and the surrounding multiple objects, engaging in various social activities. However, recent advances in modeling human-object interactions mostly focus on perceiving isolated individuals and objects, due to fundamental data scarcity. In this paper, we introduce HOI-M3, a novel large-scale dataset for modeling the interactions of Multiple huMans and Multiple objects. Notably, it provides accurate 3D tracking for both humans and objects from dense RGB and object-mounted IMU inputs, covering 199 sequences and 181M frames of diverse humans and objects under rich activities. With the unique HOI-M3 dataset, we introduce two novel data-driven tasks with companion strong baselines: monocular capture and unstructured generation of multiple human-object interactions. Extensive experiments demonstrate that our dataset is challenging and worthy of further research about multiple human-object interactions and behavior analysis. Our HOI-M3 dataset, corresponding codes, and pre-trained models will be disseminated to the community for future research.
Related papers
- Learning to Generate Human-Human-Object Interactions from Textual Descriptions [15.38195247862565]
We present a novel research problem to model the correlations between two people engaged in a shared interaction involving an object. We refer to this formulation as Human-Human-Object Interactions (HHOIs). We present a newly captured HHOIs dataset and a method to synthesize HHOI data by leveraging image generative models.
arXiv Detail & Related papers (2025-11-25T16:17:23Z)
- MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions [20.96140289964853]
We present MMHOI -- a large-scale, Multi-human Multi-object Interaction dataset consisting of images from 12 everyday scenarios. MMHOI offers complete 3D shape and pose annotations for every person and object, along with labels for 78 action categories and 14 interaction-specific body parts. We present MMHOI-Net, an end-to-end transformer-based neural network for jointly estimating human-object 3D geometries, their interactions, and associated actions.
arXiv Detail & Related papers (2025-10-09T06:18:12Z)
- InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation [54.09384502044162]
We introduce InterAct, a large-scale 3D HOI benchmark featuring dataset and methodological advancements. First, we consolidate and standardize 21.81 hours of HOI data from diverse sources, enriching it with detailed textual annotations. Second, we propose a unified optimization framework to enhance data quality by reducing artifacts and correcting hand motions. Third, we define six benchmarking tasks and develop a unified HOI generative modeling perspective, achieving state-of-the-art performance.
arXiv Detail & Related papers (2025-09-11T15:43:54Z)
- CoopDiff: Anticipating 3D Human-object Interactions via Contact-consistent Decoupled Diffusion [62.93198247045824]
3D human-object interaction (HOI) anticipation aims to predict the future motion of humans and their manipulated objects, conditioned on the historical context. We propose a novel contact-consistent decoupled diffusion framework CoopDiff, which employs two distinct branches to decouple human and object motion modeling.
arXiv Detail & Related papers (2025-08-10T03:29:17Z)
- HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects [86.86284624825356]
HIMO is a dataset of full-body humans interacting with multiple objects.
HIMO contains 3.3K 4D HOI sequences and 4.08M 3D HOI frames.
arXiv Detail & Related papers (2024-07-17T07:47:34Z)
- ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions [11.32229757116179]
We introduce the ParaHome system, designed to capture dynamic 3D movements of humans and objects within a common home environment.
By leveraging the ParaHome system, we collect a novel large-scale dataset of human-object interaction.
arXiv Detail & Related papers (2024-01-18T18:59:58Z)
- LEMON: Learning 3D Human-Object Interaction Relation from 2D Images [56.6123961391372]
Learning 3D human-object interaction relation is pivotal to embodied AI and interaction modeling.
Most existing methods approach the goal by learning to predict isolated interaction elements.
We present LEMON, a unified model that mines interaction intentions of the counterparts and employs curvatures to guide the extraction of geometric correlations.
arXiv Detail & Related papers (2023-12-14T14:10:57Z)
- Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation [38.08445005326031]
We propose ProciGen to procedurally generate datasets with both plausible interaction and diverse object variation.
We generate 1M+ human-object interaction pairs in 3D and leverage this large-scale data to train our HDM (Hierarchical Diffusion Model).
Our HDM is an image-conditioned diffusion model that learns both realistic interaction and highly accurate human and object shapes.
arXiv Detail & Related papers (2023-12-12T08:32:55Z)
- I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions [42.87514729260336]
I'm-HOI is a monocular scheme to faithfully capture the 3D motions of both the human and object in a novel setting.
It combines general motion inference and category-aware refinement.
Our dataset and code will be released to the community.
arXiv Detail & Related papers (2023-12-10T08:25:41Z)
- Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes dynamic descriptors, can achieve state-of-the-art prediction results and generalize better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
- BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits along with the annotated contacts between them.
We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.