Inter-X: Towards Versatile Human-Human Interaction Analysis
- URL: http://arxiv.org/abs/2312.16051v1
- Date: Tue, 26 Dec 2023 13:36:05 GMT
- Title: Inter-X: Towards Versatile Human-Human Interaction Analysis
- Authors: Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu,
Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng,
Xiaokang Yang
- Abstract summary: We propose Inter-X, a dataset with accurate body movements and diverse interaction patterns.
The dataset includes 11K interaction sequences and more than 8.1M frames.
We also equip Inter-X with versatile annotations of more than 34K fine-grained human part-level textual descriptions.
- Score: 100.254438708001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing ubiquitous human-human interactions is pivotal for
understanding humans as social beings. Existing human-human interaction
datasets typically suffer from inaccurate body motions, a lack of hand
gestures, and missing fine-grained textual descriptions. To better perceive
and generate human-human interactions, we propose Inter-X, the largest
human-human interaction dataset to date, with accurate body movements,
diverse interaction patterns, and detailed hand gestures. The dataset includes ~11K
interaction sequences and more than 8.1M frames. We also equip Inter-X with
versatile annotations of more than 34K fine-grained human part-level textual
descriptions, semantic interaction categories, interaction order, and the
relationship and personality of the subjects. Based on the elaborate
annotations, we propose a unified benchmark composed of 4 categories of
downstream tasks from both the perceptual and generative directions. Extensive
experiments and comprehensive analysis show that Inter-X serves as a testbed
for promoting the development of versatile human-human interaction analysis.
Our dataset and benchmark will be publicly available for research purposes.
Related papers
- in2IN: Leveraging individual Information to Generate Human INteractions [29.495166514135295]
We introduce in2IN, a novel diffusion model for human-human motion generation conditioned on individual descriptions.
We also propose DualMDM, a model composition technique that combines the motions generated with in2IN and the motions generated by a single-person motion prior pre-trained on HumanML3D.
arXiv Detail & Related papers (2024-04-15T17:59:04Z)
- THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR)
In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion.
We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
arXiv Detail & Related papers (2024-03-17T13:17:25Z)
- Expressive Forecasting of 3D Whole-body Human Motions [38.93700642077312]
We are the first to formulate a whole-body human pose forecasting framework.
Our model involves two key constituents: cross-context alignment (XCA) and cross-context interaction (XCI)
We conduct extensive experiments on a newly-introduced large-scale benchmark and achieve state-of-the-art performance.
arXiv Detail & Related papers (2023-12-19T09:09:46Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, which encourages the synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
- Generating Human-Centric Visual Cues for Human-Object Interaction Detection via Large Vision-Language Models [59.611697856666304]
Human-object interaction (HOI) detection aims at detecting human-object pairs and predicting their interactions.
We propose three prompts with VLM to generate human-centric visual cues within an image from multiple perspectives of humans.
We develop a transformer-based multimodal fusion module with a multi-tower architecture to integrate visual cue features into the instance and interaction decoders.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are, respectively, speaking activity, support vector machines, meetings composed of 3-4 persons, and microphones combined with cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO [29.0200561485714]
We propose a new interaction dataset to deal with both types of human interactions: Human-to-Human-or-Object (H2O)
In addition, we introduce a novel taxonomy of verbs, intended to be closer to a description of human body attitude in relation to the surrounding targets of interaction.
We propose DIABOLO, an efficient subject-centric single-shot method to detect all interactions in one forward pass.
arXiv Detail & Related papers (2022-01-07T11:00:11Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.