Inter-X: Towards Versatile Human-Human Interaction Analysis
- URL: http://arxiv.org/abs/2312.16051v1
- Date: Tue, 26 Dec 2023 13:36:05 GMT
- Title: Inter-X: Towards Versatile Human-Human Interaction Analysis
- Authors: Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu,
Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng,
Xiaokang Yang
- Abstract summary: We propose Inter-X, a dataset with accurate body movements and diverse interaction patterns.
The dataset includes 11K interaction sequences and more than 8.1M frames.
We also equip Inter-X with versatile annotations of more than 34K fine-grained human part-level textual descriptions.
- Score: 100.254438708001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing ubiquitous human-human interactions is pivotal for
understanding humans as social beings. Existing human-human interaction
datasets typically suffer from inaccurate body motions, a lack of hand
gestures, and missing fine-grained textual descriptions. To better perceive
and generate human-human interactions, we propose Inter-X, the largest
human-human interaction dataset to date, with accurate body movements,
diverse interaction patterns, and detailed hand gestures. The dataset includes ~11K
interaction sequences and more than 8.1M frames. We also equip Inter-X with
versatile annotations of more than 34K fine-grained human part-level textual
descriptions, semantic interaction categories, interaction order, and the
relationship and personality of the subjects. Based on the elaborate
annotations, we propose a unified benchmark composed of 4 categories of
downstream tasks from both the perceptual and generative directions. Extensive
experiments and comprehensive analysis show that Inter-X serves as a testbed
for promoting the development of versatile human-human interaction analysis.
Our dataset and benchmark will be publicly available for research purposes.
Related papers
- in2IN: Leveraging individual Information to Generate Human INteractions [29.495166514135295]
We introduce in2IN, a novel diffusion model for human-human motion generation conditioned on individual descriptions.
We also propose DualMDM, a model composition technique that combines the motions generated with in2IN and the motions generated by a single-person motion prior pre-trained on HumanML3D.
arXiv Detail & Related papers (2024-04-15T17:59:04Z)
- THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR)
In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion.
We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
arXiv Detail & Related papers (2024-03-17T13:17:25Z)
- Expressive Forecasting of 3D Whole-body Human Motions [38.93700642077312]
We are the first to formulate a whole-body human pose forecasting framework.
Our model involves two key constituents: cross-context alignment (XCA) and cross-context interaction (XCI)
We conduct extensive experiments on a newly-introduced large-scale benchmark and achieve state-of-the-art performance.
arXiv Detail & Related papers (2023-12-19T09:09:46Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, which encourages the synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
- Generating Human-Centric Visual Cues for Human-Object Interaction Detection via Large Vision-Language Models [59.611697856666304]
Human-object interaction (HOI) detection aims at detecting human-object pairs and predicting their interactions.
We propose three prompts with VLM to generate human-centric visual cues within an image from multiple perspectives of humans.
We develop a transformer-based multimodal fusion module with a multi-tower architecture to integrate visual cue features into the instance and interaction decoders.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are, respectively, speaking activity, support vector machines, meetings composed of 3-4 persons, and microphones combined with cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO [29.0200561485714]
We propose a new interaction dataset to deal with both types of human interactions: Human-to-Human-or-Object (H2O)
In addition, we introduce a novel taxonomy of verbs, intended to be closer to a description of human body attitude in relation to the surrounding targets of interaction.
We propose DIABOLO, an efficient subject-centric single-shot method to detect all interactions in one forward pass.
arXiv Detail & Related papers (2022-01-07T11:00:11Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.