Interaction Transformer for Human Reaction Generation
- URL: http://arxiv.org/abs/2207.01685v1
- Date: Mon, 4 Jul 2022 19:30:41 GMT
- Title: Interaction Transformer for Human Reaction Generation
- Authors: Baptiste Chopin, Hao Tang, Naima Otberdout, Mohamed Daoudi, Nicu Sebe
- Abstract summary: We propose a novel interaction Transformer (InterFormer) consisting of a Transformer network with both temporal and spatial attentions.
Our method is general and can be used to generate more complex and long-term interactions.
- Score: 61.22481606720487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the challenging task of human reaction generation which aims to
generate a corresponding reaction based on an input action. Most of the
existing works do not focus on generating and predicting the reaction and
cannot generate the motion when only the action is given as input. To address
this limitation, we propose a novel interaction Transformer (InterFormer)
consisting of a Transformer network with both temporal and spatial attentions.
Specifically, the temporal attention captures the temporal dependencies of the
motion of both characters and of their interaction, while the spatial attention
learns the dependencies between the different body parts of each character and
those which are part of the interaction. Moreover, we propose using graphs to
increase the performance of the spatial attention via an interaction distance
module that helps focus on nearby joints from both characters. Extensive
experiments on the SBU interaction, K3HI, and DuetDance datasets demonstrate
the effectiveness of InterFormer. Our method is general and can be used to
generate more complex and long-term interactions.
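As a reading aid, the following is a minimal sketch of a spatial-temporal attention block with an interaction-distance bias in the spirit of the abstract above. The tensor shapes, the module name `InterFormerBlockSketch`, and the additive distance-bias formulation are assumptions made for illustration; this is not the authors' implementation of InterFormer.

```python
import torch
import torch.nn as nn


class InterFormerBlockSketch(nn.Module):
    """Hypothetical spatial-temporal attention block (not the authors' code)."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        # Temporal attention: each joint attends over the frames of the sequence.
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Spatial attention: each frame attends over the joints of both characters.
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor, joints_xyz: torch.Tensor) -> torch.Tensor:
        # x:          (batch, frames, joints, dim) features for the concatenated
        #             joints of the acting and reacting characters.
        # joints_xyz: (batch, frames, joints, 3) joint positions, used to build a
        #             distance bias that favours nearby joints of both characters.
        b, t, j, d = x.shape

        # Temporal attention over frames, applied independently per joint.
        xt = x.permute(0, 2, 1, 3).reshape(b * j, t, d)
        xt = xt + self.temporal_attn(xt, xt, xt, need_weights=False)[0]
        x = self.norm1(xt.reshape(b, j, t, d).permute(0, 2, 1, 3))

        # Spatial attention over joints, applied per frame, with an additive bias
        # of minus the pairwise joint distance (closer joints are penalized less).
        xs = x.reshape(b * t, j, d)
        pos = joints_xyz.reshape(b * t, j, 3)
        bias = (-torch.cdist(pos, pos)).repeat_interleave(self.spatial_attn.num_heads, dim=0)
        xs = xs + self.spatial_attn(xs, xs, xs, attn_mask=bias, need_weights=False)[0]
        x = self.norm2(xs.reshape(b, t, j, d))

        # Position-wise feed-forward with a residual connection.
        return self.norm3(x + self.ffn(x))


if __name__ == "__main__":
    block = InterFormerBlockSketch()
    feats = torch.randn(2, 16, 30, 64)   # 2 sequences, 16 frames, 2 x 15 joints
    coords = torch.randn(2, 16, 30, 3)   # random stand-in for joint positions
    print(block(feats, coords).shape)    # torch.Size([2, 16, 30, 64])
```

Biasing the spatial attention logits with negative pairwise joint distances is one simple way to make nearby joints of both characters attend to each other more strongly, echoing the role of the interaction distance module described above.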
Related papers
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions [66.87211993793807]
We present ReMoS, a denoising diffusion-based model that synthesizes the full-body motion of a person in a two-person interaction scenario.
We demonstrate ReMoS across challenging two-person scenarios such as pair dancing, Ninjutsu, kickboxing, and acrobatics.
We also contribute the ReMoCap dataset for two-person interactions, containing full-body and finger motions.
arXiv Detail & Related papers (2023-11-28T18:59:52Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, which encourages the synthesized motions to maintain the desired distance between joint pairs (see the sketch after this list).
We demonstrate that the distances between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
- Interactive Spatiotemporal Token Attention Network for Skeleton-based General Interactive Action Recognition [8.513434732050749]
We propose an Interactive Spatiotemporal Token Attention Network (ISTA-Net), which simultaneously models spatial, temporal, and interactive relations.
Our network contains a tokenizer that partitions Interactive Spatiotemporal Tokens (ISTs), a unified way to represent the motions of multiple diverse entities.
To jointly learn along three dimensions in ISTs, multi-head self-attention blocks integrated with 3D convolutions are designed to capture inter-token correlations.
arXiv Detail & Related papers (2023-07-14T16:51:25Z)
- InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions [49.097973114627344]
We present InterGen, an effective diffusion-based approach that incorporates human-to-human interactions into the motion diffusion process.
We first contribute a multimodal dataset, named InterHuman. It consists of about 107M frames for diverse two-person interactions, with accurate skeletal motions and 23,337 natural language descriptions.
We propose a novel representation for motion input in our interaction diffusion model, which explicitly formulates the global relations between the two performers in the world frame.
arXiv Detail & Related papers (2023-04-12T08:12:29Z)
- Interaction Mix and Match: Synthesizing Close Interaction using Conditional Hierarchical GAN with Multi-Hot Class Embedding [4.864897201841002]
We propose a novel way to create realistic human reactive motions by mixing and matching different types of close interactions.
Experiments are conducted on both noisy (depth-based) and high-quality (versa-based) interaction datasets.
arXiv Detail & Related papers (2022-07-23T16:13:10Z)
- GAN-based Reactive Motion Synthesis with Class-aware Discriminators for Human-human Interaction [14.023527193608144]
We propose a semi-supervised GAN system that synthesizes the reactive motion of a character given the active motion from another character.
The high quality of the synthetic motion demonstrates the effective design of our generator, and the discriminability of the synthesis also demonstrates the strength of our discriminator.
arXiv Detail & Related papers (2021-10-01T13:13:07Z)
- A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named the spatial-temporal attentive network with spatial continuity (STAN-SC).
First, a spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, a joint feature sequence is built from the sequence and instant state information so that the generated trajectories maintain spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
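As referenced in the InterControl entry above, one way to encourage synthesized motion to keep a desired distance between selected joint pairs is a simple distance penalty. The sketch below is illustrative only; the function name, tensor shapes, and joint indices are hypothetical and do not reproduce InterControl's actual loss or guidance mechanism.

```python
import torch


def joint_distance_penalty(motion: torch.Tensor,
                           pairs: torch.Tensor,
                           target_dist: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of selected joint-pair distances from desired values.

    motion:      (frames, joints, 3) synthesized joint positions of both people,
                 concatenated along the joint axis.
    pairs:       (num_constraints, 2) indices of the constrained joint pairs.
    target_dist: (num_constraints,) desired distances, e.g. proposed by an LLM.
    """
    a = motion[:, pairs[:, 0]]                # (frames, num_constraints, 3)
    b = motion[:, pairs[:, 1]]
    dist = (a - b).norm(dim=-1)               # (frames, num_constraints)
    return ((dist - target_dist) ** 2).mean()


# Example: keep joint 9 of person A about 0.1 m away from joint 31 of person B.
motion = torch.randn(60, 44, 3)               # 60 frames, 2 x 22 joints
pairs = torch.tensor([[9, 31]])
target = torch.tensor([0.1])
print(joint_distance_penalty(motion, pairs, target))
```

In a diffusion-based generator, such a penalty could act as guidance or an auxiliary training loss; the entry above further notes that the target distances themselves can be proposed by an off-the-shelf language model.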
This list is automatically generated from the titles and abstracts of the papers on this site.