Group Activity Recognition via Dynamic Composition and Interaction
- URL: http://arxiv.org/abs/2305.05583v1
- Date: Tue, 9 May 2023 16:18:18 GMT
- Title: Group Activity Recognition via Dynamic Composition and Interaction
- Authors: Youliang Zhang, Zhuo Zhou, Wenxuan Liu, Danni Xu, Zheng Wang
- Abstract summary: We propose DynamicFormer with Dynamic composition Module (DcM) and Dynamic interaction Module (DiM) to model relations and locations of persons.
Our findings on group composition and human-object interaction inspire our core idea.
We conduct extensive experiments on two public datasets and show that our method achieves state-of-the-art performance.
- Score: 8.83578086094184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous group activity recognition approaches were limited to reasoning
over human relations or identifying important subgroups, and tended to ignore
indispensable group composition and human-object interactions. This omission
yields only a partial interpretation of the scene and increases the interference of
irrelevant actions on the results. Therefore, we propose our DynamicFormer with
Dynamic composition Module (DcM) and Dynamic interaction Module (DiM) to model
relations and locations of persons and discriminate the contribution of
participants, respectively. Our findings on group composition and human-object
interaction inspire our core idea. Group composition tells us the location of
people and their relations inside the group, while interaction reflects the
relation between humans and objects outside the group. We utilize spatial and
temporal encoders in DcM to model our dynamic composition and build DiM to
explore interaction with a novel GCN, which has a transformer inside to
consider the temporal neighbors of human/object. Also, a Multi-level Dynamic
Integration is employed to integrate features from different levels. We conduct
extensive experiments on two public datasets and show that our method achieves
state-of-the-art performance.
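The abstract describes DiM as a GCN with a transformer inside, so that each human/object node attends to its own temporal neighbors before features are aggregated over the interaction graph. As an illustrative sketch only (not the authors' implementation), the following minimal NumPy code combines per-entity single-head temporal self-attention with row-normalized graph aggregation; all function names, weight shapes, and the single-head design are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(x, wq, wk, wv):
    """Single-head self-attention over one entity's T frames. x: (T, D)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    att = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # (T, T)
    return att @ v  # each frame mixes in its temporal neighbors

def gcn_with_temporal_attention(feats, adj, w_gcn, wq, wk, wv):
    """feats: (N, T, D) per-entity, per-frame features; adj: (N, N) human/object graph."""
    # Transformer step: refine each entity's features along the time axis.
    tempo = np.stack([temporal_attention(feats[i], wq, wk, wv)
                      for i in range(feats.shape[0])])
    # GCN step: aggregate over graph neighbors with row-normalized adjacency.
    adj_norm = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-8)
    agg = np.einsum('ij,jtd->itd', adj_norm, tempo)
    return np.maximum(agg @ w_gcn, 0.0)  # ReLU, output shape (N, T, D)
```

In this sketch the temporal attention plays the role the abstract assigns to the transformer inside the GCN; a real implementation would add multiple heads, learned normalization, and stacked layers.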
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- in2IN: Leveraging individual Information to Generate Human INteractions [29.495166514135295]
We introduce in2IN, a novel diffusion model for human-human motion generation conditioned on individual descriptions.
We also propose DualMDM, a model composition technique that combines the motions generated with in2IN and the motions generated by a single-person motion prior pre-trained on HumanML3D.
arXiv Detail & Related papers (2024-04-15T17:59:04Z)
- THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR).
In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion.
We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
arXiv Detail & Related papers (2024-03-17T13:17:25Z)
- LEMON: Learning 3D Human-Object Interaction Relation from 2D Images [56.6123961391372]
Learning 3D human-object interaction relation is pivotal to embodied AI and interaction modeling.
Most existing methods approach the goal by learning to predict isolated interaction elements.
We present LEMON, a unified model that mines interaction intentions of the counterparts and employs curvatures to guide the extraction of geometric correlations.
arXiv Detail & Related papers (2023-12-14T14:10:57Z)
- A Grammatical Compositional Model for Video Action Detection [24.546886938243393]
We present a novel Grammatical Compositional Model (GCM) for action detection based on typical And-Or graphs.
Our model exploits the intrinsic structures and latent relationships of actions in a hierarchical manner to harness both the compositionality of grammar models and the capability of expressing rich features of DNNs.
arXiv Detail & Related papers (2023-10-04T15:24:00Z)
- Rethinking Trajectory Prediction via "Team Game" [118.59480535826094]
We present a novel formulation for multi-agent trajectory prediction, which explicitly introduces the concept of interactive group consensus.
On two multi-agent settings, i.e. team sports and pedestrians, the proposed framework consistently achieves superior performance compared to existing methods.
arXiv Detail & Related papers (2022-10-17T07:16:44Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- Spatio-Temporal Dynamic Inference Network for Group Activity Recognition [7.007702816885332]
Group activity recognition aims to understand the activity performed by a group of people.
Previous methods are limited in reasoning on a predefined graph, which ignores the person-specific context.
We propose Dynamic Inference Network (DIN), which is composed of a Dynamic Relation (DR) module and a Dynamic Walk (DW) module.
arXiv Detail & Related papers (2021-08-26T12:40:20Z)
- Learning Asynchronous and Sparse Human-Object Interaction in Videos [56.73059840294019]
Asynchronous-Sparse Interaction Graph Networks (ASSIGN) is able to automatically detect the structure of interaction events associated with entities in a video scene.
ASSIGN is tested on human-object interaction recognition and shows superior performance in segmenting and labeling of human sub-activities and object affordances from raw videos.
arXiv Detail & Related papers (2021-03-03T23:43:55Z)
- Skeleton-based Relational Reasoning for Group Activity Analysis [40.49389173100578]
We leverage skeleton information to learn the interactions between individuals directly from it.
Our experiments demonstrate the potential of skeleton-based approaches for modeling multi-person interactions.
arXiv Detail & Related papers (2020-11-11T09:25:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.