Hierarchical Graph-RNNs for Action Detection of Multiple Activities
- URL: http://arxiv.org/abs/2101.08581v1
- Date: Thu, 21 Jan 2021 12:50:02 GMT
- Title: Hierarchical Graph-RNNs for Action Detection of Multiple Activities
- Authors: Sovan Biswas, Yaser Souri and Juergen Gall
- Abstract summary: We propose an approach that spatially localizes the activities in a video frame where each person can perform multiple activities at the same time.
Our approach takes the temporal scene context as well as the relations of the actions of detected persons into account.
- Score: 20.645887084027443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose an approach that spatially localizes the activities
in a video frame where each person can perform multiple activities at the same
time. Our approach takes the temporal scene context as well as the relations of
the actions of detected persons into account. While the temporal context is
modeled by a temporal recurrent neural network (RNN), the relations of the
actions are modeled by a graph RNN. Both networks are trained together and the
proposed approach achieves state-of-the-art results on the AVA dataset.
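The two-level idea in the abstract (a temporal RNN for scene context, a graph RNN for relations between the actions of detected persons, and several simultaneous activities per person) can be illustrated with a minimal, untrained sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`detect`, `rnn_step`), the vanilla tanh RNN cell, the mean-of-neighbours message, the fully connected person graph, the random weights, and the toy feature sizes. Independent sigmoid outputs are used so that one person can receive several action labels at once.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Matrix-vector product on plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    # Random (untrained) weights; a real model would learn these.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def rnn_step(W_in, W_h, x, h):
    # One vanilla RNN step: h' = tanh(W_in x + W_h h).
    a = matvec(W_in, x)
    b = matvec(W_h, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]

def detect(frames, hidden=4, num_actions=3, rounds=2, seed=0):
    """frames: list of (scene_feature, [person_feature, ...]), one per time step.
    Returns per-person action probabilities for the last frame (hypothetical API)."""
    rng = random.Random(seed)
    dim = len(frames[0][0])
    W_s, U_s = rand_matrix(hidden, dim, rng), rand_matrix(hidden, hidden, rng)
    W_p = rand_matrix(hidden, dim + hidden, rng)  # person feature + scene context
    W_m, U_p = rand_matrix(hidden, hidden, rng), rand_matrix(hidden, hidden, rng)
    W_out = rand_matrix(num_actions, hidden, rng)

    # Temporal level: an RNN summarises the scene features over time.
    context = [0.0] * hidden
    for scene, _ in frames:
        context = rnn_step(W_s, U_s, scene, context)

    # Graph level: one node per detected person in the last frame,
    # initialised from the person feature concatenated with the context.
    persons = frames[-1][1]
    states = [[math.tanh(v) for v in matvec(W_p, p + context)] for p in persons]
    for _ in range(rounds):
        new_states = []
        for i, h in enumerate(states):
            others = [s for j, s in enumerate(states) if j != i]
            if others:
                # Message = mean hidden state of all other persons.
                msg = [sum(col) / len(others) for col in zip(*others)]
            else:
                msg = [0.0] * hidden
            new_states.append(rnn_step(W_m, U_p, msg, h))
        states = new_states

    # Multi-label output: one independent sigmoid per action and person.
    return [[sigmoid(z) for z in matvec(W_out, h)] for h in states]

# Toy input: 2 time steps, 2 detected persons, 3-dimensional features.
frames = [
    ([0.1, 0.2, 0.3], [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1]]),
    ([0.0, 0.4, 0.1], [[0.8, 0.1, 0.4], [0.3, 0.7, 0.2]]),
]
probs = detect(frames)
```

Here `probs[i][k]` is the score of action `k` for person `i`; thresholding each score separately, rather than taking an argmax, is what lets a person carry multiple activity labels at the same time.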
Related papers
- Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition [53.02634128715853]
Few-shot action recognition (FSAR) aims to recognize novel action categories with few exemplars.
We propose HR2G-shot, a Hierarchical Relation-augmented Representation Generalization framework for FSAR.
It unifies three types of relation modeling (inter-frame, inter-video, and inter-task) to learn task-specific temporal patterns from a holistic view.
arXiv Detail & Related papers (2025-04-14T10:23:22Z)
- Pairwise Spatiotemporal Partial Trajectory Matching for Co-movement Analysis [1.0942776587291776]
Pairwise movement analysis involves identifying individuals within specific time frames.
We propose a novel method for partial spatiotemporal matching that transforms data into interpretable images based on time windows.
We evaluate our method on a co-walking classification task, demonstrating its effectiveness in a novel co-behavior identification application.
This approach offers a powerful, interpretable framework for spatiotemporal behavior analysis, with potential applications in social behavior research, urban planning, and healthcare.
arXiv Detail & Related papers (2024-12-03T22:25:44Z)
- A Hybrid Graph Network for Complex Activity Detection in Video [40.843533889724924]
Complex Activity Detection (CompAD) extends analysis to long-term activities.
We propose a hybrid graph neural network which combines attention applied to a graph encoding the local (short-term) dynamic scene with a temporal graph modelling the overall long-duration activity.
arXiv Detail & Related papers (2023-10-26T15:49:35Z)
- TempGNN: Temporal Graph Neural Networks for Dynamic Session-Based Recommendations [5.602191038593571]
Temporal Graph Neural Networks (TempGNN) is a generic framework for capturing the structural and temporal dynamics in complex item transitions.
TempGNN achieves state-of-the-art performance on two real-world e-commerce datasets.
arXiv Detail & Related papers (2023-10-20T03:13:10Z)
- Spatio-Temporal Joint Graph Convolutional Networks for Traffic Forecasting [75.10017445699532]
Recent studies have shifted their focus towards formulating traffic forecasting as a spatio-temporal graph modeling problem.
We propose a novel approach for accurate traffic forecasting on road networks over multiple future time steps.
arXiv Detail & Related papers (2021-11-25T08:45:14Z)
- Learning Dual Dynamic Representations on Time-Sliced User-Item Interaction Graphs for Sequential Recommendation [62.30552176649873]
We devise a novel Dynamic Representation Learning model for Sequential Recommendation (DRL-SRe).
To better model the user-item interactions for characterizing the dynamics from both sides, the proposed model builds a global user-item interaction graph for each time slice.
To enable the model to capture fine-grained temporal information, we propose an auxiliary temporal prediction task over consecutive time slices.
arXiv Detail & Related papers (2021-09-24T07:44:27Z)
- Modeling long-term interactions to enhance action recognition [81.09859029964323]
We propose a new approach to understand actions in egocentric videos that exploits the semantics of object interactions at both frame and temporal levels.
We use a region-based approach that takes as input a primary region roughly corresponding to the user hands and a set of secondary regions potentially corresponding to the interacting objects.
The proposed approach outperforms the state-of-the-art in terms of action recognition on standard benchmarks.
arXiv Detail & Related papers (2021-04-23T10:08:15Z)
- Unified Graph Structured Models for Video Understanding [93.72081456202672]
We propose a message passing graph neural network that explicitly models relational-temporal relations.
We show how our method is able to more effectively model relationships between relevant entities in the scene.
arXiv Detail & Related papers (2021-03-29T14:37:35Z)
- Learning Asynchronous and Sparse Human-Object Interaction in Videos [56.73059840294019]
Asynchronous-Sparse Interaction Graph Networks (ASSIGN) is able to automatically detect the structure of interaction events associated with entities in a video scene.
ASSIGN is tested on human-object interaction recognition and shows superior performance in segmenting and labeling of human sub-activities and object affordances from raw videos.
arXiv Detail & Related papers (2021-03-03T23:43:55Z)
- A Two-stream Neural Network for Pose-based Hand Gesture Recognition [23.50938160992517]
Pose-based hand gesture recognition has been widely studied in recent years.
This paper proposes a two-stream neural network with one stream being a self-attention based graph convolutional network (SAGCN).
The residual-connection enhanced Bi-IndRNN extends an IndRNN with the capability of bidirectional processing for temporal modelling.
arXiv Detail & Related papers (2021-01-22T03:22:26Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Understanding Dynamic Scenes using Graph Convolution Networks [22.022759283770377]
We present a novel framework to model on-road vehicle behaviors from a sequence of temporally ordered frames as grabbed by a moving camera.
We show a seamless transfer of learning to multiple datasets without resorting to fine-tuning.
Such behavior prediction methods find immediate relevance in a variety of navigation tasks.
arXiv Detail & Related papers (2020-05-09T13:05:06Z)