Representing Videos as Discriminative Sub-graphs for Action Recognition
- URL: http://arxiv.org/abs/2201.04027v1
- Date: Tue, 11 Jan 2022 16:15:25 GMT
- Title: Representing Videos as Discriminative Sub-graphs for Action Recognition
- Authors: Dong Li and Zhaofan Qiu and Yingwei Pan and Ting Yao and Houqiang Li
and Tao Mei
- Abstract summary: We introduce a new design of sub-graphs to represent and encode the discriminative patterns of each action in the videos.
We present the MUlti-scale Sub-graph LEarning (MUSLE) framework, which builds space-time graphs and clusters the graphs into compact sub-graphs on each scale.
- Score: 165.54738402505194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human actions are typically of combinatorial structures or patterns, i.e.,
subjects, objects, plus spatio-temporal interactions in between. Discovering
such structures is therefore a rewarding way to reason about the dynamics of
interactions and recognize the actions. In this paper, we introduce a new
design of sub-graphs to represent and encode the discriminative patterns of
each action in the videos. Specifically, we present MUlti-scale Sub-graph
LEarning (MUSLE) framework that novelly builds space-time graphs and clusters
the graphs into compact sub-graphs on each scale with respect to the number of
nodes. Technically, MUSLE produces 3D bounding boxes, i.e., tubelets, in each
video clip, as graph nodes and takes dense connectivity as graph edges between
tubelets. For each action category, we execute online clustering to decompose
the graph into sub-graphs on each scale through learning Gaussian Mixture Layer
and select the discriminative sub-graphs as action prototypes for recognition.
Extensive experiments are conducted on the Something-Something V1 & V2 and
Kinetics-400 datasets, and superior results are reported compared with
state-of-the-art methods. More remarkably, MUSLE achieves the best accuracy
reported to date, 65.0%, on the Something-Something V2 validation set.
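The decomposition described in the abstract (tubelets as nodes, dense edges, clustering into sub-graphs via a Gaussian mixture) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian Mixture Layer in the paper is learned online, whereas this sketch assumes fixed, equally weighted isotropic components, and all names (`responsibilities`, `decompose`, `tubelets`, `means`) and the toy feature values are hypothetical.

```python
import math
from itertools import combinations


def responsibilities(x, means, sigma=1.0):
    """Soft-assign one node feature to equally weighted isotropic Gaussians."""
    # Log-density up to a constant shared by all components.
    logits = [
        -sum((a - b) ** 2 for a, b in zip(x, m)) / (2 * sigma ** 2)
        for m in means
    ]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def decompose(nodes, means, sigma=1.0):
    """Group nodes by their most responsible component.

    Each group becomes a sub-graph whose nodes are densely connected,
    mirroring the dense connectivity between tubelets in the abstract.
    """
    groups = {}
    for i, x in enumerate(nodes):
        r = responsibilities(x, means, sigma)
        k = max(range(len(means)), key=r.__getitem__)
        groups.setdefault(k, []).append(i)
    return {
        k: {"nodes": idx, "edges": list(combinations(idx, 2))}
        for k, idx in groups.items()
    }


# Toy tubelet features (assumed 2-D for illustration only).
tubelets = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0], [5.2, 4.8]]
# Assumed fixed component means; the paper learns these instead.
means = [[0.0, 0.0], [5.0, 5.0]]
subgraphs = decompose(tubelets, means)
```

Here `subgraphs` maps each component index to the node indices assigned to it and the dense edge list among them. Selecting which sub-graphs are discriminative enough to serve as action prototypes is a separate step that this sketch does not cover.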
Related papers
- A Simple and Scalable Graph Neural Network for Large Directed Graphs [11.792826520370774]
We investigate various combinations of node representations and edge direction awareness within an input graph.
In response, we propose a simple yet holistic classification method A2DUG.
We demonstrate that A2DUG stably performs well on various datasets and improves the accuracy up to 11.29 compared with the state-of-the-art methods.
arXiv Detail & Related papers (2023-06-14T06:24:58Z)
- Sub-Graph Learning for Spatiotemporal Forecasting via Knowledge Distillation [22.434970343698676]
We present a new framework called KD-SGL to effectively learn the sub-graphs.
We define one global model to learn the overall structure of the graph and multiple local models for each sub-graph.
arXiv Detail & Related papers (2022-11-17T18:02:55Z)
- CGMN: A Contrastive Graph Matching Network for Self-Supervised Graph Similarity Learning [65.1042892570989]
We propose a contrastive graph matching network (CGMN) for self-supervised graph similarity learning.
We employ two strategies, namely cross-view interaction and cross-graph interaction, for effective node representation learning.
We transform node representations into graph-level representations via pooling operations for graph similarity computation.
arXiv Detail & Related papers (2022-05-30T13:20:26Z)
- Joint Graph Learning and Matching for Semantic Feature Correspondence [69.71998282148762]
We propose a joint graph learning and matching network, named GLAM, to explore reliable graph structures for boosting graph matching.
The proposed method is evaluated on three popular visual matching benchmarks (Pascal VOC, Willow Object and SPair-71k).
It outperforms previous state-of-the-art graph matching methods by significant margins on all benchmarks.
arXiv Detail & Related papers (2021-09-01T08:24:02Z)
- Spatial-spectral Hyperspectral Image Classification via Multiple Random Anchor Graphs Ensemble Learning [88.60285937702304]
This paper proposes a novel spatial-spectral HSI classification method via multiple random anchor graphs ensemble learning (RAGE).
Firstly, the local binary pattern is adopted to extract the more descriptive features on each selected band, which preserves local structures and subtle changes of a region.
Secondly, the adaptive neighbors assignment is introduced in the construction of anchor graph, to reduce the computational complexity.
arXiv Detail & Related papers (2021-03-25T09:31:41Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Multilevel Graph Matching Networks for Deep Graph Similarity Learning [79.3213351477689]
We propose a multi-level graph matching network (MGMN) framework for computing the graph similarity between any pair of graph-structured objects.
To compensate for the lack of standard benchmark datasets, we have created and collected a set of datasets for both the graph-graph classification and graph-graph regression tasks.
Comprehensive experiments demonstrate that MGMN consistently outperforms state-of-the-art baseline models on both the graph-graph classification and graph-graph regression tasks.
arXiv Detail & Related papers (2020-07-08T19:48:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.