Temporal Relevance Analysis for Video Action Models
- URL: http://arxiv.org/abs/2204.11929v1
- Date: Mon, 25 Apr 2022 19:06:48 GMT
- Title: Temporal Relevance Analysis for Video Action Models
- Authors: Quanfu Fan, Donghyun Kim, Chun-Fu (Richard) Chen, Stan Sclaroff, Kate
Saenko, Sarah Adel Bargal
- Abstract summary: We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models.
We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected.
- Score: 70.39411261685963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we provide a deep analysis of temporal modeling for action
recognition, an important but underexplored problem in the literature. We first
propose a new approach to quantify the temporal relationships between frames
captured by CNN-based action models based on layer-wise relevance propagation.
We then conduct comprehensive experiments and in-depth analysis to provide a
better understanding of how temporal modeling is affected by various factors
such as dataset, network architecture, and input frames. With this, we further
study some important questions for action recognition that lead to interesting
findings. Our analysis shows that there is no strong correlation between
temporal relevance and model performance; and action models tend to capture
local temporal information, but less long-range dependencies. Our codes and
models will be publicly available.
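The core idea — attributing a model's prediction back to its input frames via layer-wise relevance propagation (LRP) and aggregating the result per frame — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the relevance tensor, its shape, and the sum-then-normalize aggregation rule are assumptions for the example.

```python
import numpy as np

def frame_relevance(relevance, normalize=True):
    """Aggregate pixel-level relevance maps into one score per frame.

    relevance: array of shape (T, C, H, W), e.g. per-input relevance
    produced by LRP for a T-frame clip (hypothetical input; the paper's
    exact aggregation rule is not reproduced here).
    """
    # Collapse the channel and spatial axes, keeping the temporal axis.
    per_frame = relevance.sum(axis=(1, 2, 3))
    if normalize:
        # Normalize so scores describe relative temporal importance.
        per_frame = per_frame / np.abs(per_frame).sum()
    return per_frame

# Toy example: an 8-frame clip whose relevance is concentrated
# on frames 3 and 4, mimicking a model that attends locally in time.
rng = np.random.default_rng(0)
R = rng.random((8, 3, 16, 16)) * 0.01
R[3:5] += 1.0
print(frame_relevance(R))
```

With scores like these in hand, one can compare how relevance is distributed across frames for different datasets, architectures, and input lengths — the kind of analysis the paper performs.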
Related papers
- Temporal receptive field in dynamic graph learning: A comprehensive analysis [15.161255747900968]
We present a comprehensive analysis of the temporal receptive field in dynamic graph learning.
Our results demonstrate that an appropriately chosen temporal receptive field can significantly enhance model performance.
For some models, overly large windows may introduce noise and reduce accuracy.
arXiv Detail & Related papers (2024-07-17T07:46:53Z) - Spatio-Temporal Graphical Counterfactuals: An Overview [11.616701619068804]
Counterfactual reasoning is a critical yet challenging topic for artificial intelligence to learn knowledge from data.
Our aim is to provide a survey that compares and discusses different counterfactual models, theories, and approaches.
arXiv Detail & Related papers (2024-07-02T01:34:13Z) - A Survey on Diffusion Models for Time Series and Spatio-Temporal Data [92.1255811066468]
We review the use of diffusion models for time series and spatio-temporal data, categorizing them by model, task type, data modality, and practical application domain.
We categorize diffusion models into unconditioned and conditioned types, and discuss time series and spatio-temporal data separately.
Our survey covers their application extensively in various fields including healthcare, recommendation, climate, energy, audio, and transportation.
arXiv Detail & Related papers (2024-04-29T17:19:40Z) - Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z) - Recur, Attend or Convolve? Frame Dependency Modeling Matters for
Cross-Domain Robustness in Action Recognition [0.5448283690603357]
Previous results have shown that 2D Convolutional Neural Networks (CNNs) tend to be biased toward texture rather than shape for various computer vision tasks.
This raises the suspicion that large video models learn spurious correlations rather than learning to track relevant shapes over time.
We study the cross-domain robustness of recurrent, attention-based, and convolutional video models to investigate whether this robustness is influenced by frame dependency modeling.
arXiv Detail & Related papers (2021-12-22T19:11:53Z) - Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction.
We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z) - Modeling long-term interactions to enhance action recognition [81.09859029964323]
We propose a new approach to understand actions in egocentric videos that exploits the semantics of object interactions at both frame and temporal levels.
We use a region-based approach that takes as input a primary region roughly corresponding to the user hands and a set of secondary regions potentially corresponding to the interacting objects.
The proposed approach outperforms the state-of-the-art in terms of action recognition on standard benchmarks.
arXiv Detail & Related papers (2021-04-23T10:08:15Z) - Unified Graph Structured Models for Video Understanding [93.72081456202672]
We propose a message passing graph neural network that explicitly models spatio-temporal relations.
We show how our method is able to more effectively model relationships between relevant entities in the scene.
arXiv Detail & Related papers (2021-03-29T14:37:35Z) - A Gated Fusion Network for Dynamic Saliency Prediction [16.701214795454536]
We propose a Gated Fusion Network for dynamic saliency (GFSalNet).
GFSalNet is the first deep saliency model capable of making predictions in a dynamic way via a gated fusion mechanism.
We show that it has a good generalization ability, and moreover, exploits temporal information more effectively via its adaptive fusion scheme.
arXiv Detail & Related papers (2021-02-15T17:18:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.