Modeling Temporal Concept Receptive Field Dynamically for Untrimmed
Video Analysis
- URL: http://arxiv.org/abs/2111.11653v1
- Date: Tue, 23 Nov 2021 04:59:48 GMT
- Title: Modeling Temporal Concept Receptive Field Dynamically for Untrimmed
Video Analysis
- Authors: Zhaobo Qi, Shuhui Wang, Chi Su, Li Su, Weigang Zhang, Qingming Huang
- Abstract summary: We study the temporal concept receptive field of concept-based event representation.
We introduce temporal dynamic convolution (TDC) to give stronger flexibility to concept-based event analytics.
Different coefficients yield an appropriate and accurate temporal concept receptive field size for each input video.
- Score: 105.06166692486674
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event analysis in untrimmed videos has attracted increasing attention due to
the application of cutting-edge techniques such as CNNs. As a well-studied
property of CNN-based models, the receptive field measures the spatial range
covered by a single feature response and is crucial for improving image
categorization accuracy. In the video domain, event semantics are described by
complex interactions among different concepts whose behaviors vary drastically
from one video to another, which makes concept-based analytics for accurate
event categorization difficult. To model this concept behavior, we study the
temporal concept
receptive field of concept-based event representation, which encodes the
temporal occurrence pattern of different mid-level concepts. Accordingly, we
introduce temporal dynamic convolution (TDC) to give stronger flexibility to
concept-based event analytics. TDC can adjust the temporal concept receptive
field size dynamically according to different inputs. Notably, a set of
coefficients is learned to fuse the results of multiple convolutions with
different kernel widths that provide various temporal concept receptive field
sizes. The learned coefficients yield an appropriate and accurate temporal
concept receptive field size for each input video and highlight crucial
concepts. Based on TDC, we propose the temporal dynamic concept modeling
network (TDCMN) to learn an accurate and complete concept representation for
efficient untrimmed video analysis. Experimental results on FCVID and
ActivityNet show that TDCMN demonstrates adaptive event recognition ability
conditioned on different inputs and improves the event recognition performance
of concept-based methods by a large margin. Code is available at
https://github.com/qzhb/TDCMN.
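To make the TDC fusion mechanism concrete, below is a minimal sketch of the core idea in PyTorch: several temporal convolutions with different kernel widths are applied to a sequence of mid-level concept scores, and a small gating branch predicts per-video coefficients that fuse their outputs. The layer names, the depth-wise convolution choice, and the pooling-based gate are illustrative assumptions rather than the authors' exact TDCMN implementation; see the repository above for the official code.

```python
# Illustrative sketch only: names, layer sizes, and the gating design are
# assumptions, not the authors' exact TDCMN implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalDynamicConv(nn.Module):
    """Fuses temporal convolutions of different kernel widths with
    input-conditioned coefficients, approximating a dynamically sized
    temporal concept receptive field."""

    def __init__(self, num_concepts, kernel_widths=(1, 3, 5, 7)):
        super().__init__()
        # One depth-wise temporal convolution per candidate receptive field size.
        self.branches = nn.ModuleList([
            nn.Conv1d(num_concepts, num_concepts, k, padding=k // 2,
                      groups=num_concepts)
            for k in kernel_widths
        ])
        # Lightweight gate: pools the concept sequence over time and predicts
        # one fusion coefficient per branch for each input video.
        self.gate = nn.Linear(num_concepts, len(kernel_widths))

    def forward(self, x):
        # x: (batch, num_concepts, time) - per-segment concept scores.
        coeffs = F.softmax(self.gate(x.mean(dim=-1)), dim=-1)   # (batch, branches)
        outs = torch.stack([branch(x) for branch in self.branches], dim=1)
        # Weight each branch by its learned, input-dependent coefficient.
        return (coeffs[:, :, None, None] * outs).sum(dim=1)

# Example: 2 videos, 300 mid-level concept scores over 16 temporal segments.
scores = torch.randn(2, 300, 16)
tdc = TemporalDynamicConv(num_concepts=300)
print(tdc(scores).shape)  # torch.Size([2, 300, 16])
```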
Related papers
- Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection [19.643936110623653]
Video Anomaly Detection (VAD) aims to identify abnormalities within a specific context and timeframe.
Recent deep learning-based VAD models have shown promising results by generating high-resolution frames.
We propose a self-supervised learning approach for VAD through an inter-patch relationship prediction task.
arXiv Detail & Related papers (2024-03-28T03:07:16Z)
- Dynamic Appearance: A Video Representation for Action Recognition with Joint Training [11.746833714322154]
We introduce a new concept, Dynamic Appearance (DA), summarizing the appearance information relating to movement in a video.
We consider distilling the dynamic appearance from raw video data as a means of efficient video understanding.
We provide extensive experimental results on four action recognition benchmarks.
arXiv Detail & Related papers (2022-11-23T07:16:16Z)
- Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that the current fixed-sized temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better discriminate between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate the spatial-temporal kernels of dynamic-scale to adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- Dense Interaction Learning for Video-based Person Re-identification [75.03200492219003]
We propose a hybrid framework, Dense Interaction Learning (DenseIL), to tackle video-based person re-ID difficulties.
DenseIL contains a CNN encoder and a Dense Interaction (DI) decoder.
Our experiments show that DenseIL consistently and significantly outperforms all state-of-the-art methods on multiple standard video-based re-ID datasets.
arXiv Detail & Related papers (2021-03-16T12:22:08Z)
- Visual Concept Reasoning Networks [93.99840807973546]
A split-transform-merge strategy has been broadly used as an architectural constraint in convolutional neural networks for visual recognition tasks.
We propose to exploit this strategy and combine it with our Visual Concept Reasoning Networks (VCRNet) to enable reasoning between high-level visual concepts.
Our proposed model, VCRNet, consistently improves performance while increasing the number of parameters by less than 1%.
arXiv Detail & Related papers (2020-08-26T20:02:40Z)
- Dynamic Inference: A New Approach Toward Efficient Video Action Recognition [69.9658249941149]
Action recognition in videos has achieved great success recently, but it remains a challenging task due to the massive computational cost.
We propose a general dynamic inference idea to improve inference efficiency by leveraging the variation in the distinguishability of different videos.
arXiv Detail & Related papers (2020-02-09T11:09:56Z)