SMART Frame Selection for Action Recognition
- URL: http://arxiv.org/abs/2012.10671v1
- Date: Sat, 19 Dec 2020 12:24:00 GMT
- Title: SMART Frame Selection for Action Recognition
- Authors: Shreyank N Gowda, Marcus Rohrbach, Laura Sevilla-Lara
- Abstract summary: We show that selecting good frames helps in action recognition performance even in the trimmed videos domain.
We propose a method that, instead of considering frames one at a time, selects them jointly.
- Score: 43.796505626453836
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Action recognition is computationally expensive. In this paper, we address
the problem of frame selection to improve the accuracy of action recognition.
In particular, we show that selecting good frames helps in action recognition
performance even in the trimmed videos domain. Recent work has successfully
leveraged frame selection for long, untrimmed videos, where much of the content
is not relevant, and easy to discard. In this work, however, we focus on the
more standard short, trimmed action recognition problem. We argue that good
frame selection can not only reduce the computational cost of action
recognition but also increase the accuracy by getting rid of frames that are
hard to classify. In contrast to previous work, we propose a method that,
instead of considering frames one at a time, selects them jointly. This
results in a more efficient selection, where good frames are more
effectively distributed over the video, like snapshots that tell a story. We
call the proposed frame selection SMART and we test it in combination with
different backbone architectures and on multiple benchmarks (Kinetics,
Something-something, UCF101). We show that the SMART frame selection
consistently improves the accuracy compared to other frame selection strategies
while reducing the computational cost by a factor of 4 to 10 times.
Additionally, we show that when the primary goal is recognition performance,
our selection strategy can improve over recent state-of-the-art models and
frame selection strategies on various benchmarks (UCF101, HMDB51, FCVID, and
ActivityNet).
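The joint-selection idea in the abstract can be illustrated with a minimal sketch: score frames individually, then pick a set that balances high scores with temporal spread, so the chosen frames are distributed over the video like snapshots that tell a story. This is only an illustration of the general idea, not the SMART model itself; the per-frame `scores`, the greedy procedure, and the `spread_weight` heuristic are all assumptions for the example.

```python
import numpy as np

def select_frames_jointly(scores, k, spread_weight=0.5):
    """Greedily pick k frames, trading off per-frame score against
    temporal spread. Illustrative sketch only -- SMART learns its
    joint scoring; here the scores are assumed to be given."""
    n = len(scores)
    selected = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            # reward frames far from already-selected ones,
            # so the selection covers the whole video
            spread = min((abs(i - j) for j in selected), default=n) / n
            val = scores[i] + spread_weight * spread
            if val > best_val:
                best, best_val = i, val
        selected.append(best)
    return sorted(selected)

# toy example: 10 frames with two high-score clusters
scores = np.array([0.1, 0.9, 0.8, 0.1, 0.1, 0.1, 0.7, 0.9, 0.1, 0.1])
print(select_frames_jointly(scores, k=3))  # -> [1, 2, 7]
```

Note how the spread term pulls the third pick toward a frame outside the first cluster once frames 1 and 7 are taken; selecting frames one at a time by score alone would instead pick the three highest-scoring frames regardless of where they sit in the video.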
Related papers
- An Empirical Study of Frame Selection for Text-to-Video Retrieval [62.28080029331507]
Text-to-video retrieval (TVR) aims to find the most relevant video in a large video gallery given a query text.
Existing methods typically select a subset of frames within a video to represent the video content for TVR.
In this paper, we make the first empirical study of frame selection for TVR.
arXiv Detail & Related papers (2023-11-01T05:03:48Z)
- Search-Map-Search: A Frame Selection Paradigm for Action Recognition [21.395733318164393]
Frame selection aims to extract the most informative and representative frames to help a model better understand video content.
Existing frame selection methods either individually sample frames based on per-frame importance prediction, or adopt reinforcement learning agents to find representative frames in succession.
We propose a Search-Map-Search learning paradigm which combines the advantages of search and supervised learning to select the best combination of frames from a video as one entity.
arXiv Detail & Related papers (2023-04-20T13:49:53Z)
- PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition [52.78234467516168]
We introduce the concept of patch mutual information (PMI) score to quantify the motion bias between adjacent frames.
We present an adaptive frame selection strategy using shifted leaky ReLu and cumulative distribution function.
Our method achieves a relative improvement of 2.2 - 13.8% in top-1 accuracy on UAV-Human, 6.8% on NEC Drone, and 9.0% on Diving48 datasets.
arXiv Detail & Related papers (2023-04-14T00:01:11Z)
- Towards Frame Rate Agnostic Multi-Object Tracking [76.82407173177138]
We propose a Frame Rate Agnostic MOT framework with a Periodic training Scheme (FAPS) to tackle the FraMOT problem for the first time.
Specifically, we propose a Frame Rate Agnostic Association Module (FAAM) that infers and encodes the frame rate information.
FAPS reflects all post-processing steps in training via tracking pattern matching and fusion.
arXiv Detail & Related papers (2022-09-23T04:25:19Z)
- OCSampler: Compressing Videos to One Clip with Single-step Sampling [82.0417131211353]
We propose a framework named OCSampler to explore a compact yet effective video representation with one short clip.
Our basic motivation is that the efficient video recognition task lies in processing a whole sequence at once rather than picking up frames sequentially.
arXiv Detail & Related papers (2022-01-12T09:50:38Z)
- Video Face Super-Resolution with Motion-Adaptive Feedback Cell [90.73821618795512]
Video super-resolution (VSR) methods have recently achieved remarkable success due to the development of deep convolutional neural networks (CNNs).
In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block, which can efficiently capture the motion compensation and feed it back to the network in an adaptive way.
arXiv Detail & Related papers (2020-02-15T13:14:10Z)
- Sparse Black-box Video Attack with Reinforcement Learning [14.624074868199287]
We formulate the black-box video attacks into a Reinforcement Learning framework.
The environment in RL is set as the recognition model, and the agent in RL plays the role of frame selecting.
We conduct a series of experiments with two mainstream video recognition models.
arXiv Detail & Related papers (2020-01-11T14:09:49Z)
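The RL formulation in the last entry (environment = recognition model, agent = frame selector) can be caricatured as a bandit problem: each frame index is an arm, and the recognition model's response to perturbing that frame plays the role of the reward. The epsilon-greedy agent, `reward_fn`, and all parameters below are stand-ins for illustration, not the paper's actual formulation.

```python
import random

def epsilon_greedy_frame_agent(num_frames, reward_fn, episodes=200,
                               epsilon=0.2, seed=0):
    """Toy bandit agent over frame indices. reward_fn(frame) stands in
    for the recognition model's feedback; the paper's sequential RL
    setup is richer than this single-step sketch."""
    rng = random.Random(seed)
    q = [0.0] * num_frames       # running value estimate per frame
    counts = [0] * num_frames

    def pull(a):
        r = reward_fn(a)
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]  # incremental mean update

    for a in range(num_frames):  # try every frame once to initialize q
        pull(a)
    for _ in range(episodes):
        if rng.random() < epsilon:
            a = rng.randrange(num_frames)          # explore
        else:
            a = max(range(num_frames), key=lambda i: q[i])  # exploit
        pull(a)
    return max(range(num_frames), key=lambda i: q[i])

# toy deterministic reward: frame 3 is the most "attackable"
best = epsilon_greedy_frame_agent(8, lambda f: 1.0 if f == 3 else 0.1)
print(best)  # -> 3
```

Because every arm is pulled once during initialization and the toy reward is deterministic, the agent is guaranteed to identify the best frame here; with a real black-box model the reward is noisy and the sequential state matters, which is what motivates the full RL treatment.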
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.