Adversarially Robust Frame Sampling with Bounded Irregularities
- URL: http://arxiv.org/abs/2002.01147v1
- Date: Tue, 4 Feb 2020 06:33:43 GMT
- Title: Adversarially Robust Frame Sampling with Bounded Irregularities
- Authors: Hanhan Li, Pin Wang
- Abstract summary: Video analysis tools for automatically extracting meaningful information from videos are widely studied and deployed.
Most of them use deep neural networks which are computationally expensive, feeding only a subset of video frames into such algorithms is desired.
We present an elegant solution to this sampling problem that is provably robust against adversarial attacks and introduces bounded irregularities as well.
- Score: 11.434633941880143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, video analysis tools for automatically extracting meaningful
information from videos have been widely studied and deployed. Because most of
them use deep neural networks, which are computationally expensive, it is
desirable to feed only a subset of video frames into such algorithms. Sampling
frames at a fixed rate is attractive for its simplicity, representativeness,
and interpretability. For example, a popular cloud video API generated video
and shot labels by processing only the first frame of every second in a video.
However, one can easily attack such strategies by placing chosen frames at the
sampled locations. In this paper, we present an elegant solution to this
sampling problem that is provably robust against adversarial attacks while
introducing only bounded irregularities.
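The abstract leaves the construction unstated, but the failure mode and one plausible defense are easy to sketch. The following minimal Python sketch (illustrative only, not necessarily the authors' algorithm; all function and parameter names are hypothetical) contrasts deterministic fixed-rate sampling, whose sampled indices an adversary can target exactly, with jittered sampling that draws one frame uniformly at random from each one-second window:

```python
import random


def fixed_rate_sample(num_frames: int, fps: int) -> list[int]:
    """Baseline: the first frame of every second (deterministic).

    An adversary who knows the frame rate can place chosen frames
    exactly at indices 0, fps, 2*fps, ... and fully control what
    the downstream model sees.
    """
    return list(range(0, num_frames, fps))


def jittered_sample(num_frames: int, fps: int, rng: random.Random) -> list[int]:
    """Pick one frame uniformly at random from each one-second window.

    Sampled positions are unpredictable without the sampler's random
    state, and the gap between consecutive samples stays strictly
    below 2 * fps frames (a simple bounded irregularity).
    """
    samples = []
    for start in range(0, num_frames, fps):
        end = min(start + fps, num_frames)
        samples.append(rng.randrange(start, end))
    return samples


if __name__ == "__main__":
    rng = random.Random()  # in practice, seed from a secret source
    print(fixed_rate_sample(num_frames=300, fps=30))  # [0, 30, 60, ...]
    print(jittered_sample(num_frames=300, fps=30, rng=rng))
```

With the jittered scheme, an attacker without access to the sampler's random state cannot predict which frames will be read, while the gap between consecutive samples never reaches two seconds, one simple notion of a bounded irregularity.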
Related papers
- SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations [12.139451002212063]
SSVOD exploits motion dynamics of videos to utilize large-scale unlabeled frames with sparse annotations.
Our method achieves significant performance improvements over existing methods on ImageNet-VID, Epic-KITCHENS, and YouTube-VIS.
arXiv Detail & Related papers (2023-09-04T06:41:33Z) - Self-Adaptive Sampling for Efficient Video Question-Answering on Image-Text Models [41.12711820047315]
Video understanding models usually sample a set of frames or clips at random, regardless of the internal correlations between their visual contents or their relevance to the question.
We propose two frame sampling strategies, namely the most dominant frames (MDF) and most implied frames (MIF), to maximally preserve those frames that are most likely vital to the given questions.
arXiv Detail & Related papers (2023-07-09T14:54:30Z) - Key Frame Extraction with Attention Based Deep Neural Networks [0.0]
We propose a deep learning-based approach to key frame detection using a deep auto-encoder model with an attention layer.
The proposed method first extracts features from the video frames using the encoder part of the autoencoder, then applies k-means clustering to group these features, and hence similar frames, together; a minimal illustrative sketch of this clustering step appears after this list.
The method was evaluated on the TVSum video dataset and achieved a classification accuracy of 0.77, indicating a higher success rate than many existing methods.
arXiv Detail & Related papers (2023-06-21T15:09:37Z) - Blind Video Deflickering by Neural Filtering with a Flawed Atlas [90.96203200658667]
We propose a general flicker removal framework that only receives a single flickering video as input without additional guidance.
The core of our approach is utilizing the neural atlas in cooperation with a neural filtering strategy.
To validate our method, we construct a dataset that contains diverse real-world flickering videos.
arXiv Detail & Related papers (2023-03-14T17:52:29Z) - Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding [64.99924160432144]
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query.
We propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames.
arXiv Detail & Related papers (2023-01-02T03:38:22Z) - NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition [89.84188594758588]
A novel Non-saliency Suppression Network (NSNet) is proposed to suppress the responses of non-salient frames.
NSNet achieves the state-of-the-art accuracy-efficiency trade-off and delivers a significantly faster (2.4-4.3x) practical inference speed than state-of-the-art methods.
arXiv Detail & Related papers (2022-07-21T09:41:22Z) - FrameHopper: Selective Processing of Video Frames in Detection-driven Real-Time Video Analytics [2.5119455331413376]
Detection-driven real-time video analytics require continuous detection of objects contained in the video frames.
Running these detectors on each and every frame is computationally intensive on resource-constrained edge devices.
We propose an offline Reinforcement Learning (RL)-based algorithm to determine the skip-lengths, i.e., how many consecutive frames can be skipped after each processed frame.
arXiv Detail & Related papers (2022-03-22T07:05:57Z) - Efficient Video Segmentation Models with Per-frame Inference [117.97423110566963]
We focus on improving the temporal consistency without introducing overhead in inference.
We propose several techniques to learn from the video sequence, including a temporal consistency loss and online/offline knowledge distillation methods.
arXiv Detail & Related papers (2022-02-24T23:51:36Z) - OCSampler: Compressing Videos to One Clip with Single-step Sampling [82.0417131211353]
We propose a framework named OCSampler to explore a compact yet effective video representation with one short clip.
Our basic motivation is that efficient video recognition lies in processing the whole sequence at once rather than picking up frames sequentially.
arXiv Detail & Related papers (2022-01-12T09:50:38Z) - Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present an action recognition model that is trained with only video-level labels.
Our method leverages per-person detectors that have been trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
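As a companion to the Key Frame Extraction entry above, here is a minimal, hypothetical sketch of its clustering step: per-frame feature vectors (stand-ins for the encoder outputs of the attention autoencoder, which is omitted here) are grouped with k-means, and the frame nearest each centroid is kept as a key frame. Names and parameters are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans


def select_key_frames(features: np.ndarray, num_key_frames: int) -> list[int]:
    """Cluster per-frame features and keep one representative per cluster.

    `features` has shape (num_frames, feature_dim); in the paper's
    pipeline these would be the encoder outputs of a trained attention
    autoencoder, which is omitted here for brevity.
    """
    kmeans = KMeans(n_clusters=num_key_frames, n_init=10, random_state=0)
    labels = kmeans.fit_predict(features)
    key_frames = []
    for k in range(num_key_frames):
        members = np.where(labels == k)[0]
        # Keep the frame whose feature vector is closest to the cluster centroid
        dists = np.linalg.norm(features[members] - kmeans.cluster_centers_[k], axis=1)
        key_frames.append(int(members[dists.argmin()]))
    return sorted(key_frames)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_features = rng.normal(size=(120, 64))  # stand-in for encoder features
    print(select_key_frames(fake_features, num_key_frames=5))
```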
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.