HMM-guided frame querying for bandwidth-constrained video search
- URL: http://arxiv.org/abs/2001.00057v1
- Date: Tue, 31 Dec 2019 19:54:35 GMT
- Title: HMM-guided frame querying for bandwidth-constrained video search
- Authors: Bhairav Chidambaram, Mason McGill, Pietro Perona
- Abstract summary: We design an agent to search for frames of interest in video stored on a remote server, under bandwidth constraints.
Using a convolutional neural network to score individual frames and a hidden Markov model to propagate predictions across frames, our agent accurately identifies temporal regions of interest based on sparse, strategically sampled frames.
On a subset of the ImageNet-VID dataset, we demonstrate that using a hidden Markov model to interpolate between frame scores allows requests of 98% of frames to be omitted, without compromising frame-of-interest classification accuracy.
- Score: 16.956238550063365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
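The pipeline the abstract describes, a CNN scoring sparsely requested frames and a hidden Markov model propagating those scores to unrequested frames, can be sketched as a standard two-state forward-backward pass in which unrequested frames carry a flat likelihood. This is an illustrative reconstruction, not the authors' code; the function and parameter names (`hmm_interpolate`, `p_stay`) are hypothetical.

```python
import numpy as np

def hmm_interpolate(n_frames, sampled, likelihoods, p_stay=0.95, prior=0.5):
    """Posterior P(frame is 'of interest') for every frame of a video,
    given CNN-derived likelihoods at a sparse set of sampled frames.

    Two-state HMM: state 1 = frame of interest, state 0 = background.
    Unsampled frames contribute a flat (uninformative) likelihood, so
    the HMM interpolates between the sampled frames' evidence.
    """
    # Symmetric transition matrix: each state persists with prob. p_stay.
    A = np.array([[p_stay, 1 - p_stay],
                  [1 - p_stay, p_stay]])
    # Per-frame emission likelihoods; flat where no frame was requested.
    B = np.ones((n_frames, 2))
    for t, (l0, l1) in zip(sampled, likelihoods):
        B[t] = (l0, l1)

    # Forward pass (normalized each step for numerical stability).
    alpha = np.zeros((n_frames, 2))
    alpha[0] = np.array([1 - prior, prior]) * B[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n_frames):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass.
    beta = np.ones((n_frames, 2))
    for t in range(n_frames - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    # Combine and renormalize into per-frame posterior marginals.
    post = alpha * beta
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 1]  # P(frame of interest) for each frame
```

With only two frames observed out of ten, the posterior smoothly decays away from each observation at a rate set by `p_stay`, which is what lets most frame requests be omitted.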
Related papers
- Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation [156.4142424784322]
Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video that belong to the same category defined by a few annotated support images.
We propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data.
Our proposed video IPMT model significantly outperforms previous models on two benchmark datasets.
arXiv Detail & Related papers (2023-09-20T09:16:34Z)
- SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations [12.139451002212063]
SSVOD exploits motion dynamics of videos to utilize large-scale unlabeled frames with sparse annotations.
Our method achieves significant performance improvements over existing methods on ImageNet-VID, Epic-KITCHENS, and YouTube-VIS.
arXiv Detail & Related papers (2023-09-04T06:41:33Z)
- Key Frame Extraction with Attention Based Deep Neural Networks [0.0]
We propose a deep learning-based approach for key frame detection using a deep auto-encoder model with an attention layer.
The proposed method first extracts the features from the video frames using the encoder part of the autoencoder and applies segmentation using the k-means algorithm to group these features and similar frames together.
The method was evaluated on the TVSUM clustering video dataset and achieved a classification accuracy of 0.77, indicating a higher success rate than many existing methods.
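The clustering step this entry describes, grouping encoder features with k-means and keeping similar frames together, can be sketched as follows. This is a minimal illustration assuming the features are already extracted by the auto-encoder's encoder; `select_key_frames` and the deterministic centroid initialization are my own choices, not details from the paper.

```python
import numpy as np

def select_key_frames(features, k, n_iter=50):
    """Cluster per-frame feature vectors (e.g. from an auto-encoder's
    encoder) with a plain k-means loop, then return one key frame per
    cluster: the member frame closest to its centroid."""
    features = np.asarray(features, dtype=float)
    n = len(features)
    # Deterministic init: evenly spaced frames as starting centroids.
    centroids = features[np.linspace(0, n - 1, k).astype(int)].copy()
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assign each frame to its nearest centroid (squared L2 distance).
        d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Recompute centroids; leave one in place if its cluster empties.
        for j in range(k):
            members = labels == j
            if members.any():
                centroids[j] = features[members].mean(0)
    # Pick the frame nearest each centroid as that cluster's key frame.
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    keys = []
    for j in range(k):
        members = np.where(labels == j)[0]
        if members.size:
            keys.append(int(members[d[members, j].argmin()]))
    return sorted(keys)
```

Returning the frame nearest each centroid (rather than the centroid itself) guarantees every key frame is an actual frame of the video.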
arXiv Detail & Related papers (2023-06-21T15:09:37Z)
- Search-Map-Search: A Frame Selection Paradigm for Action Recognition [21.395733318164393]
Frame selection aims to extract the most informative and representative frames to help a model better understand video content.
Existing frame selection methods either individually sample frames based on per-frame importance prediction, or adopt reinforcement learning agents to find representative frames in succession.
We propose a Search-Map-Search learning paradigm which combines the advantages of search and supervised learning to select the best combination of frames from a video as one entity.
arXiv Detail & Related papers (2023-04-20T13:49:53Z)
- PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition [52.78234467516168]
We introduce the concept of patch mutual information (PMI) score to quantify the motion bias between adjacent frames.
We present an adaptive frame selection strategy using shifted leaky ReLu and cumulative distribution function.
Our method achieves a relative improvement of 2.2 - 13.8% in top-1 accuracy on UAV-Human, 6.8% on NEC Drone, and 9.0% on Diving48 datasets.
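The selection strategy named above (a shifted leaky ReLU over PMI scores followed by a cumulative distribution function) could plausibly look like the following inverse-CDF sampler. This is a speculative reading of those two ingredients, not the paper's method; all names and constants (`shift`, `slope`) are invented for illustration.

```python
import numpy as np

def adaptive_frame_select(pmi, n_select, shift=0.1, slope=0.01):
    """Speculative sketch: turn per-frame PMI scores (a motion proxy
    between adjacent frames) into sampling weights with a shifted
    leaky ReLU, then draw frames via the weights' CDF so that
    high-motion regions are sampled more densely."""
    pmi = np.asarray(pmi, dtype=float)
    x = pmi - shift                      # shift: suppress near-static frames
    w = np.where(x > 0, x, slope * x)    # leaky ReLU keeps a small gradient
    w = w - w.min() + 1e-8               # ensure strictly positive weights
    cdf = np.cumsum(w)
    cdf /= cdf[-1]
    # Inverse-CDF sampling at evenly spaced quantiles.
    targets = (np.arange(n_select) + 0.5) / n_select
    idx = np.searchsorted(cdf, targets)
    return np.unique(idx).tolist()
```

Under this reading, a video whose motion is concentrated in one segment would have most of its selected frames drawn from that segment.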
arXiv Detail & Related papers (2023-04-14T00:01:11Z)
- Optimizing Video Prediction via Video Frame Interpolation [53.16726447796844]
We present a new optimization framework for video prediction via video frame interpolation, inspired by the photo-realistic results of video frame interpolation.
Our framework is based on optimization with a pretrained differentiable video frame interpolation module, without the need for a training dataset.
Our approach outperforms other video prediction methods that require a large amount of training data or extra semantic information.
arXiv Detail & Related papers (2022-06-27T17:03:46Z)
- MHSCNet: A Multimodal Hierarchical Shot-aware Convolutional Network for Video Summarization [61.69587867308656]
We propose a multimodal hierarchical shot-aware convolutional network, denoted as MHSCNet, to enhance the frame-wise representation.
Based on the learned shot-aware representations, MHSCNet can predict the frame-level importance score in the local and global view of the video.
arXiv Detail & Related papers (2022-04-18T14:53:33Z)
- Semi-supervised and Deep learning Frameworks for Video Classification and Key-frame Identification [1.2335698325757494]
We present two semi-supervised approaches that automatically classify scenes for content and filter frames for scene understanding tasks.
The proposed framework can be scaled to additional video data streams for automated training of perception-driven systems.
arXiv Detail & Related papers (2022-03-25T05:45:18Z)
- OCSampler: Compressing Videos to One Clip with Single-step Sampling [82.0417131211353]
We propose a framework named OCSampler to explore a compact yet effective video representation with one short clip.
Our basic motivation is that efficient video recognition lies in processing the whole sequence at once rather than picking up frames sequentially.
arXiv Detail & Related papers (2022-01-12T09:50:38Z)
- Frame-rate Up-conversion Detection Based on Convolutional Neural Network for Learning Spatiotemporal Features [7.895528973776606]
This paper proposes a frame-rate conversion detection network (FCDNet) that learns forensic features caused by FRUC in an end-to-end fashion.
FCDNet uses a stack of consecutive frames as the input and effectively learns artifacts using network blocks to learn features.
arXiv Detail & Related papers (2021-03-25T08:47:46Z)
- SF-Net: Single-Frame Supervision for Temporal Action Localization [60.202516362976645]
Single-frame supervision introduces extra temporal action signals while maintaining low annotation overhead.
We propose a unified system called SF-Net to make use of such single-frame supervision.
SF-Net significantly improves upon state-of-the-art weakly-supervised methods in terms of both segment localization and single-frame localization.
arXiv Detail & Related papers (2020-03-15T15:06:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.