Active Learning for Video Classification with Frame Level Queries
- URL: http://arxiv.org/abs/2307.05587v1
- Date: Mon, 10 Jul 2023 15:47:13 GMT
- Title: Active Learning for Video Classification with Frame Level Queries
- Authors: Debanjan Goswami, Shayok Chakraborty
- Abstract summary: We propose a novel active learning framework for video classification.
Our framework identifies a batch of exemplar videos, together with a set of informative frames for each video.
This involves much less manual work than watching the complete video to come up with a label.
- Score: 13.135234328352885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning algorithms have pushed the boundaries of computer vision
research and have demonstrated commendable performance in a variety of
applications. However, training a robust deep neural network necessitates a
large amount of labeled training data, the acquisition of which involves significant
time and human effort. This problem is even more serious for an application
like video classification, where a human annotator has to watch an entire video
end-to-end to furnish a label. Active learning algorithms automatically
identify the most informative samples from large amounts of unlabeled data;
this tremendously reduces the human annotation effort in inducing a machine
learning model, as only the few samples identified by the algorithm need to
be labeled manually. In this paper, we propose a novel active learning
framework for video classification, with the goal of further reducing the
labeling onus on the human annotators. Our framework identifies a batch of
exemplar videos, together with a set of informative frames for each video; the
human annotator merely needs to review the frames and provide a label for each
video. This involves much less manual work than watching the complete video to
come up with a label. We formulate a criterion based on uncertainty and
diversity to identify the informative videos and exploit representative
sampling techniques to extract a set of exemplar frames from each video. To the
best of our knowledge, this is the first research effort to develop an active
learning framework for video classification, where the annotators need to
inspect only a few frames to produce a label, rather than watching the video
end-to-end.
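A minimal sketch of how such a pipeline could look, assuming entropy as the uncertainty measure, a cosine-similarity penalty for diversity with an assumed trade-off weight lam, and k-means as the representative frame sampler; the paper's exact criterion is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_videos(probs, video_feats, batch_size, lam=0.5):
    """Greedily pick videos that are uncertain (high predictive entropy)
    yet dissimilar to the videos already chosen (diversity)."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)   # uncertainty
    feats = video_feats / np.linalg.norm(video_feats, axis=1, keepdims=True)
    selected = []
    for _ in range(batch_size):
        if selected:
            redundancy = (feats @ feats[selected].T).max(axis=1)  # cosine sim
        else:
            redundancy = np.zeros(len(feats))
        score = entropy - lam * redundancy
        score[selected] = -np.inf                # never re-pick a video
        selected.append(int(score.argmax()))
    return selected

def exemplar_frames(frame_feats, n_frames=5):
    """Representative sampling: cluster the frames and return the frame
    closest to each cluster centre for the annotator to review."""
    km = KMeans(n_clusters=n_frames, n_init=10).fit(frame_feats)
    return sorted({int(np.argmin(np.linalg.norm(frame_feats - c, axis=1)))
                   for c in km.cluster_centers_})

# toy usage with random stand-in features
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=100)     # posteriors for 100 videos
vfeats = rng.normal(size=(100, 128))             # video-level features
for v in select_videos(probs, vfeats, batch_size=5):
    frames = exemplar_frames(rng.normal(size=(200, 128)))  # 200 frames/video
    print(f"video {v}: show frames {frames} to the annotator")
```

In a real loop, the probabilities and features would come from the classifier being trained, and the annotator would label each selected video after reviewing only the printed exemplar frames.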
Related papers
- Multi-View Video-Based Learning: Leveraging Weak Labels for Frame-Level Perception [1.5741307755393597]
We propose a novel learning framework to train a video-based action recognition model with weak labels for frame-level perception.
For training the model using the weak labels, we propose a novel latent loss function.
We also propose a model that uses the view-specific latent embeddings for downstream frame-level action recognition and detection tasks.
arXiv Detail & Related papers (2024-03-18T09:47:41Z) - Masked Autoencoder for Unsupervised Video Summarization [10.853922245706716]
Self-supervised learning (SSL) is acknowledged for its robustness and flexibility to multiple downstream tasks.
We claim that an unsupervised autoencoder with sufficient self-supervised learning can serve as a video summarization model without any extra downstream architecture design or fine-tuning of weights.
We evaluate the method on major unsupervised video summarization benchmarks to show its effectiveness under various experimental settings.
arXiv Detail & Related papers (2023-06-02T09:44:45Z) - TL;DW? Summarizing Instructional Videos with Task Relevance &
- TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency [133.75876535332003]
We focus on summarizing instructional videos, an under-explored area of video summarization.
Existing video summarization datasets rely on manual frame-level annotations.
We propose an instructional video summarization network that combines a context-aware temporal video encoder and a segment scoring transformer.
arXiv Detail & Related papers (2022-08-14T04:07:40Z) - Less than Few: Self-Shot Video Instance Segmentation [50.637278655763616]
We propose to automatically learn to find appropriate support videos given a query.
We tackle, for the first time, video instance segmentation in a self-shot (and few-shot) setting.
We provide strong baseline performances that utilize a novel transformer-based model.
arXiv Detail & Related papers (2022-04-19T13:14:43Z) - Multiview Pseudo-Labeling for Semi-supervised Learning from Video [102.36355560553402]
We present a novel framework that uses complementary views in the form of appearance and motion information for semi-supervised learning in video.
Our method capitalizes on multiple views, but it nonetheless trains a model that is shared across appearance and motion input.
On multiple video recognition datasets, our method substantially outperforms its supervised counterpart, and compares favorably to previous work on standard benchmarks in self-supervised video representation learning.
arXiv Detail & Related papers (2021-04-01T17:59:48Z) - Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts [89.06560404218028]
- Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts [89.06560404218028]
We introduce a new method for pre-training video action recognition models using queried web videos.
Instead of trying to filter out the noise, we propose to convert the potential noise in these queried videos into useful supervision signals.
We show that SPL outperforms several existing pre-training strategies using pseudo-labels.
arXiv Detail & Related papers (2021-01-11T05:50:16Z) - Self-supervised Video Representation Learning by Pace Prediction [48.029602040786685]
- Self-supervised Video Representation Learning by Pace Prediction [48.029602040786685]
This paper addresses the problem of self-supervised video representation learning from a new perspective -- by video pace prediction.
It stems from the observation that the human visual system is sensitive to video pace.
We randomly sample training clips in different paces and ask a neural network to identify the pace for each video clip.
arXiv Detail & Related papers (2020-08-13T12:40:24Z) - Generalized Few-Shot Video Classification with Video Retrieval and
- Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation [132.82884193921535]
We argue that previous methods underestimate the importance of video feature learning and propose a two-stage approach.
We show that this simple baseline approach outperforms prior few-shot video classification methods by over 20 points on existing benchmarks.
We present two novel approaches that yield further improvement.
arXiv Detail & Related papers (2020-07-09T13:05:32Z) - Straight to the Point: Fast-forwarding Videos via Reinforcement Learning
Using Textual Data [1.004766879203303]
We present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos.
Our approach can adaptively skip frames that are not relevant to conveying the information, without creating gaps in the final video.
We propose a novel network, called Visually-guided Document Attention Network (VDAN), able to generate a highly discriminative embedding space.
arXiv Detail & Related papers (2020-03-31T14:07:45Z)