FOCAL: A Cost-Aware Video Dataset for Active Learning
- URL: http://arxiv.org/abs/2311.10591v1
- Date: Fri, 17 Nov 2023 15:46:09 GMT
- Title: FOCAL: A Cost-Aware Video Dataset for Active Learning
- Authors: Kiran Kokilepersaud, Yash-Yee Logan, Ryan Benkert, Chen Zhou, Mohit
Prabhushankar, Ghassan AlRegib, Enrique Corona, Kunjan Singh, Mostafa
Parchami
- Abstract summary: Annotation-cost refers to the time it takes an annotator to label and quality-assure a given video sequence.
We introduce a set of conformal active learning algorithms that take advantage of the sequential structure of video data.
We show that the best conformal active learning method is cheaper than the best traditional active learning method by 113 hours.
- Score: 13.886774655927875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce the FOCAL (Ford-OLIVES Collaboration on Active
Learning) dataset which enables the study of the impact of annotation-cost
within a video active learning setting. Annotation-cost refers to the time it
takes an annotator to label and quality-assure a given video sequence. A
practical motivation for active learning research is to minimize
annotation-cost by selectively labeling informative samples that will maximize
performance within a given budget constraint. However, previous work in video
active learning lacks real-time annotation labels for accurately assessing cost
minimization and instead operates under the assumption that annotation-cost
scales linearly with the amount of data to annotate. This assumption does not
take into account a variety of real-world confounding factors that contribute
to a nonlinear cost such as the effect of an assistive labeling tool and the
variety of interactions within a scene such as occluded objects, weather, and
motion of objects. FOCAL addresses this discrepancy by providing real
annotation-cost labels for 126 video sequences across 69 unique city scenes
with a variety of weather, lighting, and seasonal conditions. We also introduce
a set of conformal active learning algorithms that take advantage of the
sequential structure of video data in order to achieve a better trade-off
between annotation-cost and performance while also reducing floating point
operations (FLOPS) overhead by at least 77.67%. We show how these approaches
better reflect how annotations on videos are done in practice through a
sequence selection framework. We further demonstrate the advantage of these
approaches by introducing two performance-cost metrics and show that the best
conformal active learning method is cheaper than the best traditional active
learning method by 113 hours.
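The abstract frames the selection problem as choosing whole video sequences under a real annotation-cost budget rather than assuming cost scales linearly with data volume. As a rough illustration only, the sketch below shows a budget-constrained, cost-normalized greedy selection over sequences; the class names, the informativeness score, and the greedy heuristic are assumptions for illustration and are not the paper's conformal active learning methods or its performance-cost metrics.

```python
# Minimal sketch (assumed, not the paper's algorithm): cost-aware sequence
# selection for video active learning. Each candidate is a whole video
# sequence with a known annotation-cost in hours, mirroring the idea of a
# sequence selection framework with real annotation-cost labels.
from dataclasses import dataclass
from typing import List


@dataclass
class VideoSequence:
    seq_id: str
    annotation_cost_hours: float   # real annotation-cost label (assumed unit)
    informativeness: float         # acquisition score from any AL strategy


def select_sequences(candidates: List[VideoSequence],
                     budget_hours: float) -> List[VideoSequence]:
    """Greedily pick sequences with the best informativeness per hour
    until the annotation budget is exhausted."""
    ranked = sorted(
        candidates,
        key=lambda s: s.informativeness / max(s.annotation_cost_hours, 1e-6),
        reverse=True,
    )
    selected, spent = [], 0.0
    for seq in ranked:
        if spent + seq.annotation_cost_hours <= budget_hours:
            selected.append(seq)
            spent += seq.annotation_cost_hours
    return selected


if __name__ == "__main__":
    # Hypothetical candidate pool; costs and scores are made up.
    pool = [
        VideoSequence("city_scene_01", annotation_cost_hours=4.5, informativeness=0.82),
        VideoSequence("city_scene_02", annotation_cost_hours=1.2, informativeness=0.40),
        VideoSequence("city_scene_03", annotation_cost_hours=7.0, informativeness=0.91),
    ]
    for seq in select_sequences(pool, budget_hours=6.0):
        print(seq.seq_id)
```

The key design point this sketch is meant to convey is that the unit of selection is a full video sequence with its own measured cost, so two sequences of equal length can have very different prices under the budget.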
Related papers
- HAVANA: Hierarchical stochastic neighbor embedding for Accelerated Video ANnotAtions [59.71751978599567]
This paper presents a novel annotation pipeline that uses pre-extracted features and dimensionality reduction to accelerate the temporal video annotation process.
We demonstrate significant improvements in annotation effort compared to traditional linear methods, achieving more than a 10x reduction in clicks required for annotating over 12 hours of video.
arXiv Detail & Related papers (2024-09-16T18:15:38Z)
- Learning Tracking Representations from Single Point Annotations [49.47550029470299]
We propose to learn tracking representations from single point annotations in a weakly supervised manner.
Specifically, we propose a soft contrastive learning framework that incorporates a target objectness prior into end-to-end contrastive learning.
arXiv Detail & Related papers (2024-04-15T06:50:58Z)
- Revisiting Deep Active Learning for Semantic Segmentation [37.3546941940388]
We show that the data distribution is decisive for the performance of the various active learning objectives proposed in the literature.
We demonstrate that the integration of semi-supervised learning with active learning can improve performance when the two objectives are aligned.
arXiv Detail & Related papers (2023-02-08T14:23:37Z)
- Urban Scene Semantic Segmentation with Low-Cost Coarse Annotation [107.72926721837726]
Coarse annotation is a low-cost but highly effective alternative for training semantic segmentation models.
We propose a coarse-to-fine self-training framework that generates pseudo labels for unlabeled regions of coarsely annotated data.
Our method achieves a significantly better performance vs annotation cost tradeoff, yielding a comparable performance to fully annotated data with only a small fraction of the annotation budget.
arXiv Detail & Related papers (2022-12-15T15:43:42Z)
- Active Learning with Effective Scoring Functions for Semi-Supervised Temporal Action Localization [15.031156121516211]
This paper focuses on a rarely investigated yet practical task named semi-supervised TAL.
We propose an effective active learning method, named AL-STAL.
Experiment results show that AL-STAL outperforms the existing competitors and achieves satisfactory performance compared with fully supervised learning.
arXiv Detail & Related papers (2022-08-31T13:39:38Z)
- Reducing Label Effort: Self-Supervised meets Active Learning [32.4747118398236]
Recent developments in self-training have achieved very impressive results rivaling supervised learning on some datasets.
Our experiments reveal that self-training is remarkably more efficient than active learning at reducing the labeling effort.
The performance gap between active learning trained with self-training and active learning trained from scratch diminishes as we approach the point where almost half of the dataset is labeled.
arXiv Detail & Related papers (2021-08-25T20:04:44Z)
- CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z)
- Composable Augmentation Encoding for Video Representation Learning [94.2358972764708]
We focus on contrastive methods for self-supervised video representation learning.
A common paradigm in contrastive learning is to construct positive pairs by sampling different data views for the same instance, with different data instances as negatives.
We propose an 'augmentation aware' contrastive learning framework, where we explicitly provide a sequence of augmentation parameterisations.
We show that our method encodes valuable information about specified spatial or temporal augmentation, and in doing so also achieve state-of-the-art performance on a number of video benchmarks.
arXiv Detail & Related papers (2021-04-01T16:48:53Z)
- Active Learning for Coreference Resolution using Discrete Annotation [76.36423696634584]
We improve upon pairwise annotation for active learning in coreference resolution.
We ask annotators to identify mention antecedents if a presented mention pair is deemed not coreferent.
In experiments with existing benchmark coreference datasets, we show that the signal from this additional question leads to significant performance gains per human-annotation hour.
arXiv Detail & Related papers (2020-04-28T17:17:11Z)
- Learning Spatiotemporal Features via Video and Text Pair Discrimination [30.64670449131973]
The cross-modal pair discrimination (CPD) framework captures the correlation between a video and its associated text.
We train CPD models on both a standard video dataset (Kinetics-210k) and an uncurated web video dataset (-300k) to demonstrate its effectiveness.
arXiv Detail & Related papers (2020-01-16T08:28:57Z)