Accelerated Video Annotation driven by Deep Detector and Tracker
- URL: http://arxiv.org/abs/2302.09590v1
- Date: Sun, 19 Feb 2023 15:16:05 GMT
- Title: Accelerated Video Annotation driven by Deep Detector and Tracker
- Authors: Eric Price and Aamir Ahmad
- Abstract summary: Annotating object ground truth in videos is vital for several downstream tasks in robot perception and machine learning.
The accuracy of the annotated instances of the moving objects in every image frame of a video is crucially important.
We propose a new annotation method which leverages a combination of a learning-based detector and a learning-based tracker.
- Score: 12.640283469603355
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Annotating object ground truth in videos is vital for several downstream
tasks in robot perception and machine learning, such as for evaluating the
performance of an object tracker or training an image-based object detector.
The accuracy of the annotated instances of the moving objects in every image
frame of a video is crucially important. Achieving that through manual
annotation is not only very time-consuming and labor-intensive, but also
prone to a high error rate. State-of-the-art annotation methods depend on
manually initializing the object bounding boxes only in the first frame and
then use classical tracking methods, e.g., AdaBoost or kernelized correlation
filters, to keep track of those bounding boxes. These can quickly drift,
thereby requiring tedious manual supervision. In this paper, we propose a new
annotation method which leverages a combination of a learning-based detector
(SSD) and a learning-based tracker (RE$^3$). Through this, we significantly
reduce annotation drifts, and, consequently, the required manual supervision.
We validate our approach through annotation experiments using our proposed
annotation method and existing baselines on a set of drone video frames. Source
code and detailed information on how to run the annotation program can be found
at https://github.com/robot-perception-group/smarter-labelme
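The abstract does not include code, but the loop it describes can be sketched: propagate a manually initialized box with a learned tracker, cross-check each frame against a learned detector, and flag disagreements for manual correction. A minimal sketch, with hypothetical `Detector`/`Tracker` interfaces standing in for the SSD detector and RE$^3$ tracker named in the paper; boxes are (x1, y1, x2, y2) tuples:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def annotate(frames, init_box, detector, tracker, agree_iou=0.5):
    """Propagate one manually initialized box through a video.

    `detector` and `tracker` are hypothetical interfaces
    (detector.detect(frame) -> list of boxes; tracker.init(frame, box),
    tracker.update(frame) -> box). Frames where the two disagree are
    flagged for manual review instead of being trusted blindly.
    """
    tracker.init(frames[0], init_box)
    annotations, flagged = [init_box], []
    for t, frame in enumerate(frames[1:], start=1):
        box = tracker.update(frame)              # learned tracker proposal
        detections = detector.detect(frame)      # learned detector proposals
        best = max(detections, key=lambda d: iou(d, box), default=None)
        if best is not None and iou(best, box) >= agree_iou:
            box = best                           # snap to detection: less drift
            tracker.init(frame, box)             # re-seed tracker on agreement
        else:
            flagged.append(t)                    # ask the human annotator
        annotations.append(box)
    return annotations, flagged
```

Re-seeding the tracker whenever it agrees with a detection is one plausible way to curb the drift the abstract describes; the exact correction policy here is an assumption, not the paper's implementation.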
Related papers
- On-the-Fly Point Annotation for Fast Medical Video Labeling [1.890063512530524]
In medical research, deep learning models rely on high-quality annotated data.
The need to adjust two corners of each bounding box makes the process inherently frame-by-frame.
We propose an on-the-fly method for live video annotation to improve annotation efficiency.
arXiv Detail & Related papers (2024-04-22T16:59:43Z)
- Learning Tracking Representations from Single Point Annotations [49.47550029470299]
We propose to learn tracking representations from single point annotations in a weakly supervised manner.
Specifically, we propose a soft contrastive learning framework that incorporates a target objectness prior into end-to-end contrastive learning.
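A minimal sketch of what such a soft contrastive loss could look like, assuming objectness scores in [0, 1] act as soft positive weights; names and details are illustrative, not taken from the paper:

```python
import numpy as np

def soft_contrastive_loss(anchor, candidates, objectness, temperature=0.1):
    """Soft InfoNCE-style loss: each candidate's objectness score in [0, 1]
    acts as a soft positive label rather than a hard 0/1 assignment.

    anchor:      (d,) embedding of the point-annotated target
    candidates:  (n, d) embeddings of candidate regions
    objectness:  (n,) soft objectness prior for each candidate
    """
    # cosine similarities, temperature-scaled
    a = anchor / np.linalg.norm(anchor)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    logits = c @ a / temperature
    log_probs = logits - np.log(np.sum(np.exp(logits)))   # log-softmax
    weights = objectness / (objectness.sum() + 1e-9)      # normalized soft labels
    return -np.sum(weights * log_probs)                   # soft cross-entropy
```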
arXiv Detail & Related papers (2024-04-15T06:50:58Z)
- Refining Pre-Trained Motion Models [56.18044168821188]
We take on the challenge of improving state-of-the-art supervised models with self-supervised training.
We focus on obtaining a "clean" training signal from real-world unlabelled video.
We show that our method yields reliable gains over fully-supervised methods in real videos.
arXiv Detail & Related papers (2024-01-01T18:59:33Z)
- Learning Referring Video Object Segmentation from Weak Annotation [78.45828085350936]
Referring video object segmentation (RVOS) is a task that aims to segment the target object in all video frames based on a sentence describing the object.
We propose a new annotation scheme that reduces the annotation effort by 8 times, while providing sufficient supervision for RVOS.
Our scheme only requires a mask for the frame where the object first appears and bounding boxes for the rest of the frames.
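The scheme can be pictured as a simple per-object data structure: one full mask where the object first appears, boxes elsewhere. A hedged sketch; field names are illustrative, not the paper's:

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class WeakRVOSAnnotation:
    """Weak annotation for one referred object in one video: a full mask
    only for the frame where the object first appears, and bounding boxes
    (x1, y1, x2, y2) for every other frame it is visible in."""
    expression: str                       # the referring sentence
    first_frame: int                      # index where the object first appears
    first_mask: np.ndarray                # (H, W) binary mask for that frame
    boxes: dict[int, tuple[int, int, int, int]] = field(default_factory=dict)

    def annotation_cost(self) -> int:
        """Rough effort proxy: one mask plus one box per remaining frame,
        versus a mask for every frame under full supervision."""
        return 1 + len(self.boxes)
```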
arXiv Detail & Related papers (2023-08-04T06:50:52Z)
- Learning Video Salient Object Detection Progressively from Unlabeled Videos [8.224670666756193]
We propose a novel video salient object detection (VSOD) method via a progressive framework that locates and segments salient objects in sequence without using any video annotation.
Specifically, we propose an algorithm for generating deep temporal location labels, which consists of generating high-saliency location labels and tracking salient objects in adjacent frames.
Although our method requires no labeled video at all, experimental results on five public benchmarks (DAVIS, FBMS, ViSal, VOS, and DAVSOD) demonstrate that it is competitive with fully supervised methods and outperforms state-of-the-art weakly supervised and unsupervised methods.
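A loose sketch of the pseudo-labeling idea, assuming an off-the-shelf image saliency map: take high-confidence pixels as location labels and keep only those with a nearby counterpart in the adjacent frame. Thresholds and the matching rule are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def location_labels(saliency, hi=0.9):
    """Pick high-saliency pixels as location pseudo-labels.
    `saliency` is an (H, W) map in [0, 1] from any image saliency model."""
    ys, xs = np.nonzero(saliency >= hi)
    return np.stack([ys, xs], axis=1)          # (k, 2) pixel coordinates

def temporally_consistent(labels_t, labels_t1, max_shift=10.0):
    """Keep labels in frame t that have a nearby counterpart in frame t+1,
    a crude stand-in for tracking salient objects in adjacent frames."""
    kept = []
    for p in labels_t:
        d = np.linalg.norm(labels_t1 - p, axis=1)
        if d.size and d.min() <= max_shift:
            kept.append(p)
    return np.array(kept)
```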
arXiv Detail & Related papers (2022-04-05T06:12:45Z)
- Few-Shot Learning for Video Object Detection in a Transfer-Learning Scheme [70.45901040613015]
We study the new problem of few-shot learning for video object detection.
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
arXiv Detail & Related papers (2021-03-26T20:37:55Z)
- "What's This?" -- Learning to Segment Unknown Objects from Manipulation Sequences [27.915309216800125]
We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator.
We propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge.
Our method depends neither on visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data.
arXiv Detail & Related papers (2020-11-06T10:55:28Z)
- Unsupervised Deep Representation Learning for Real-Time Tracking [137.69689503237893]
We propose an unsupervised learning method for visual tracking.
The motivation of our unsupervised learning is that a robust tracker should be effective in bidirectional tracking.
We build our framework on a Siamese correlation filter network, and propose a multi-frame validation scheme and a cost-sensitive loss to facilitate unsupervised learning.
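The bidirectional motivation reduces to a forward-backward consistency check: track forward, track back, and penalize the distance to the starting box. The sketch below covers only that part (not the multi-frame validation or cost-sensitive loss), using a hypothetical tracker interface:

```python
import numpy as np

def forward_backward_error(tracker, frames, init_box):
    """Bidirectional-tracking consistency: track the box forward through
    `frames`, then backward from the end; a robust tracker should land
    near `init_box` again. `tracker` is a hypothetical interface with
    init(frame, box) and update(frame) -> box methods."""
    tracker.init(frames[0], init_box)
    box = init_box
    for frame in frames[1:]:
        box = tracker.update(frame)          # forward pass
    tracker.init(frames[-1], box)
    for frame in reversed(frames[:-1]):
        box = tracker.update(frame)          # backward pass
    # Distance between the original and round-trip boxes: a small error
    # means consistency, usable as an unsupervised training signal.
    return float(np.linalg.norm(np.asarray(init_box) - np.asarray(box)))
```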
arXiv Detail & Related papers (2020-07-22T08:23:12Z)
- AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points [92.91569287889203]
We present a novel, label-free algorithm, AutoTrajectory, for trajectory extraction and prediction.
To better capture the moving objects in videos, we introduce dynamic points.
We aggregate dynamic points into instance points, which stand for moving objects such as pedestrians in videos.
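Aggregation can be pictured as simple spatial grouping; the greedy clustering below is an illustrative stand-in for the paper's aggregation step, not its actual module:

```python
import numpy as np

def aggregate_to_instances(points, radius=20.0):
    """Greedily group nearby dynamic points; each group's centroid serves
    as one instance point (e.g., one pedestrian). `points` is (n, 2)."""
    points = np.asarray(points, dtype=float)
    unassigned = list(range(len(points)))
    instances = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed]
        # claim every remaining point within `radius` of the seed
        for i in unassigned[:]:
            if np.linalg.norm(points[i] - points[seed]) <= radius:
                members.append(i)
                unassigned.remove(i)
        instances.append(points[members].mean(axis=0))  # instance point
    return np.array(instances)
```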
arXiv Detail & Related papers (2020-07-11T08:43:34Z)