Video Annotation for Visual Tracking via Selection and Refinement
- URL: http://arxiv.org/abs/2108.03821v1
- Date: Mon, 9 Aug 2021 05:56:47 GMT
- Title: Video Annotation for Visual Tracking via Selection and Refinement
- Authors: Kenan Dai, Jie Zhao, Lijun Wang, Dong Wang, Jianhua Li, Huchuan Lu,
Xuesheng Qian, Xiaoyun Yang
- Abstract summary: We present a new framework to facilitate bounding box annotations for video sequences.
A temporal assessment network is proposed which is able to capture the temporal coherence of target locations.
A visual-geometry refinement network is also designed to further enhance the selected tracking results.
- Score: 74.08109740917122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep-learning-based visual trackers require offline pre-training on
large video datasets with accurate bounding box annotations, which are
labor-intensive to produce. We present a new framework to facilitate bounding
box annotations for video sequences, which investigates a
selection-and-refinement strategy to automatically improve the preliminary
annotations generated by tracking algorithms. A temporal assessment network
(T-Assess Net) is proposed which is able to capture the temporal coherence of
target locations and select reliable tracking results by measuring their
quality. Meanwhile, a visual-geometry refinement network (VG-Refine Net) is
also designed to further enhance the selected tracking results by considering
both target appearance and temporal geometry constraints, allowing inaccurate
tracking results to be corrected. The combination of the above two networks
provides a principled approach to ensure the quality of automatic video
annotation. Experiments on large-scale tracking benchmarks demonstrate that our
method can deliver highly accurate bounding box annotations and significantly
reduce human labor by 94.0%, yielding an effective means to further boost
tracking performance with augmented training data.
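The selection-and-refinement idea in the abstract can be illustrated with a minimal sketch. It assumes per-frame boxes in (x, y, w, h) form and per-frame quality scores that would come from an assessment model such as T-Assess Net; here the scores are simply passed in. The refinement step is a hypothetical stand-in that only uses temporal geometry (averaging each selected box with its selected neighbours); the paper's VG-Refine Net is a learned network that also uses target appearance, and none of the function names below come from the paper.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), an assumed box format


def select_reliable(scores: Sequence[float], threshold: float) -> List[bool]:
    # Stand-in for the selection step: keep a frame whose quality score
    # (assumed to come from an assessment model) reaches the threshold.
    return [s >= threshold for s in scores]


def refine_selected(boxes: Sequence[Box], selected: Sequence[bool],
                    radius: int = 1) -> List[Optional[Box]]:
    # Hypothetical temporal-geometry refinement: each selected box is replaced
    # by the coordinate-wise mean of the selected boxes within `radius` frames.
    # Unselected frames are returned as None, i.e. left for a human annotator.
    refined: List[Optional[Box]] = []
    n = len(boxes)
    for i in range(n):
        if not selected[i]:
            refined.append(None)
            continue
        window = [boxes[j]
                  for j in range(max(0, i - radius), min(n, i + radius + 1))
                  if selected[j]]
        refined.append(tuple(sum(c) / len(window) for c in zip(*window)))
    return refined


def labor_reduction(num_manual: int, num_frames: int) -> float:
    # Fraction of frames that needed no manual box.
    return 1.0 - num_manual / num_frames


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 0, 10, 10), (2, 0, 10, 10), (30, 0, 10, 10)]
    scores = [0.9, 0.8, 0.95, 0.1]  # made-up quality scores
    selected = select_reliable(scores, threshold=0.5)
    refined = refine_selected(boxes, selected)
    manual = sum(b is None for b in refined)
    print(refined)  # → [(0.5, 0.0, 10.0, 10.0), (1.0, 0.0, 10.0, 10.0), (1.5, 0.0, 10.0, 10.0), None]
    print(labor_reduction(manual, len(boxes)))  # → 0.75
```

With these made-up inputs, one of four frames is left for manual annotation, a 75% reduction under this simple bookkeeping. By the same arithmetic, a 94.0% reduction would correspond to about 6 manually annotated frames per 100, although the exact way the paper measures human labor is not stated in the abstract.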
Related papers
- Learning Tracking Representations from Single Point Annotations [49.47550029470299]
We propose to learn tracking representations from single point annotations in a weakly supervised manner.
Specifically, we propose a soft contrastive learning framework that incorporates a target objectness prior into end-to-end contrastive learning.
(2024-04-15T06:50:58Z)
- Weakly Supervised Video Individual Counting [126.75545291243142]
Video Individual Counting aims to predict the number of unique individuals in a single video.
We introduce a weakly supervised VIC task, wherein trajectory labels are not provided.
To this end, we devise an end-to-end trainable soft contrastive loss that drives the network to distinguish inflow, outflow, and remaining individuals.
(2023-12-10T16:12:13Z)
- Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints to Better Classify Objects in Videos [36.28269135795851]
We present a set classifier that improves the accuracy of tracklet classification by aggregating information from the multiple viewpoints contained in a tracklet.
By simply attaching our method to QDTrack on top of ResNet-101, we achieve new state-of-the-art results of 19.9% and 15.7% TrackAP_50 on the TAO validation and test sets, respectively.
(2022-06-05T07:51:58Z)
- A Bayesian Detect to Track System for Robust Visual Object Tracking and Semi-Supervised Model Learning [1.7268829007643391]
We address these problems in a Bayesian tracking and detection framework parameterized by neural network outputs.
We propose a particle filter-based approximate sampling algorithm for tracking object state estimation.
Based on our particle filter inference algorithm, a semi-supervised learning algorithm is used to train the tracking network on intermittently labeled frames.
(2022-05-05T00:18:57Z)
- Weakly Supervised Video Salient Object Detection [79.51227350937721]
We present the first weakly supervised video salient object detection model based on relabeled "fixation guided scribble annotations".
An "appearance-motion fusion module" and a bidirectional ConvLSTM-based framework are proposed to achieve effective multi-modal learning and long-term temporal context modeling.
(2021-04-06T09:48:38Z)
- Self-supervised Object Tracking with Cycle-consistent Siamese Networks [55.040249900677225]
We exploit an end-to-end Siamese network in a cycle-consistent self-supervised framework for object tracking.
We propose to integrate a Siamese region proposal and mask regression network into our tracking framework so that a faster and more accurate tracker can be learned without annotating every frame.
(2020-08-03T04:10:38Z)
- Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
We make the following contributions: (i) we propose to improve the existing self-supervised approach, with a simple, yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drifts caused by spatial-temporal discontinuity; (iv) we demonstrate state-of-the-art results among the self-supervised approaches on DAVIS-2017 and YouTube
(2020-06-22T17:55:59Z)
- Object-Adaptive LSTM Network for Real-time Visual Tracking with Adversarial Data Augmentation [31.842910084312265]
We propose a novel real-time visual tracking method, which adopts an object-adaptive LSTM network to effectively capture the video sequential dependencies and adaptively learn the object appearance variations.
Experiments on four visual tracking benchmarks demonstrate the state-of-the-art performance of our method in terms of both tracking accuracy and speed.
(2020-02-07T03:06:07Z)
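The cycle-consistent Siamese entry in the list uses cycle consistency (tracking forward in time and then backward) as a self-supervised training signal. The same idea can also be applied at inference time as a crude, hand-crafted reliability check on a tracking result: if tracking back from the last frame does not return to the starting box, the track is suspect. The sketch below is a generic forward-backward check, not the method of that paper and not the learned T-Assess Net of the main paper. The tracker interface, the (x, y, w, h) box format, and the toy drifting tracker are all hypothetical, introduced only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), an assumed box format
# Hypothetical tracker interface: given a frame sequence and the box in its
# first frame, return one box per frame (index 0 is the initial box).
Tracker = Callable[[Sequence[object], Box], List[Box]]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def cycle_consistency(track: Tracker, frames: Sequence[object], init: Box) -> float:
    """Track forward, then backward from the last box; compare with the start."""
    forward = track(frames, init)
    backward = track(list(reversed(frames)), forward[-1])
    return iou(init, backward[-1])


def make_drifting_tracker(dx: float) -> Tracker:
    """Toy tracker that ignores the frames and drifts dx pixels per frame."""
    def track(frames: Sequence[object], init: Box) -> List[Box]:
        x, y, w, h = init
        return [(x + k * dx, y, w, h) for k in range(len(frames))]
    return track


if __name__ == "__main__":
    frames = [None, None, None]  # placeholder frames; the toy tracker ignores them
    init = (0.0, 0.0, 10.0, 10.0)
    print(cycle_consistency(make_drifting_tracker(0.0), frames, init))  # → 1.0
    print(cycle_consistency(make_drifting_tracker(1.0), frames, init))  # ≈ 0.4286 (3/7)
```

A score near 1.0 means the forward and backward passes agree; a low score indicates drift. Such a threshold-based check is far simpler than a learned quality assessor, but it needs no training data, which is why forward-backward consistency is a common baseline for judging tracking reliability.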