UnOVOST: Unsupervised Offline Video Object Segmentation and Tracking
- URL: http://arxiv.org/abs/2001.05425v1
- Date: Wed, 15 Jan 2020 16:49:31 GMT
- Title: UnOVOST: Unsupervised Offline Video Object Segmentation and Tracking
- Authors: Jonathon Luiten, Idil Esen Zulfikar, Bastian Leibe
- Abstract summary: We present UnOVOST (Unsupervised Offline Video Object Segmentation and Tracking) as a simple and generic algorithm which is able to track and segment a variety of objects.
In order to achieve this we introduce a novel tracklet-based Forest Path Cutting data association algorithm.
When evaluating our approach on the DAVIS 2017 Unsupervised dataset we obtain state-of-the-art performance with a mean J&F score of 67.9% on the val, 58% on the test-dev and 56.4% on the test-challenge benchmarks, obtaining first place in the DAVIS 2019 Unsupervised Video Object Segmentation Challenge.
- Score: 23.326644949067145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address Unsupervised Video Object Segmentation (UVOS), the task of
automatically generating accurate pixel masks for salient objects in a video
sequence and of tracking these objects consistently through time, without any
input about which objects should be tracked. Towards solving this task, we
present UnOVOST (Unsupervised Offline Video Object Segmentation and Tracking)
as a simple and generic algorithm which is able to track and segment a large
variety of objects. This algorithm builds up tracks in a number of stages, first
grouping segments into short tracklets that are spatio-temporally consistent,
before merging these tracklets into long-term consistent object tracks based on
their visual similarity. In order to achieve this we introduce a novel
tracklet-based Forest Path Cutting data association algorithm which builds up a
decision forest of track hypotheses before cutting this forest into paths that
form long-term consistent object tracks. When evaluating our approach on the
DAVIS 2017 Unsupervised dataset we obtain state-of-the-art performance with a
mean J&F score of 67.9% on the val, 58% on the test-dev and 56.4% on the
test-challenge benchmarks, obtaining first place in the DAVIS 2019 Unsupervised
Video Object Segmentation Challenge. UnOVOST even performs competitively with
many semi-supervised video object segmentation algorithms, despite not being
given any input as to which objects should be tracked and segmented.
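As a rough illustration of the two-stage design described above, the sketch below first links per-frame segment masks into short tracklets using mask IoU as a spatio-temporal consistency cue, and then greedily merges tracklets with similar appearance embeddings into long-term tracks. This is only a hedged stand-in for the paper's Forest Path Cutting association, which builds a decision forest of track hypotheses and cuts it into paths; the helper names, data layout and thresholds here are assumptions made for illustration.

    import numpy as np

    def mask_iou(a, b):
        # Overlap between two binary masks.
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        return 0.0 if union == 0 else inter / union

    def build_tracklets(frames, iou_thresh=0.5):
        # Stage 1: link segments in consecutive frames into short, spatio-temporally
        # consistent tracklets. `frames` is a list (over time) of lists of
        # (binary mask, appearance embedding) pairs, e.g. from per-frame proposals.
        tracklets, active = [], []
        for segments in frames:
            next_active, free = [], list(active)
            for mask, emb in segments:
                best, best_iou = None, iou_thresh
                for t in free:
                    iou = mask_iou(t["masks"][-1], mask)
                    if iou > best_iou:
                        best, best_iou = t, iou
                if best is None:          # no consistent continuation: start a new tracklet
                    best = {"masks": [], "embs": []}
                    tracklets.append(best)
                else:
                    free.remove(best)     # each tracklet is extended at most once per frame
                best["masks"].append(mask)
                best["embs"].append(emb)
                next_active.append(best)
            active = next_active          # tracklets not extended in this frame are terminated
        return tracklets

    def merge_tracklets(tracklets, sim_thresh=0.7):
        # Stage 2 (simplified): greedily attach each tracklet to an existing track
        # whose mean appearance embedding is similar enough. UnOVOST instead builds
        # a forest of track hypotheses and cuts it into paths to form the tracks.
        def mean_emb(t):
            e = np.mean(t["embs"], axis=0)
            return e / (np.linalg.norm(e) + 1e-8)
        tracks = []
        for t in tracklets:
            e = mean_emb(t)
            best, best_sim = None, sim_thresh
            for track in tracks:
                sim = float(e @ mean_emb(track))
                if sim > best_sim:
                    best, best_sim = track, sim
            if best is None:
                tracks.append({"masks": list(t["masks"]), "embs": list(t["embs"])})
            else:
                best["masks"] += t["masks"]
                best["embs"] += t["embs"]
        return tracks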
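The J&F numbers quoted above follow the DAVIS evaluation protocol, where J is the region similarity (Jaccard index, i.e. mask IoU) and F is the contour accuracy (an F-measure over boundary pixels), and J&F is their mean. The following is a minimal sketch of that metric using a simplified morphological boundary match; the official DAVIS toolkit computes the boundary correspondence more carefully, so treat this as an approximation.

    import numpy as np
    from scipy import ndimage

    def region_j(pred, gt):
        # Region similarity J: intersection-over-union of two binary masks.
        pred, gt = pred.astype(bool), gt.astype(bool)
        union = np.logical_or(pred, gt).sum()
        return 1.0 if union == 0 else np.logical_and(pred, gt).sum() / union

    def _boundary(mask):
        # Approximate a one-pixel-wide boundary via morphological erosion.
        mask = mask.astype(bool)
        return np.logical_xor(mask, ndimage.binary_erosion(mask))

    def boundary_f(pred, gt, tol=2):
        # Contour accuracy F: F-measure between predicted and ground-truth boundary
        # pixels, counting matches within `tol` pixels (a simplified stand-in for
        # the distance tolerance used by the DAVIS benchmark).
        pb, gb = _boundary(pred), _boundary(gt)
        if pb.sum() == 0 and gb.sum() == 0:
            return 1.0
        struct = ndimage.generate_binary_structure(2, 2)
        gb_dil = ndimage.binary_dilation(gb, struct, iterations=tol)
        pb_dil = ndimage.binary_dilation(pb, struct, iterations=tol)
        precision = np.logical_and(pb, gb_dil).sum() / max(pb.sum(), 1)
        recall = np.logical_and(gb, pb_dil).sum() / max(gb.sum(), 1)
        return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

    def j_and_f(pred, gt):
        # Per-frame J&F: the mean of region similarity and contour accuracy.
        return 0.5 * (region_j(pred, gt) + boundary_f(pred, gt))

The benchmark scores reported above are, roughly, such per-frame values averaged over frames, objects and sequences.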
Related papers
- VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking [61.56592503861093]
Open-vocabulary multi-object tracking (OVMOT) amalgamates the complexities of open-vocabulary object detection (OVD) and multi-object tracking (MOT).
Existing approaches to OVMOT often merge OVD and MOT methodologies as separate modules, predominantly focusing on the problem through an image-centric lens.
We propose VOVTrack, a novel method that integrates object states relevant to MOT and video-centric training to address this challenge from a video object tracking standpoint.
arXiv Detail & Related papers (2024-10-11T05:01:49Z)
- Beyond SOT: Tracking Multiple Generic Objects at Once [141.36900362724975]
Generic Object Tracking (GOT) is the problem of tracking target objects, specified by bounding boxes in the first frame of a video.
We introduce a new large-scale GOT benchmark, LaGOT, containing multiple annotated target objects per sequence.
Our approach achieves highly competitive results on single-object GOT datasets, setting a new state of the art on TrackingNet with a success rate AUC of 84.4%.
arXiv Detail & Related papers (2022-12-22T17:59:19Z)
- Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints to Better Classify Objects in Videos [36.28269135795851]
We present a set classifier that improves accuracy of classifying tracklets by aggregating information from multiple viewpoints contained in a tracklet.
By simply attaching our method to QDTrack on top of ResNet-101, we achieve the new state-of-the-art, 19.9% and 15.7% TrackAP_50 on TAO validation and test sets.
arXiv Detail & Related papers (2022-06-05T07:51:58Z)
- Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end trainable bottom-up approach to achieve instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has minimal run-time compared to other contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z)
- Robust Visual Tracking by Segmentation [103.87369380021441]
Estimating the target extent poses a fundamental challenge in visual object tracking.
We propose a segmentation-centric tracking pipeline that produces a highly accurate segmentation mask.
Our tracker is able to better learn a target representation that clearly differentiates the target in the scene from background content.
arXiv Detail & Related papers (2022-03-21T17:59:19Z)
- A Discriminative Single-Shot Segmentation Network for Visual Object Tracking [13.375369415113534]
We propose a discriminative single-shot segmentation tracker -- D3S2.
A single-shot network applies two target models with complementary geometric properties.
D3S2 outperforms the leading segmentation tracker SiamMask on video object segmentation benchmarks.
arXiv Detail & Related papers (2021-12-22T12:48:51Z)
- Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation [95.74244714914052]
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes.
We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information online.
PCAN outperforms current video instance tracking and segmentation competition winners on the YouTube-VIS and BDD100K datasets.
arXiv Detail & Related papers (2021-06-22T17:57:24Z)
- An Exploration of Target-Conditioned Segmentation Methods for Visual Object Trackers [24.210580784051277]
We show how to transform a bounding-box tracker into a segmentation tracker.
Our analysis shows that such methods allow trackers to compete with recently proposed segmentation trackers.
arXiv Detail & Related papers (2020-08-03T16:21:18Z)
- Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching [67.02962970820505]
We introduce "tracking-by-detection" into Video Object (VOS)
We propose a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance.
We achieve new state-of-the-art performance on the DAVIS benchmark in both speed and accuracy without complicated bells and whistles, running at 0.14 seconds per frame with a J&F measure of 75.9%.
arXiv Detail & Related papers (2020-07-11T05:44:16Z)