Robust Visual Tracking by Segmentation
- URL: http://arxiv.org/abs/2203.11191v1
- Date: Mon, 21 Mar 2022 17:59:19 GMT
- Title: Robust Visual Tracking by Segmentation
- Authors: Matthieu Paul, Martin Danelljan, Christoph Mayer and Luc Van Gool
- Abstract summary: Estimating the target extent poses a fundamental challenge in visual object tracking.
We propose a segmentation-centric tracking pipeline that produces a highly accurate segmentation mask.
Our tracker is able to better learn a target representation that clearly differentiates the target in the scene from background content.
- Score: 103.87369380021441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating the target extent poses a fundamental challenge in visual object
tracking. Typically, trackers are box-centric and fully rely on a bounding box
to define the target in the scene. In practice, objects often have complex
shapes and are not aligned with the image axis. In these cases, bounding boxes
do not provide an accurate description of the target and often contain a
majority of background pixels. We propose a segmentation-centric tracking
pipeline that not only produces a highly accurate segmentation mask, but also
works internally with segmentation masks instead of bounding boxes. Thus, our
tracker is able to better learn a target representation that clearly
differentiates the target in the scene from background content. In order to
achieve the necessary robustness for the challenging tracking scenario, we
propose a separate instance localization component that is used to condition
the segmentation decoder when producing the output mask. We infer a bounding
box from the segmentation mask and validate our tracker on challenging tracking
datasets and achieve the new state of the art on LaSOT with a success AUC score
of 69.7%. Since fully evaluating the predicted masks on tracking datasets is
not possible due to the missing mask annotations, we further validate our
segmentation quality on two popular video object segmentation datasets.
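The abstract mentions inferring a bounding box from the predicted segmentation mask. A minimal sketch of that step, assuming a binary NumPy mask (this is an illustration, not the authors' implementation, which may use a different box convention):

```python
import numpy as np

def mask_to_box(mask: np.ndarray):
    """Infer an axis-aligned bounding box (x0, y0, x1, y1) from a binary mask.

    Returns None if the mask is empty (no target pixels).
    """
    ys, xs = np.nonzero(mask)          # coordinates of all foreground pixels
    if ys.size == 0:
        return None                    # empty mask: target not visible
    # Tightest axis-aligned box enclosing every foreground pixel.
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

Such a box is always the tightest axis-aligned rectangle around the mask, which is what makes mask-based trackers directly comparable on box-annotated benchmarks like LaSOT.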
Related papers
- Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual
Tracking and Segmentation [37.85026590250023]
This paper proposes a Multi-object Mask-box Integrated framework for unified Tracking and Segmentation (MITS).
A novel pinpoint box predictor is proposed for accurate multi-object box prediction.
MITS achieves state-of-the-art performance on both Visual Object Tracking (VOT) and Video Object Segmentation (VOS) benchmarks.
arXiv Detail & Related papers (2023-08-25T09:37:51Z) - Box-Adapt: Domain-Adaptive Medical Image Segmentation using Bounding
Box Supervision [52.45336255472669]
We propose a weakly supervised domain adaptation setting for deep learning.
Box-Adapt fully explores the fine-grained segmentation mask in the source domain and the weak bounding box in the target domain.
We demonstrate the effectiveness of our method in the liver segmentation task.
arXiv Detail & Related papers (2021-08-19T01:51:04Z) - Prototypical Cross-Attention Networks for Multiple Object Tracking and
Segmentation [95.74244714914052]
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes.
We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information online.
PCAN outperforms current video instance tracking and segmentation competition winners on the YouTube-VIS and BDD100K datasets.
arXiv Detail & Related papers (2021-06-22T17:57:24Z) - BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and
Instance Segmentation [19.55647093153416]
Weakly supervised segmentation methods using bounding box annotations focus on obtaining a pixel-level mask from each box containing an object.
In this work, we utilize higher-level information from the behavior of a trained object detector, by seeking the smallest areas of the image from which the object detector produces almost the same result as it does from the whole image.
These areas constitute a bounding-box attribution map (BBAM), which identifies the target object in its bounding box and thus serves as pseudo ground-truth for weakly supervised semantic and instance segmentation.
arXiv Detail & Related papers (2021-03-16T08:29:33Z) - Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in
Videos [159.02703673838639]
We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos.
We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks.
The additional data provides substantially better generalization, leading to state-of-the-art results in both the VOS and the more challenging tracking domains.
arXiv Detail & Related papers (2021-01-06T18:56:24Z) - Learning Spatio-Appearance Memory Network for High-Performance Visual
Tracking [79.80401607146987]
Existing object trackers usually learn a bounding-box based template to match visual targets across frames, which cannot accurately learn a pixel-wise representation.
This paper presents a novel segmentation-based tracking architecture, equipped with a spatio-appearance memory network to learn accurate spatio-temporal correspondence.
arXiv Detail & Related papers (2020-09-21T08:12:02Z) - Towards Accurate Pixel-wise Object Tracking by Attention Retrieval [50.06436600343181]
We propose an attention retrieval network (ARN) to perform soft spatial constraints on backbone features.
We set a new state-of-the-art on recent pixel-wise object tracking benchmark VOT 2020 while running at 40 fps.
arXiv Detail & Related papers (2020-08-06T16:25:23Z) - An Exploration of Target-Conditioned Segmentation Methods for Visual
Object Trackers [24.210580784051277]
We show how to transform a bounding-box tracker into a segmentation tracker.
Our analysis shows that such methods allow trackers to compete with recently proposed segmentation trackers.
arXiv Detail & Related papers (2020-08-03T16:21:18Z) - UnOVOST: Unsupervised Offline Video Object Segmentation and Tracking [23.326644949067145]
We present UnOVOST (Unsupervised Offline Video Object Segmentation and Tracking) as a simple and generic algorithm able to track and segment a variety of objects.
In order to achieve this we introduce a novel tracklet-based Forest Path Cutting data association algorithm.
When evaluating our approach on the DAVIS 2017 Unsupervised dataset, we obtain state-of-the-art performance with a J&F score of 67.9% on the val, 58.0% on the test-dev, and 56.4% on the test-challenge benchmarks, obtaining first place in the DAVIS 2019 Video Object Segmentation Challenge.
arXiv Detail & Related papers (2020-01-15T16:49:31Z)
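The J&F score cited in the UnOVOST entry is the standard video object segmentation metric: the mean of region similarity J (the Jaccard index, i.e. mask IoU) and boundary accuracy F. A minimal sketch of the J component, assuming binary NumPy masks (the F term additionally requires boundary extraction and is omitted here):

```python
import numpy as np

def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """Region similarity J: intersection-over-union of two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    inter = np.logical_and(pred, gt).sum()
    return float(inter / union)
```

Per-frame J values are averaged over a sequence, and J&F is the mean of that average with the corresponding boundary score.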
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.