BURST: A Benchmark for Unifying Object Recognition, Segmentation and
Tracking in Video
- URL: http://arxiv.org/abs/2209.12118v1
- Date: Sun, 25 Sep 2022 01:27:35 GMT
- Title: BURST: A Benchmark for Unifying Object Recognition, Segmentation and
Tracking in Video
- Authors: Ali Athar, Jonathon Luiten, Paul Voigtlaender, Tarasha Khurana, Achal
Dave, Bastian Leibe, Deva Ramanan
- Abstract summary: Multiple existing benchmarks involve tracking and segmenting objects in video.
There is little interaction between them due to the use of disparate benchmark datasets and metrics.
We propose BURST, a dataset which contains thousands of diverse videos with high-quality object masks.
All tasks are evaluated using the same data and comparable metrics, which enables researchers to consider them in unison.
- Score: 58.71785546245467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiple existing benchmarks involve tracking and segmenting objects in video,
e.g., Video Object Segmentation (VOS) and Multi-Object Tracking and
Segmentation (MOTS), but there is little interaction between them due to the
use of disparate benchmark datasets and metrics (e.g., J&F, mAP, sMOTSA). As a
result, published works usually target a particular benchmark and are not
easily comparable to one another. We believe that the development of
generalized methods that can tackle multiple tasks requires greater cohesion
among these research sub-communities. In this paper, we aim to facilitate this
by proposing BURST, a dataset which contains thousands of diverse videos with
high-quality object masks, and an associated benchmark with six tasks involving
object tracking and segmentation in video. All tasks are evaluated using the
same data and comparable metrics, which enables researchers to consider them in
unison, and hence, more effectively pool knowledge from different methods
across different tasks. Additionally, we demonstrate several baselines for all
tasks and show that approaches for one task can be applied to another with a
quantifiable and explainable performance difference. Dataset annotations and
evaluation code are available at: https://github.com/Ali2500/BURST-benchmark.
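The metrics named above (J&F, mAP, sMOTSA) all build on the same primitive: pixel-wise overlap between a predicted mask and a ground-truth mask. As a minimal illustration (not the official BURST evaluation code, which lives in the repository above), the following NumPy sketch computes the Jaccard index, i.e., the J measure / mask IoU:

    import numpy as np

    def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
        """Jaccard index (the "J" in J&F) between two binary masks of equal shape."""
        pred, gt = pred.astype(bool), gt.astype(bool)
        intersection = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        # Convention: two empty masks count as a perfect match.
        return 1.0 if union == 0 else float(intersection / union)

    # Toy example: two partially overlapping 4x4 squares on a 10x10 canvas.
    pred = np.zeros((10, 10), dtype=bool)
    gt = np.zeros((10, 10), dtype=bool)
    pred[2:6, 2:6] = True  # 16 pixels
    gt[4:8, 4:8] = True    # 16 pixels, 4 of them shared
    print(mask_iou(pred, gt))  # 4 / 28 ~= 0.143

Tracking-aware metrics such as sMOTSA additionally penalize identity switches across frames, but they rest on this same overlap computation, which is why evaluating every task on the same data makes scores comparable.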
Related papers
- Matching Anything by Segmenting Anything [109.2507425045143]
We propose MASA, a novel method for robust instance association learning.
MASA learns instance-level correspondence through exhaustive data transformations.
We show that MASA achieves even better performance than state-of-the-art methods trained with fully annotated in-domain video sequences.
arXiv Detail & Related papers (2024-06-06T16:20:07Z)
- OMG-Seg: Is One Model Good Enough For All Segmentation? [83.17068644513144]
OMG-Seg is a transformer-based encoder-decoder architecture with task-specific queries and outputs.
We show that OMG-Seg can support over ten distinct segmentation tasks and yet significantly reduce computational and parameter overhead.
arXiv Detail & Related papers (2024-01-18T18:59:34Z)
- ISAR: A Benchmark for Single- and Few-Shot Object Instance Segmentation and Re-Identification [24.709695178222862]
We propose ISAR, a benchmark and baseline method for single- and few-shot object identification.
We provide a semi-synthetic dataset of video sequences with ground-truth semantic annotations.
Our benchmark aligns with the emerging research trend of unifying Multi-Object Tracking, Video Object Segmentation, and Re-identification.
arXiv Detail & Related papers (2023-11-05T18:51:33Z)
- CML-MOTS: Collaborative Multi-task Learning for Multi-Object Tracking and Segmentation [31.167405688707575]
We propose a framework for instance-level visual analysis on video frames.
It can simultaneously conduct object detection, instance segmentation, and multi-object tracking.
We evaluate the proposed method extensively on KITTI MOTS and MOTS Challenge datasets.
arXiv Detail & Related papers (2023-11-02T04:32:24Z)
- Segment Anything Meets Point Tracking [116.44931239508578]
This paper presents a novel method for point-centric interactive video segmentation, empowered by SAM and long-term point tracking (a conceptual sketch of this pipeline appears after this list).
We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark.
Our experiments on popular video object segmentation and multi-object tracking and segmentation benchmarks, including DAVIS, YouTube-VOS, and BDD100K, suggest that a point-based segmentation tracker yields better zero-shot performance and efficient interactions.
arXiv Detail & Related papers (2023-07-03T17:58:01Z)
- Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple, end-to-end trainable bottom-up approach that achieves instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has minimal run-time compared to contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z)
- Associating Objects with Transformers for Video Object Segmentation [74.51719591192787]
We propose an Associating Objects with Transformers (AOT) approach to match and decode multiple objects uniformly.
AOT employs an identification mechanism to associate multiple targets into the same high-dimensional embedding space.
We ranked 1st in the 3rd Large-scale Video Object Segmentation Challenge.
arXiv Detail & Related papers (2021-06-04T17:59:57Z)
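As referenced in the "Segment Anything Meets Point Tracking" entry above, here is a minimal, hypothetical sketch of a point-centric propagation loop: query points are sampled from a first-frame mask, carried forward by a point tracker, and used to prompt a promptable segmenter in each frame. The PointTracker and PromptableSegmenter classes below are stand-in stubs, not the paper's actual API; a real system would substitute a long-term point tracker and SAM.

    import numpy as np

    class PointTracker:
        """Stand-in for a long-term point tracker (the real pipeline would
        use one). This stub simply keeps the points fixed so the sketch runs."""
        def track(self, points: np.ndarray, frame: np.ndarray) -> np.ndarray:
            return points  # a real tracker returns updated (x, y) locations

    class PromptableSegmenter:
        """Stand-in for a promptable segmenter such as SAM. This stub returns
        the bounding box of the prompt points as a mock mask."""
        def segment(self, frame: np.ndarray, points: np.ndarray) -> np.ndarray:
            mask = np.zeros(frame.shape[:2], dtype=bool)
            x0, y0 = points.min(axis=0).astype(int)
            x1, y1 = points.max(axis=0).astype(int)
            mask[y0:y1 + 1, x0:x1 + 1] = True
            return mask

    def propagate_mask(frames, init_mask, tracker, segmenter, n_points=8):
        """Sample query points from the first-frame mask, track them through
        the video, and prompt the segmenter with the tracked points per frame."""
        ys, xs = np.nonzero(init_mask)
        idx = np.random.choice(len(xs), size=min(n_points, len(xs)), replace=False)
        points = np.stack([xs[idx], ys[idx]], axis=1).astype(float)
        masks = [init_mask]
        for frame in frames[1:]:
            points = tracker.track(points, frame)           # carry identity forward
            masks.append(segmenter.segment(frame, points))  # re-segment each frame
        return masks

    # Toy usage: three blank frames and a square first-frame mask.
    frames = [np.zeros((32, 32, 3), dtype=np.uint8) for _ in range(3)]
    init = np.zeros((32, 32), dtype=bool)
    init[8:16, 8:16] = True
    masks = propagate_mask(frames, init, PointTracker(), PromptableSegmenter())
    print([int(m.sum()) for m in masks])  # mask area per frame

In this sketch it is the point tracker, not the segmenter, that carries object identity across frames; the segmenter is invoked class-agnostically in every frame, which is the property the paper's zero-shot evaluation targets.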