Unidentified Video Objects: A Benchmark for Dense, Open-World
Segmentation
- URL: http://arxiv.org/abs/2104.04691v1
- Date: Sat, 10 Apr 2021 06:16:25 GMT
- Title: Unidentified Video Objects: A Benchmark for Dense, Open-World
Segmentation
- Authors: Weiyao Wang, Matt Feiszli, Heng Wang, Du Tran
- Abstract summary: We present UVO, a new benchmark for open-world class-agnostic object segmentation in videos.
UVO provides approximately 8 times more videos compared with DAVIS, and 7 times more mask (instance) annotations per video compared with YouTube-VOS and YouTube-VIS.
UVO is also more challenging as it includes many videos with crowded scenes and complex background motions.
- Score: 29.81399150391822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current state-of-the-art object detection and segmentation methods work well
under the closed-world assumption. This closed-world setting assumes that the
list of object categories is available during training and deployment. However,
many real-world applications require detecting or segmenting novel objects,
i.e., object categories never seen during training. In this paper, we present
UVO (Unidentified Video Objects), a new benchmark for open-world class-agnostic
object segmentation in videos. Besides shifting the problem focus to the
open-world setup, UVO is significantly larger, providing approximately 8 times
more videos compared with DAVIS, and 7 times more mask (instance) annotations
per video compared with YouTube-VOS and YouTube-VIS. UVO is also more
challenging as it includes many videos with crowded scenes and complex
background motions. We demonstrate that UVO can be used for other
applications, such as object tracking and super-voxel segmentation, besides
open-world object segmentation. We believe that UVO is a versatile testbed for
researchers to develop novel approaches for open-world class-agnostic object
segmentation, and inspires new research directions towards a more comprehensive
video understanding beyond classification and detection.
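In the open-world, class-agnostic setting, evaluation ignores category labels and scores only mask quality: a prediction counts if its mask overlaps a ground-truth object well enough, whatever the class. A minimal sketch of such an evaluation, assuming binary NumPy masks and hypothetical helper names (illustrative only, not the official UVO toolkit):

    import numpy as np

    def mask_iou(a, b):
        # a, b: boolean HxW arrays for one predicted / one ground-truth mask
        union = np.logical_or(a, b).sum()
        return np.logical_and(a, b).sum() / union if union else 0.0

    def class_agnostic_recall(pred_masks, gt_masks, thresh=0.5):
        # Greedy one-to-one matching; categories are ignored entirely,
        # so every mask is simply "an object" (the open-world setting).
        used, matched = set(), 0
        for g in gt_masks:
            candidates = [(mask_iou(g, p), i) for i, p in enumerate(pred_masks)
                          if i not in used]
            if candidates:
                best_iou, best_i = max(candidates)
                if best_iou >= thresh:
                    used.add(best_i)
                    matched += 1
        return matched / max(len(gt_masks), 1)

Recall at a fixed IoU threshold rewards finding every object, named or not, which is the behavior an open-world benchmark needs to measure.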
Related papers
- VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking [61.56592503861093]
Open-vocabulary multi-object tracking (OVMOT) amalgamates the complexities of open-vocabulary object detection (OVD) and multi-object tracking (MOT).
Existing approaches to OVMOT often merge OVD and MOT methodologies as separate modules, predominantly focusing on the problem through an image-centric lens.
We propose VOVTrack, a novel method that integrates object states relevant to MOT and video-centric training to address this challenge from a video object tracking standpoint.
arXiv Detail & Related papers (2024-10-11T05:01:49Z)
- VISA: Reasoning Video Object Segmentation via Large Language Models [64.33167989521357]
We introduce a new task, Reasoning Video Object Segmentation (ReasonVOS).
This task aims to generate a sequence of segmentation masks in response to implicit text queries that require complex reasoning abilities.
We introduce VISA (Video-based large language Instructed Assistant) to tackle ReasonVOS.
arXiv Detail & Related papers (2024-07-16T02:29:29Z)
- Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT).
We jointly adopt sparse point and box tracking, filtering out unstable points and capturing object-wise information (one common filtering rule is sketched after this entry).
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z)
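The summary above does not spell out how unstable points are filtered; a common rule in tracking pipelines is the forward-backward consistency check, sketched here with hypothetical names (coordinates are assumed to come from any off-the-shelf point tracker):

    import numpy as np

    def filter_unstable_points(pts_t0, pts_t1, pts_t0_back, tol=2.0):
        # pts_t0: (N, 2) points in frame t; pts_t1: the same points tracked
        # forward to frame t+1; pts_t0_back: pts_t1 tracked back to frame t.
        # A reliable track returns close to where it started.
        err = np.linalg.norm(pts_t0 - pts_t0_back, axis=1)
        keep = err < tol  # boolean mask of stable tracks (tolerance in pixels)
        return pts_t0[keep], pts_t1[keep]

Points that drift on the round trip are dropped before object-wise information is aggregated.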
- OW-VISCap: Open-World Video Instance Segmentation and Captioning [95.6696714640357]
We propose an approach to jointly segment, track, and caption previously seen or unseen objects in a video.
We generate rich, descriptive, object-centric captions for each detected object via a masked attention augmented LLM input (a generic masked-attention sketch follows this entry).
Our approach matches or surpasses state-of-the-art on three tasks.
arXiv Detail & Related papers (2024-04-04T17:59:58Z)
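"Masked attention" is the concrete mechanism in the entry above: attention for an object's query is restricted to tokens inside that object's mask, so its caption is grounded in its own pixels. A generic, hypothetical sketch of masked attention, not OW-VISCap's actual implementation:

    import numpy as np

    def masked_attention(q, k, v, obj_mask):
        # q: (1, d) query for one object; k, v: (N, d) flattened image tokens;
        # obj_mask: (N,) boolean, True where a token lies on the object's mask.
        logits = q @ k.T / np.sqrt(k.shape[1])  # (1, N) scaled similarities
        logits[:, ~obj_mask] = -1e9             # block attention outside the mask
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()
        return weights @ v                      # object-focused feature vector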
- Towards Open-Vocabulary Video Instance Segmentation [61.469232166803465]
Video Instance Segmentation aims at segmenting and categorizing objects in videos from a closed set of training categories.
We introduce the novel task of Open-Vocabulary Video Instance Segmentation, which aims to simultaneously segment, track, and classify objects in videos from open-set categories.
To benchmark Open-Vocabulary VIS, we collect a Large-Vocabulary Video Instance Segmentation dataset (LV-VIS) that contains well-annotated objects from 1,196 diverse categories.
arXiv Detail & Related papers (2023-04-04T11:25:23Z)
- A Comprehensive Review of Modern Object Segmentation Approaches [1.7041248235270654]
Image segmentation is the task of associating pixels in an image with their respective object class labels (a toy example of this mapping follows the entry).
Deep learning-based approaches have been developed for image-level object recognition and pixel-level scene understanding.
Extensions of image segmentation include 3D and video segmentation, where units such as voxels, point clouds, and video frames are classified into different objects.
arXiv Detail & Related papers (2023-01-13T19:35:46Z)
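As a toy illustration of that pixel-to-label mapping (the label set is assumed for the example, not taken from the survey):

    import numpy as np

    # Segmentation of a 4x4 image: one class id per pixel
    # (0 = background, 1 = person, 2 = car -- an assumed label set).
    mask = np.array([[0, 0, 1, 1],
                     [0, 1, 1, 1],
                     [2, 2, 0, 1],
                     [2, 2, 0, 0]])
    labels, pixel_counts = np.unique(mask, return_counts=True)
    # per-class pixel areas like these feed metrics such as mean IoU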
- Breaking the "Object" in Video Object Segmentation [36.20167854011788]
We present a dataset for Video Object Segmentation under Transformations (VOST).
It consists of more than 700 high-resolution videos, captured in diverse environments, which are 21 seconds long on average and densely labeled with instance masks.
A careful, multi-step approach is adopted to ensure that these videos focus on complex object transformations, capturing their full temporal extent.
We show that existing methods struggle when applied to this novel task and that their main limitation lies in over-reliance on static appearance cues.
arXiv Detail & Related papers (2022-12-12T19:22:17Z)
- A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Numerous deep learning-based approaches have been devoted to video segmentation and have delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)