ClickVOS: Click Video Object Segmentation
- URL: http://arxiv.org/abs/2403.06130v1
- Date: Sun, 10 Mar 2024 08:37:37 GMT
- Title: ClickVOS: Click Video Object Segmentation
- Authors: Pinxue Guo, Lingyi Hong, Xinyu Zhou, Shuyong Gao, Wanyun Li, Jinglun
Li, Zhaoyu Chen, Xiaoqiang Li, Wei Zhang, Wenqiang Zhang
- Abstract summary: The Video Object Segmentation (VOS) task aims to segment objects in videos.
To address the limitations of previous settings, we propose the setting named Click Video Object Segmentation (ClickVOS).
ClickVOS segments objects of interest across the whole video according to a single click per object in the first frame.
- Score: 29.20434078000283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video Object Segmentation (VOS) task aims to segment objects in videos.
However, previous settings either require time-consuming manual masks of target
objects at the first frame during inference or lack the flexibility to specify
arbitrary objects of interest. To address these limitations, we propose the
setting named Click Video Object Segmentation (ClickVOS) which segments objects
of interest across the whole video according to a single click per object in
the first frame. And we provide the extended datasets DAVIS-P and YouTubeVOSP
that with point annotations to support this task. ClickVOS is of significant
practical applications and research implications due to its only 1-2 seconds
interaction time for indicating an object, comparing annotating the mask of an
object needs several minutes. However, ClickVOS also presents increased
challenges. To address this task, we propose an end-to-end baseline approach
named called Attention Before Segmentation (ABS), motivated by the attention
process of humans. ABS utilizes the given point in the first frame to perceive
the target object through a concise yet effective segmentation attention.
Although the initial object mask is possibly inaccurate, in our ABS, as the
video goes on, the initially imprecise object mask can self-heal instead of
deteriorating due to error accumulation, which is attributed to our designed
improvement memory that continuously records stable global object memory and
updates detailed dense memory. In addition, we conduct various baseline
explorations utilizing off-the-shelf algorithms from related fields, which
could provide insights for the further exploration of ClickVOS. The
experimental results demonstrate the superiority of the proposed ABS approach.
The extended datasets and code will be available at
https://github.com/PinxueGuo/ClickVOS.
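The abstract gives no implementation details, but the self-healing behaviour it describes can be illustrated with a toy sketch. Everything below (the class name, the confidence threshold, the moving-average update) is our own assumption for illustration, not the paper's actual design: a stable global memory is written only on confident frames, while a dense memory is refreshed every frame.

```python
import numpy as np

class ImprovementMemory:
    """Toy sketch of a two-part memory in the spirit of the abstract.

    - global_memory: a slowly updated object-level descriptor, written
      only when the current mask estimate looks reliable, so an early
      imprecise mask does not contaminate it.
    - dense_memory: per-pixel features of the most recent frame,
      refreshed unconditionally to keep fine detail.
    All names and update rules here are assumptions, not the paper's API.
    """

    def __init__(self, feat_dim: int, conf_threshold: float = 0.8):
        self.conf_threshold = conf_threshold
        self.global_memory = np.zeros(feat_dim)  # stable object record
        self.dense_memory = None                 # detailed, per-frame

    def update(self, frame_feats, mask, confidence):
        # frame_feats: (H, W, C) features; mask: (H, W) soft mask in [0, 1].
        # Dense memory always tracks the latest frame.
        self.dense_memory = frame_feats * mask[..., None]
        # Global memory is written only on confident frames, with a slow
        # moving average, so later good segmentations can correct
        # ("self-heal") an imprecise initial mask.
        if confidence >= self.conf_threshold:
            pooled = self.dense_memory.sum((0, 1)) / (mask.sum() + 1e-6)
            self.global_memory = 0.9 * self.global_memory + 0.1 * pooled
```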
Related papers
- Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT).
We jointly adopt sparse point and box tracking, filtering out unstable points and capturing object-wise information; a sketch of such point filtering follows this entry.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z)
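The point filtering mentioned above can be done with a forward-backward consistency check, a standard heuristic for tracked points; this is our assumption for illustration, not necessarily the exact criterion I-PT uses.

```python
import numpy as np

def filter_unstable_points(pts, track_fwd, track_bwd, max_err=2.0):
    """Drop points that fail a forward-backward round trip (an assumed
    stand-in for I-PT's unstable-point filtering).

    pts: (N, 2) point coordinates in frame t.
    track_fwd / track_bwd: callables mapping (N, 2) points from
    frame t to t+1 and back from t+1 to t, respectively.
    """
    fwd = track_fwd(pts)                      # points propagated forward
    back = track_bwd(fwd)                     # and tracked back again
    err = np.linalg.norm(back - pts, axis=1)  # round-trip error per point
    return pts[err < max_err]                 # keep consistent points only
```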
- OW-VISCap: Open-World Video Instance Segmentation and Captioning [95.6696714640357]
We propose an approach to jointly segment, track, and caption previously seen or unseen objects in a video.
We generate rich descriptive and object-centric captions for each detected object via a masked attention augmented LLM input.
Our approach matches or surpasses state-of-the-art on three tasks.
arXiv Detail & Related papers (2024-04-04T17:59:58Z)
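One plausible reading of "masked attention" above is attention whose weights are restricted to the pixels of a detected object before the pooled features are handed to the language model. The sketch below implements that restriction; the shapes, names, and single-pass pooling are all our simplifications.

```python
import numpy as np

def masked_attention_pool(queries, feats, mask):
    """Attention over image features restricted to one object's mask
    (an illustrative reading of masked attention, with assumed shapes).

    queries: (Q, C) learned caption queries.
    feats:   (N, C) flattened image features, N = H * W.
    mask:    (N,) boolean mask of the detected object (>= 1 True pixel).
    Returns (Q, C) object-centric features for the LLM input.
    """
    scores = queries @ feats.T / np.sqrt(feats.shape[1])  # (Q, N)
    scores[:, ~mask] = -np.inf   # attention cannot leave the object
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ feats
```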
- MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions [93.35942025232943]
We propose a large-scale dataset called MeViS, which contains numerous motion expressions to indicate target objects in complex environments.
The goal of our benchmark is to provide a platform that enables the development of effective language-guided video segmentation algorithms.
arXiv Detail & Related papers (2023-08-16T17:58:34Z)
- InstMove: Instance Motion for Object-centric Video Segmentation [70.16915119724757]
In this work, we study instance-level motion and present InstMove, which stands for Instance Motion for Object-centric Video Segmentation.
In comparison to pixel-wise motion, InstMove mainly relies on instance-level motion information that is free from image feature embeddings.
With only a few lines of code, InstMove can be integrated into current SOTA methods for three different video segmentation tasks.
arXiv Detail & Related papers (2023-03-14T17:58:44Z)
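As a minimal stand-in for the instance-level motion idea above, the sketch below extrapolates an object's next-frame mask from the centroid displacement between its two previous masks. This deliberately crude motion model is ours for illustration; InstMove learns the motion with a network.

```python
import numpy as np

def predict_next_mask(mask_prev, mask_curr):
    """Shift the current instance mask by its centroid displacement,
    a toy instance-level motion prior (not InstMove's learned model).
    mask_prev, mask_curr: (H, W) boolean masks of one instance.
    """
    def centroid(m):
        ys, xs = np.nonzero(m)
        return np.array([ys.mean(), xs.mean()])

    dy, dx = np.round(centroid(mask_curr) - centroid(mask_prev)).astype(int)
    h, w = mask_curr.shape
    ys, xs = np.nonzero(mask_curr)
    pred = np.zeros_like(mask_curr)
    # Translate every foreground pixel, clamping at the image border.
    pred[np.clip(ys + dy, 0, h - 1), np.clip(xs + dx, 0, w - 1)] = True
    return pred
```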
- Region Aware Video Object Segmentation with Deep Motion Modeling [56.95836951559529]
Region Aware Video Object Segmentation (RAVOS) is a method that predicts regions of interest for efficient object segmentation and memory storage.
For efficient segmentation, object features are extracted according to the ROIs, and an object decoder is designed for object-level segmentation.
For efficient memory storage, we propose motion path memory to filter out redundant context by memorizing the features within the motion path of objects between two frames.
arXiv Detail & Related papers (2022-07-21T01:44:40Z)
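The motion path memory described above keeps only the features along an object's path between two frames. A simple box-union version of that region (our guess at a concrete form, not RAVOS's exact definition) looks like this:

```python
import numpy as np

def motion_path_region(mask_t0, mask_t1):
    """Bounding box covering an object in two consecutive frames, used
    here as a stand-in for the object's motion path."""
    ys, xs = np.nonzero(mask_t0 | mask_t1)
    return ys.min(), ys.max(), xs.min(), xs.max()

def filter_memory(feats, mask_t0, mask_t1):
    # feats: (H, W, C). Features outside the motion path region are
    # treated as redundant context and dropped from memory.
    y0, y1, x0, x1 = motion_path_region(mask_t0, mask_t1)
    return feats[y0:y1 + 1, x0:x1 + 1]
```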
- The Second Place Solution for The 4th Large-scale Video Object Segmentation Challenge -- Track 3: Referring Video Object Segmentation [18.630453674396534]
ReferFormer aims to segment object instances referred to by a language expression across all frames of a given video.
This work proposes several tricks to further boost performance, including cyclical learning rates (sketched below), a semi-supervised approach, and test-time augmentation at inference.
The improved ReferFormer ranks 2nd place on CVPR2022 Referring Youtube-VOS Challenge.
arXiv Detail & Related papers (2022-06-24T02:15:06Z)
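Of the listed tricks, cyclical learning rates are a standard technique with a ready-made PyTorch implementation, torch.optim.lr_scheduler.CyclicLR. The model and hyperparameter values below are placeholders, not the challenge entry's actual settings.

```python
import torch

model = torch.nn.Linear(256, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

# The learning rate oscillates between base_lr and max_lr over
# 2 * step_size_up iterations; values here are illustrative only.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-5, max_lr=1e-3, step_size_up=2000
)

for step in range(10):
    optimizer.step()   # would follow loss.backward() in a real loop
    scheduler.step()   # advance the cyclical schedule every iteration
```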
- VideoClick: Video Object Segmentation with a Single Click [93.7733828038616]
We propose a bottom-up approach where, given a single click for each object in a video, we obtain the segmentation masks of these objects in the full video.
In particular, we construct a correlation volume that assigns each pixel in a target frame to either one of the objects in the reference frame or the background.
Results on this new CityscapesVideo dataset show that our approach outperforms all the baselines in this challenging setting.
arXiv Detail & Related papers (2021-01-16T23:07:48Z)
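The correlation volume described above can be sketched directly: correlate every target-frame pixel embedding with per-object reference embeddings plus a background embedding, then assign each pixel to its best match. Using a single embedding per object is our simplification of the paper's formulation.

```python
import numpy as np

def assign_pixels(target_feats, object_feats, bg_feat):
    """Assign each target-frame pixel to a reference object or the
    background via a dot-product correlation volume (simplified).

    target_feats: (H, W, C) target-frame pixel embeddings.
    object_feats: (K, C) one embedding per reference-frame object.
    bg_feat:      (C,) background embedding.
    Returns (H, W) labels: 0 = background, 1..K = objects.
    """
    refs = np.vstack([bg_feat[None], object_feats])  # (K+1, C)
    corr = target_feats @ refs.T                     # (H, W, K+1) volume
    return corr.argmax(axis=-1)
```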
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.