Video Object of Interest Segmentation
- URL: http://arxiv.org/abs/2212.02871v1
- Date: Tue, 6 Dec 2022 10:21:10 GMT
- Title: Video Object of Interest Segmentation
- Authors: Siyuan Zhou and Chunru Zhan and Biao Wang and Tiezheng Ge and Yuning
Jiang and Li Niu
- Abstract summary: We present a new computer vision task named video object of interest segmentation (VOIS)
Given a video and a target image of interest, our objective is to simultaneously segment and track all objects in the video that are relevant to the target image.
Since no existing dataset is perfectly suitable for this new task, we specifically construct a large-scale dataset called LiveVideos.
- Score: 27.225312139360963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present a new computer vision task named video object of
interest segmentation (VOIS). Given a video and a target image of interest, our
objective is to simultaneously segment and track all objects in the video that
are relevant to the target image. This problem combines the traditional video
object segmentation task with an additional image indicating the content that
users are concerned with. Since no existing dataset is perfectly suitable for
this new task, we specifically construct a large-scale dataset called
LiveVideos, which contains 2418 pairs of target images and live videos with
instance-level annotations. In addition, we propose a transformer-based method
for this task. We revisit Swin Transformer and design a dual-path structure to
fuse video and image features. Then, a transformer decoder is employed to
generate object proposals for segmentation and tracking from the fused
features. Extensive experiments on LiveVideos dataset show the superiority of
our proposed method.
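The abstract only outlines the approach at a high level. Below is a minimal, illustrative sketch (in PyTorch) of how a dual-path fusion module and a query-based transformer decoder could be wired together for this kind of task. The cross-attention fusion, feature dimensions, query count, and output heads are assumptions made purely for illustration, not the authors' released implementation, and random tensors stand in for Swin Transformer features.

    # Illustrative sketch only -- NOT the paper's released code.
    import torch
    import torch.nn as nn

    class DualPathFusion(nn.Module):
        # Assumed fusion scheme: per-frame video tokens attend to target-image tokens.
        def __init__(self, dim=256, heads=8):
            super().__init__()
            self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, video_tokens, image_tokens):
            # video_tokens: (B*T, N_v, C); image_tokens: (B*T, N_i, C)
            attended, _ = self.cross_attn(video_tokens, image_tokens, image_tokens)
            return self.norm(video_tokens + attended)

    class VOISHead(nn.Module):
        # Query-based decoder producing per-proposal relevance scores and mask logits.
        def __init__(self, dim=256, num_queries=20, heads=8, layers=3):
            super().__init__()
            self.queries = nn.Embedding(num_queries, dim)
            layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, num_layers=layers)
            self.relevance = nn.Linear(dim, 1)     # is this proposal relevant to the target image?
            self.mask_embed = nn.Linear(dim, dim)  # dotted with pixel features to get mask logits

        def forward(self, fused_tokens, pixel_features):
            # fused_tokens: (B, T*N_v, C); pixel_features: (B, T, H, W, C)
            q = self.queries.weight.unsqueeze(0).expand(fused_tokens.size(0), -1, -1)
            q = self.decoder(q, fused_tokens)                                   # (B, Q, C)
            masks = torch.einsum("bqc,bthwc->bqthw", self.mask_embed(q), pixel_features)
            return self.relevance(q).squeeze(-1), masks                         # (B, Q), (B, Q, T, H, W)

    # Toy usage: random tensors stand in for backbone features.
    B, T, H, W, C = 1, 4, 16, 16, 256
    video = torch.randn(B * T, H * W, C)   # flattened per-frame tokens
    image = torch.randn(B * T, 49, C)      # target-image tokens, repeated per frame
    fused = DualPathFusion(C)(video, image).reshape(B, T * H * W, C)
    scores, masks = VOISHead(C)(fused, torch.randn(B, T, H, W, C))
    print(scores.shape, masks.shape)       # torch.Size([1, 20]) torch.Size([1, 20, 4, 16, 16])

In this sketch, each object query yields one mask track across all frames plus a relevance score, which is one plausible way to realize "proposals for segmentation and tracking" from fused features; the actual architecture may differ.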
Related papers
- 1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation [72.54357831350762]
We propose a semantic embedding video object segmentation model and use the salient features of objects as query representations.
We trained our model on a large-scale video object segmentation dataset.
Our model achieves first place (84.45%) on the test set of the Complex Video Object Segmentation challenge.
arXiv Detail & Related papers (2024-06-07T03:13:46Z)
- SOVC: Subject-Oriented Video Captioning [59.04029220586337]
We propose a new video captioning task, Subject-Oriented Video Captioning (SOVC), which aims to let users specify the target to be described via a bounding box.
To support this task, we construct two subject-oriented video captioning datasets based on two widely used video captioning datasets.
arXiv Detail & Related papers (2023-12-20T17:44:32Z)
- MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions [93.35942025232943]
We propose a large-scale dataset called MeViS, which contains numerous motion expressions to indicate target objects in complex environments.
The goal of our benchmark is to provide a platform that enables the development of effective language-guided video segmentation algorithms.
arXiv Detail & Related papers (2023-08-16T17:58:34Z)
- Guided Slot Attention for Unsupervised Video Object Segmentation [16.69412563413671]
We propose a guided slot attention network to reinforce spatial structural information and obtain better foreground-background separation.
The proposed model achieves state-of-the-art performance on two popular datasets.
arXiv Detail & Related papers (2023-03-15T02:08:20Z)
- Is an Object-Centric Video Representation Beneficial for Transfer? [86.40870804449737]
We introduce a new object-centric video recognition model built on a transformer architecture.
We show that the object-centric model outperforms prior video representations.
arXiv Detail & Related papers (2022-07-20T17:59:44Z)
- VideoClick: Video Object Segmentation with a Single Click [93.7733828038616]
We propose a bottom-up approach: given a single click for each object in a video, we obtain the segmentation masks of these objects across the full video.
In particular, we construct a correlation volume that assigns each pixel in a target frame to either one of the objects in the reference frame or the background.
Results on the new CityscapesVideo dataset show that our approach outperforms all baselines in this challenging setting.
arXiv Detail & Related papers (2021-01-16T23:07:48Z)
- Video Panoptic Segmentation [117.08520543864054]
We propose and explore a video extension of panoptic segmentation, called video panoptic segmentation.
To invigorate research on this new task, we present two types of video panoptic datasets.
We propose a novel video panoptic segmentation network (VPSNet) which jointly predicts object classes, bounding boxes, masks, instance id tracking, and semantic segmentation in video frames.
arXiv Detail & Related papers (2020-06-19T19:35:47Z)