Sketch-based Video Object Segmentation: Benchmark and Analysis
- URL: http://arxiv.org/abs/2311.07261v1
- Date: Mon, 13 Nov 2023 11:53:49 GMT
- Title: Sketch-based Video Object Segmentation: Benchmark and Analysis
- Authors: Ruolin Yang, Da Li, Conghui Hu, Timothy Hospedales, Honggang Zhang, Yi-Zhe Song
- Abstract summary: This paper introduces a new task of sketch-based video object segmentation, an associated benchmark, and a strong baseline.
Our benchmark includes three datasets, Sketch-DAVIS16, Sketch-DAVIS17 and Sketch-YouTube-VOS, which exploit human-drawn sketches as an informative yet low-cost reference for video object segmentation.
Experimental results show sketch is more effective yet annotation-efficient than other references, such as photo masks, language and scribble.
- Score: 55.79497833614397
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reference-based video object segmentation is an emerging topic which aims to
segment the corresponding target object in each video frame referred by a given
reference, such as a language expression or a photo mask. However, language
expressions can sometimes be vague in conveying an intended concept and
ambiguous when similar objects in one frame are hard to distinguish by
language. Meanwhile, photo masks are costly to annotate and less practical to
provide in a real application. This paper introduces a new task of sketch-based
video object segmentation, an associated benchmark, and a strong baseline. Our
benchmark includes three datasets, Sketch-DAVIS16, Sketch-DAVIS17 and
Sketch-YouTube-VOS, which exploit human-drawn sketches as an informative yet
low-cost reference for video object segmentation. We take advantage of STCN, a
popular baseline of semi-supervised VOS task, and evaluate what the most
effective design for incorporating a sketch reference is. Experimental results
show sketch is more effective yet annotation-efficient than other references,
such as photo masks, language and scribble.
Related papers
- Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation [0.9208007322096532]
Scene sketch semantic segmentation is a crucial task for various applications including sketch-to-image retrieval and scene understanding.
Existing sketch segmentation methods treat sketches as bitmap images, leading to the loss of temporal order among strokes.
We propose a Class-Agnostic Visio-Temporal Network (CAVT) for scene sketch semantic segmentation.
arXiv Detail & Related papers (2024-09-30T22:34:29Z)
- One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos [41.34787907803329]
VideoLISA is a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos.
VideoLISA generates temporally consistent segmentation masks in videos based on language instructions.
arXiv Detail & Related papers (2024-09-29T07:47:15Z)
- MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions [93.35942025232943]
We propose a large-scale dataset called MeViS, which contains numerous motion expressions to indicate target objects in complex environments.
The goal of our benchmark is to provide a platform that enables the development of effective language-guided video segmentation algorithms.
arXiv Detail & Related papers (2023-08-16T17:58:34Z)
- Learning Referring Video Object Segmentation from Weak Annotation [78.45828085350936]
Referring video object segmentation (RVOS) is a task that aims to segment the target object in all video frames based on a sentence describing the object.
We propose a new annotation scheme that reduces the annotation effort by 8 times, while providing sufficient supervision for RVOS.
Our scheme only requires a mask for the frame where the object first appears and bounding boxes for the rest of the frames.
arXiv Detail & Related papers (2023-08-04T06:50:52Z)
- A Comprehensive Review of Modern Object Segmentation Approaches [1.7041248235270654]
Image segmentation is the task of associating pixels in an image with their respective object class labels.
Deep learning-based approaches have been developed for image-level object recognition and pixel-level scene understanding.
Extensions of image segmentation tasks include 3D and video segmentation, where units of voxels, point clouds, and video frames are classified into different objects.
arXiv Detail & Related papers (2023-01-13T19:35:46Z)
- Abstracting Sketches through Simple Primitives [53.04827416243121]
Humans show a high level of abstraction capability in games that require quickly communicating object information.
We propose the Primitive-based Sketch Abstraction task where the goal is to represent sketches using a fixed set of drawing primitives.
Our Primitive-Matching Network (PMN) learns interpretable abstractions of a sketch in a self-supervised manner.
arXiv Detail & Related papers (2022-07-27T14:32:39Z)
- HODOR: High-level Object Descriptors for Object Re-segmentation in Video Learned from Static Images [123.65233334380251]
We propose HODOR: a novel method that effectively leverages annotated static images for understanding object appearance and scene context.
As a result, HODOR achieves state-of-the-art performance on the DAVIS and YouTube-VOS benchmarks.
Without any architectural modification, HODOR can also learn from video context around single annotated video frames.
arXiv Detail & Related papers (2021-12-16T18:59:53Z)
- Locate then Segment: A Strong Pipeline for Referring Image Segmentation [73.19139431806853]
Referring image segmentation aims to segment the objects referred by a natural language expression.
Previous methods usually focus on designing an implicit and recurrent interaction mechanism to fuse the visual-linguistic features to directly generate the final segmentation mask.
We present a "Locate-Then-Segment" scheme to tackle these problems.
Our framework is simple but surprisingly effective.
arXiv Detail & Related papers (2021-03-30T12:25:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.