Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation
- URL: http://arxiv.org/abs/2410.00266v1
- Date: Mon, 30 Sep 2024 22:34:29 GMT
- Title: Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation
- Authors: Aleyna Kütük, Tevfik Metin Sezgin
- Abstract summary: Scene sketch semantic segmentation is a crucial task for various applications including sketch-to-image retrieval and scene understanding.
Existing sketch segmentation methods treat sketches as bitmap images, leading to the loss of temporal order among strokes.
We propose a Class-Agnostic Visio-Temporal Network (CAVT) for scene sketch semantic segmentation.
- Score: 0.9208007322096532
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Scene sketch semantic segmentation is a crucial task for various applications including sketch-to-image retrieval and scene understanding. Existing sketch segmentation methods treat sketches as bitmap images, leading to the loss of temporal order among strokes due to the shift from vector to image format. Moreover, these methods struggle to segment objects from categories absent in the training data. In this paper, we propose a Class-Agnostic Visio-Temporal Network (CAVT) for scene sketch semantic segmentation. CAVT employs a class-agnostic object detector to detect individual objects in a scene and groups the strokes of instances through its post-processing module. This is the first approach that performs segmentation at both the instance and stroke levels within scene sketches. Furthermore, there is a lack of free-hand scene sketch datasets with both instance and stroke-level class annotations. To fill this gap, we collected the largest Free-hand Instance- and Stroke-level Scene Sketch Dataset (FrISS) that contains 1K scene sketches and covers 403 object classes with dense annotations. Extensive experiments on FrISS and other datasets demonstrate the superior performance of our method over state-of-the-art scene sketch segmentation models. The code and dataset will be made public after acceptance.
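The abstract describes detecting objects with a class-agnostic detector and then grouping strokes into instances in a post-processing step. Below is a minimal sketch of that grouping idea, assuming strokes arrive as point arrays and detections as boxes; the max-overlap assignment rule is an illustrative assumption, not the authors' module.

```python
import numpy as np

def group_strokes_by_detection(strokes, boxes):
    """Assign each vector stroke to the class-agnostic detection box that
    contains the largest fraction of its points (illustrative rule only).

    strokes: list of (N_i, 2) arrays of (x, y) points, in drawing order
    boxes:   (M, 4) array of detections as (x0, y0, x1, y1)
    Returns one instance index per stroke (-1 if no box overlaps it).
    """
    assignments = []
    for pts in strokes:
        best, best_frac = -1, 0.0
        for i, (x0, y0, x1, y1) in enumerate(boxes):
            inside = ((pts[:, 0] >= x0) & (pts[:, 0] <= x1) &
                      (pts[:, 1] >= y0) & (pts[:, 1] <= y1))
            frac = inside.mean()  # fraction of the stroke inside this box
            if frac > best_frac:
                best, best_frac = i, frac
        assignments.append(best)
    return assignments
```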
Related papers
- Co-Segmentation without any Pixel-level Supervision with Application to Large-Scale Sketch Classification [3.3104978705632777]
We propose a novel method for object co-segmentation, i.e. pixel-level localization of a common object in a set of images.
The method achieves state-of-the-art performance among methods trained with the same level of supervision.
The benefits of the proposed co-segmentation method are further demonstrated in the task of large-scale sketch recognition.
arXiv Detail & Related papers (2024-10-17T14:16:45Z) - Open Vocabulary Semantic Scene Sketch Understanding [5.638866331696071]
We study the underexplored but fundamental vision problem of machine understanding of freehand scene sketches.
We introduce a sketch encoder that yields a semantically aware feature space, which we evaluate on a semantic sketch segmentation task.
Our method outperforms zero-shot CLIP by 37 points in segmentation pixel accuracy, reaching 85.5% on the FS-COCO sketch dataset.
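The 37-point gain is measured against a zero-shot CLIP baseline. Below is a minimal sketch of how such a baseline can label sketch regions, assuming the region and category-prompt embeddings are already computed; the paper's own encoder is not shown here.

```python
import numpy as np

def zero_shot_label(region_embs, text_embs):
    """Assign each sketch region the category whose text embedding is most
    similar under cosine similarity -- a generic zero-shot baseline, not the
    paper's encoder.

    region_embs: (R, D) embeddings of sketch regions or pixels
    text_embs:   (C, D) embeddings of category prompts
    Returns (R,) array of predicted category indices.
    """
    r = region_embs / np.linalg.norm(region_embs, axis=1, keepdims=True)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return (r @ t.T).argmax(axis=1)  # best-matching prompt per region
```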
arXiv Detail & Related papers (2023-12-18T19:02:07Z) - Sketch-based Video Object Segmentation: Benchmark and Analysis [55.79497833614397]
This paper introduces a new task of sketch-based video object segmentation, an associated benchmark, and a strong baseline.
Our benchmark includes three datasets, Sketch-DAVIS16, Sketch-DAVIS17 and Sketch-YouTube-VOS, which exploit human-drawn sketches as an informative yet low-cost reference for video object segmentation.
Experimental results show that sketches are more effective yet more annotation-efficient than other references such as photo masks, language, and scribbles.
arXiv Detail & Related papers (2023-11-13T11:53:49Z) - Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any dense annotation effort.
Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling, on three benchmark datasets.
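The title suggests a cluster-then-name pipeline over vision-language embeddings. A minimal sketch of that idea follows, assuming precomputed pixel and text embeddings; the k-means step and centroid-matching rule are illustrative assumptions, not the paper's exact method.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_then_name(pixel_embs, text_embs, n_clusters):
    """Cluster pixel embeddings without labels, then name each cluster with
    the closest open-vocabulary text embedding (cosine similarity).

    pixel_embs: (P, D) vision-language embeddings of pixels
    text_embs:  (C, D) embeddings of category names
    Returns (P,) per-pixel category indices.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(pixel_embs)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    names = np.empty(n_clusters, dtype=int)
    for k in range(n_clusters):
        c = pixel_embs[labels == k].mean(axis=0)       # cluster centroid
        names[k] = (t @ (c / np.linalg.norm(c))).argmax()
    return names[labels]
```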
arXiv Detail & Related papers (2022-07-18T09:20:04Z) - FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context [112.07988211268612]
We advance sketch research to scenes with the first dataset of freehand scene sketches, FS-COCO.
Our dataset comprises 10,000 freehand scene vector sketches with per-point space-time information, drawn by 100 non-expert individuals.
We study for the first time the problem of fine-grained image retrieval from freehand scene sketches and sketch captions.
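A minimal container for a vector sketch with per-point space-time information of the kind FS-COCO provides; the field names are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TimedPoint:
    x: float  # canvas coordinate
    y: float
    t: float  # timestamp since drawing began

@dataclass
class VectorSketch:
    """A freehand vector sketch with per-point space-time information
    (field names are illustrative)."""
    strokes: List[List[TimedPoint]] = field(default_factory=list)  # drawing order
    caption: str = ""  # FS-COCO also pairs sketches with captions

    def duration(self) -> float:
        """Total drawing time implied by the point timestamps."""
        ts = [p.t for s in self.strokes for p in s]
        return max(ts) - min(ts) if ts else 0.0
```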
arXiv Detail & Related papers (2022-03-04T03:00:51Z) - One Sketch for All: One-Shot Personalized Sketch Segmentation [84.45203849671003]
We present the first one-shot personalized sketch segmentation method.
We aim to segment all sketches belonging to the same category, given a single exemplar sketch with part annotations.
We preserve the part semantics embedded in the exemplar and remain robust to input style and abstraction.
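A naive sketch of one-shot label transfer from a single annotated exemplar, assuming stroke features live in a shared space; nearest-neighbour matching here is an illustrative baseline, not the paper's method.

```python
import numpy as np

def transfer_part_labels(exemplar_feats, exemplar_parts, target_feats):
    """Propagate part labels from one annotated exemplar to a new sketch by
    nearest-neighbour matching in a shared feature space.

    exemplar_feats: (E, D) features of the exemplar's strokes
    exemplar_parts: (E,)  array with the part label of each exemplar stroke
    target_feats:   (T, D) features of the target sketch's strokes
    Returns (T,) predicted part labels.
    """
    # Euclidean distance from every target stroke to every exemplar stroke
    d = np.linalg.norm(target_feats[:, None, :] - exemplar_feats[None, :, :],
                       axis=-1)
    return exemplar_parts[d.argmin(axis=1)]
```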
arXiv Detail & Related papers (2021-12-20T20:10:44Z) - Learning Panoptic Segmentation from Instance Contours [9.347742071428918]
Panoptic segmentation aims to provide an understanding of background (stuff) and instances of objects (things) at the pixel level.
It combines the separate tasks of semantic segmentation (pixel-level classification) and instance segmentation to build a single unified scene understanding task.
We present a fully convolutional neural network that learns instance segmentation from semantic segmentation and instance contours.
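At inference time, predicted contours can carve a semantic mask into instances via connected components. A simplified sketch of combining the two outputs, assuming a binary class mask and a soft contour map; this is not the paper's full network.

```python
import numpy as np
from scipy import ndimage

def instances_from_contours(semantic_mask, contour_map, thresh=0.5):
    """Split a binary semantic mask into instances by erasing predicted
    contour pixels and labeling the remaining connected components.

    semantic_mask: (H, W) bool, pixels of one 'thing' class
    contour_map:   (H, W) float in [0, 1], predicted instance boundaries
    Returns (labels, n): an (H, W) map of instance ids and their count.
    """
    interior = semantic_mask & (contour_map < thresh)  # remove boundary pixels
    labels, n = ndimage.label(interior)                # one id per component
    return labels, n
```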
arXiv Detail & Related papers (2020-10-16T03:05:48Z) - DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z) - Video Panoptic Segmentation [117.08520543864054]
We propose and explore a new video extension of panoptic segmentation, called video panoptic segmentation.
To invigorate research on this new task, we present two types of video panoptic datasets.
We propose a novel video panoptic segmentation network (VPSNet) which jointly predicts object classes, bounding boxes, masks, instance id tracking, and semantic segmentation in video frames.
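A minimal container for the per-frame outputs such a network predicts jointly; the field names are illustrative assumptions, not VPSNet's API.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FramePanopticOutput:
    """Per-frame predictions of a video panoptic segmentation model
    (field names here are illustrative, not the paper's interface)."""
    classes: np.ndarray    # (N,)     category id per detected instance
    boxes: np.ndarray      # (N, 4)   bounding boxes as (x0, y0, x1, y1)
    masks: np.ndarray      # (N, H, W) binary instance masks
    track_ids: np.ndarray  # (N,)     instance ids consistent across frames
    semantic: np.ndarray   # (H, W)   per-pixel semantic labels (stuff + things)
```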
arXiv Detail & Related papers (2020-06-19T19:35:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.