Freestyle Sketch-in-the-Loop Image Segmentation
- URL: http://arxiv.org/abs/2501.16022v1
- Date: Mon, 27 Jan 2025 13:07:51 GMT
- Title: Freestyle Sketch-in-the-Loop Image Segmentation
- Authors: Subhadeep Koley, Viswanatha Reddy Gajjala, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Ayan Kumar Bhunia, Yi-Zhe Song
- Abstract summary: We introduce a "sketch-in-the-loop" image segmentation framework, enabling the segmentation of visual concepts partially, completely, or in groupings.
This framework capitalises on the synergy between sketch-based image retrieval models and large-scale pre-trained models.
Our purpose-made augmentation strategy enhances the versatility of our sketch-guided mask generation, allowing segmentation at multiple levels.
- Score: 116.1810651297801
- License:
- Abstract: In this paper, we expand the domain of sketch research into the field of image segmentation, aiming to establish freehand sketches as a query modality for subjective image segmentation. Our innovative approach introduces a "sketch-in-the-loop" image segmentation framework, enabling the segmentation of visual concepts partially, completely, or in groupings - a truly "freestyle" approach - without the need for a purpose-made dataset (i.e., mask-free). This framework capitalises on the synergy between sketch-based image retrieval (SBIR) models and large-scale pre-trained models (CLIP or DINOv2). The former provides an effective training signal, while fine-tuned versions of the latter execute the subjective segmentation. Additionally, our purpose-made augmentation strategy enhances the versatility of our sketch-guided mask generation, allowing segmentation at multiple granularity levels. Extensive evaluations across diverse benchmark datasets underscore the superior performance of our method in comparison to existing approaches across various evaluation scenarios.
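To make the framework's core idea more concrete, below is a minimal, hypothetical sketch of sketch-guided mask generation: a frozen patch-feature backbone stands in for the fine-tuned CLIP/DINOv2 branch, a toy encoder stands in for an SBIR-trained sketch branch, and a query-to-patch cosine-similarity map is thresholded into a mask. The module names (PatchBackbone, SketchEncoder, sketch_guided_mask), shapes, and the thresholding step are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch of sketch-guided mask generation, assuming a generic frozen
# patch-feature backbone (a stand-in for fine-tuned CLIP/DINOv2) and a toy
# sketch encoder standing in for an SBIR-trained branch. All names, shapes,
# and the thresholding step are illustrative, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchBackbone(nn.Module):
    """Toy stand-in for a pre-trained dense feature extractor."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=8, stride=8),  # 1/8-resolution patch grid
            nn.ReLU(),
            nn.Conv2d(dim, dim, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # (B, dim, H/8, W/8) patch features


class SketchEncoder(nn.Module):
    """Toy stand-in for a sketch branch producing one query vector per sketch."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, dim, kernel_size=8, stride=8),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s).flatten(1)  # (B, dim) sketch embedding


@torch.no_grad()
def sketch_guided_mask(image, sketch, backbone, sketch_enc, threshold=0.5):
    """Cosine similarity between the sketch query and every image patch,
    upsampled to image resolution and thresholded into a binary mask."""
    feats = F.normalize(backbone(image), dim=1)          # (B, D, h, w)
    query = F.normalize(sketch_enc(sketch), dim=1)       # (B, D)
    sim = torch.einsum("bdhw,bd->bhw", feats, query)     # patch-level similarity
    sim = (sim + 1.0) / 2.0                              # map [-1, 1] -> [0, 1]
    sim = F.interpolate(sim.unsqueeze(1), size=image.shape[-2:],
                        mode="bilinear", align_corners=False)
    return (sim > threshold).float()                     # (B, 1, H, W) mask


if __name__ == "__main__":
    backbone, sketch_enc = PatchBackbone(), SketchEncoder()
    image = torch.rand(1, 3, 224, 224)   # RGB photo
    sketch = torch.rand(1, 1, 224, 224)  # freehand sketch query
    mask = sketch_guided_mask(image, sketch, backbone, sketch_enc)
    print(mask.shape)  # torch.Size([1, 1, 224, 224])
```
Under this reading, the SBIR model's role is to supply the training signal that aligns sketch and photo features; the snippet only illustrates how an aligned pair of encoders could turn a single freehand sketch into a dense mask at inference time.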
Related papers
- BEN: Using Confidence-Guided Matting for Dichotomous Image Segmentation [0.0]
We propose a new architectural approach for image segmentation called Confidence-Guided Matting (CGM).
BEN comprises two components: BEN Base for initial segmentation and BEN Refiner for confidence refinement.
Our approach achieves substantial improvements over current state-of-the-art methods on the DIS5K validation dataset.
arXiv Detail & Related papers (2025-01-08T01:30:11Z)
- Early Fusion of Features for Semantic Segmentation [10.362589129094975]
This paper introduces a novel framework that integrates a classifier network with a reverse HRNet architecture for efficient image segmentation.
Our methodology is rigorously tested across several benchmark datasets including Mapillary Vistas, Cityscapes, CamVid, COCO, and PASCAL-VOC2012.
The results demonstrate the effectiveness of our proposed model in achieving high segmentation accuracy, indicating its potential for various applications in image analysis.
arXiv Detail & Related papers (2024-02-08T22:58:06Z)
- Generalizable Entity Grounding via Assistance of Large Language Model [77.07759442298666]
We propose a novel approach to densely ground visual entities from a long caption.
We leverage a large multimodal model to extract semantic nouns, a class-agnostic segmentation model to generate entity-level segmentation, and a multi-modal feature fusion module to associate each semantic noun with its corresponding segmentation mask.
arXiv Detail & Related papers (2024-02-04T16:06:05Z)
- Pixel-Level Clustering Network for Unsupervised Image Segmentation [3.69853388955692]
We present a pixel-level clustering framework for segmenting images into regions without using ground truth annotations.
We also propose a training strategy that utilizes intra-consistency within each superpixel, inter-similarity/dissimilarity between neighboring superpixels, and structural similarity between images.
arXiv Detail & Related papers (2023-10-24T23:06:29Z)
- Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages existing pretrained vision-language (VL) models to train semantic segmentation models without human labels.
ZeroSeg distills the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
arXiv Detail & Related papers (2023-06-01T08:47:06Z)
- A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model [61.58071099082296]
It is unclear how to make zero-shot recognition work well on broader vision problems, such as object detection and semantic segmentation.
In this paper, we target zero-shot semantic segmentation by building on an off-the-shelf pre-trained vision-language model, i.e., CLIP.
Our experimental results show that this simple framework surpasses previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-12-29T18:56:18Z)
- One Sketch for All: One-Shot Personalized Sketch Segmentation [84.45203849671003]
We present the first one-shot personalized sketch segmentation method.
We aim to segment all sketches belonging to the same category given a single exemplar sketch with part annotations.
Our method preserves the part semantics embedded in the exemplar and is robust to input style and abstraction.
arXiv Detail & Related papers (2021-12-20T20:10:44Z)
- SCNet: Enhancing Few-Shot Semantic Segmentation by Self-Contrastive Background Prototypes [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples.
Most advanced solutions exploit a metric learning framework that performs segmentation by matching each pixel to a learned foreground prototype.
This framework suffers from biased classification because sample pairs are constructed with the foreground prototype only.
arXiv Detail & Related papers (2021-04-19T11:21:47Z)
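For readers unfamiliar with the prototype-matching framework the SCNet entry above refers to, here is a minimal, hypothetical sketch: support features are pooled under the annotated mask into a foreground prototype, a background prototype is pooled from the complement (the kind of background modelling the entry argues is missing when only foreground pairs are used), and each query pixel is assigned to its nearest prototype by cosine similarity. Names and shapes are illustrative, not SCNet's actual architecture.
```python
# Minimal sketch of prototype-based few-shot segmentation, assuming masked
# average pooling over support features. Matching query pixels against a
# foreground prototype only (as the entry above criticises) leaves the
# background unmodelled; adding a background prototype turns segmentation
# into a two-way nearest-prototype match. Names and shapes are illustrative.
import torch
import torch.nn.functional as F


def masked_average_pool(feats, mask):
    """Average the (D, H, W) feature map over the pixels selected by mask."""
    mask = mask.unsqueeze(0)                                             # (1, H, W)
    return (feats * mask).sum(dim=(1, 2)) / mask.sum().clamp(min=1.0)    # (D,)


def prototype_segment(support_feats, support_mask, query_feats):
    """Label each query pixel by its nearest prototype (cosine similarity)."""
    fg_proto = masked_average_pool(support_feats, support_mask)          # (D,)
    bg_proto = masked_average_pool(support_feats, 1.0 - support_mask)    # (D,)
    protos = F.normalize(torch.stack([bg_proto, fg_proto]), dim=1)       # (2, D)
    q = F.normalize(query_feats, dim=0)                                  # (D, H, W)
    sim = torch.einsum("kd,dhw->khw", protos, q)                         # (2, H, W)
    return sim.argmax(dim=0)                                             # (H, W) in {0, 1}


if __name__ == "__main__":
    D, H, W = 64, 32, 32
    support_feats = torch.randn(D, H, W)              # support image features
    support_mask = (torch.rand(H, W) > 0.7).float()   # annotated novel-class mask
    query_feats = torch.randn(D, H, W)                # query image features
    pred = prototype_segment(support_feats, support_mask, query_feats)
    print(pred.shape)  # torch.Size([32, 32])
```
The self-contrastive background prototypes in the entry address the same gap that the explicit bg_proto fills here: with only a foreground prototype, every pixel is scored against a single class, which is the biased classification described above.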