Related papers: Language-driven Semantic Segmentation

URL: http://arxiv.org/abs/2201.03546v1
Date: Mon, 10 Jan 2022 18:59:10 GMT
Title: Language-driven Semantic Segmentation
Authors: Boyi Li and Kilian Q. Weinberger and Serge Belongie and Vladlen Koltun and René Ranftl
Abstract summary: We present LSeg, a novel model for language-driven semantic image segmentation. We use a text encoder to compute embeddings of descriptive input labels. The image encoder is trained with a contrastive objective to align pixel embeddings to the text embedding of the corresponding semantic class.
Score: 88.21498323896475
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present LSeg, a novel model for language-driven semantic image segmentation. LSeg uses a text encoder to compute embeddings of descriptive input labels (e.g., "grass" or "building") together with a transformer-based image encoder that computes dense per-pixel embeddings of the input image. The image encoder is trained with a contrastive objective to align pixel embeddings to the text embedding of the corresponding semantic class. The text embeddings provide a flexible label representation in which semantically similar labels map to similar regions in the embedding space (e.g., "cat" and "furry"). This allows LSeg to generalize to previously unseen categories at test time, without retraining or even requiring a single additional training sample. We demonstrate that our approach achieves highly competitive zero-shot performance compared to existing zero- and few-shot semantic segmentation methods, and even matches the accuracy of traditional segmentation algorithms when a fixed label set is provided. Code and demo are available at https://github.com/isl-org/lang-seg.
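
The alignment between per-pixel embeddings and label text embeddings described in the abstract can be illustrated with a minimal NumPy sketch. This is not the LSeg implementation: random vectors stand in for the outputs of the text and image encoders, and the function names (`segment`, `pixel_text_loss`), the temperature value, and the array shapes are assumptions made for illustration only.

```python
import numpy as np

def segment(pixel_emb, text_emb, temperature=0.07):
    """Score every pixel against every label and pick the best label.

    pixel_emb: (H, W, C) per-pixel embeddings (hypothetical image encoder output)
    text_emb:  (K, C) one embedding per label (hypothetical text encoder output)
    temperature: assumed scaling factor for the similarity scores
    """
    # L2-normalise so that the dot product is a cosine similarity.
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    logits = p @ t.T / temperature            # shape (H, W, K)
    return logits, logits.argmax(axis=-1)     # label index per pixel

def pixel_text_loss(logits, target):
    """Per-pixel softmax cross-entropy against ground-truth label indices."""
    z = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_prob = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    h, w = target.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    return -log_prob[rows, cols, target].mean()

rng = np.random.default_rng(0)
labels = ["grass", "building", "sky"]
text_emb = rng.normal(size=(len(labels), 16))         # stand-in text embeddings
target = rng.integers(0, len(labels), size=(4, 6))    # toy ground-truth map
# Stand-in pixel embeddings: the matching text embedding plus small noise.
pixel_emb = text_emb[target] + 0.1 * rng.normal(size=(4, 6, 16))

logits, pred = segment(pixel_emb, text_emb)
loss = pixel_text_loss(logits, target)
```

Because the label set enters only through `text_emb`, passing embeddings for a different or larger set of labels changes the candidate classes without touching the pixel embeddings, which is the property the abstract relies on for unseen categories at test time.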

Related papers

Zero-Shot Pseudo Labels Generation Using SAM and CLIP for Semi-Supervised Semantic Segmentation [0.0]
We propose a method to train a semantic segmentation model using images with annotated labels and pseudo labels. The accuracy of the model depends on the quality of the pseudo labels and the amount of data with annotated labels. The effectiveness of the proposed method is demonstrated through the experiments using the public datasets: PASCAL and MS COCO.
arXiv Detail & Related papers (2025-05-26T11:31:13Z)
InvSeg: Test-Time Prompt Inversion for Semantic Segmentation [33.60580908728705]
InvSeg is a test-time prompt inversion method for semantic segmentation. We introduce Contrastive Soft Clustering to align masks with the image's structure information. InvSeg learns context-rich text prompts in embedding space and achieves accurate semantic alignment across modalities.
arXiv Detail & Related papers (2024-10-15T10:20:31Z)
Finetuning CLIP to Reason about Pairwise Differences [52.028073305958074]
We propose an approach to train vision-language models such as CLIP in a contrastive manner to reason about differences in embedding space. We finetune CLIP so that text descriptions of differences between images correspond to their difference in image embedding space. Our approach yields significantly improved capabilities in ranking images by a certain attribute, and improved zero-shot classification performance on many downstream image classification tasks.
arXiv Detail & Related papers (2024-09-15T13:02:14Z)
Learning Semantic Segmentation with Query Points Supervision on Aerial Images [57.09251327650334]
We present a weakly supervised learning algorithm to train semantic segmentation algorithms. Our proposed approach performs accurate semantic segmentation and improves efficiency by significantly reducing the cost and time required for manual annotation.
arXiv Detail & Related papers (2023-09-11T14:32:04Z)
Zero-shot spatial layout conditioning for text-to-image diffusion models [52.24744018240424]
Large-scale text-to-image diffusion models have significantly improved the state of the art in generative image modelling. We consider image generation from text associated with segments on the image canvas, which combines an intuitive natural language interface with precise spatial control over the generated content. We propose ZestGuide, a zero-shot segmentation guidance approach that can be plugged into pre-trained text-to-image diffusion models.
arXiv Detail & Related papers (2023-06-23T19:24:48Z)
CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation [73.89509052503222]
This paper presents a simple but performant semi-supervised semantic segmentation approach, called CorrMatch. We observe that the correlation maps not only enable clustering pixels of the same category easily but also contain good shape information. We propose to conduct pixel propagation by modeling the pairwise similarities of pixels to spread the high-confidence pixels and dig out more. Then, we perform region propagation to enhance the pseudo labels with accurate class-agnostic masks extracted from the correlation maps.
arXiv Detail & Related papers (2023-06-07T10:02:29Z)
Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning [82.70453633641466]
We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility function for CLIP's contrastive loss. We show that PACL is also applicable to image-level predictions and when used with a CLIP backbone, provides a general improvement in zero-shot classification accuracy.
arXiv Detail & Related papers (2022-12-09T17:23:00Z)
Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs [10.484851004093919]
We tackle open-world semantic segmentation, which aims at learning to segment arbitrary visual concepts in images. Existing open-world segmentation methods have shown impressive advances by employing contrastive learning (CL) to learn diverse visual concepts. We propose a novel Text-grounded Contrastive Learning framework that enables a model to directly learn region-text alignment.
arXiv Detail & Related papers (2022-12-01T18:59:03Z)
SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation [22.718908677552196]
Zero-shot semantic segmentation predicts a class label at the pixel level instead of the image level. Relative Positional Encoding integrates spatial information at the feature level and can handle arbitrary image sizes. Anneal Self-Training can automatically assign different importance to pseudo-labels.
arXiv Detail & Related papers (2021-08-27T22:18:24Z)
Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation. We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation. It outperforms the state of the art on the challenging ADE20K dataset and performs on-par on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
Universal Weakly Supervised Segmentation by Pixel-to-Segment Contrastive Learning [28.498782661888775]
We formulate weakly supervised segmentation as a semi-supervised metric learning problem. We propose 4 types of contrastive relationships between pixels and segments in the feature space. We deliver a universal weakly supervised segmenter with significant gains on Pascal VOC and DensePose.
arXiv Detail & Related papers (2021-05-03T15:49:01Z)
From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation [22.88452754438478]
We focus on zero-shot semantic segmentation, which aims to segment unseen objects with only category-level semantic representations. We propose a novel Context-aware feature Generation Network (CaGNet), which can synthesize context-aware pixel-wise visual features for unseen categories. Experimental results on Pascal-VOC, Pascal-Context, and COCO-stuff show that our method significantly outperforms the existing zero-shot semantic segmentation methods.
arXiv Detail & Related papers (2020-09-25T13:26:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.