CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic
Segmentation For-Free
- URL: http://arxiv.org/abs/2309.14289v2
- Date: Tue, 28 Nov 2023 13:28:24 GMT
- Title: CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic
Segmentation For-Free
- Authors: Monika Wysoczańska, Michaël Ramamonjisoa, Tomasz Trzciński, Oriane Siméoni
- Abstract summary: We propose an open-vocabulary semantic segmentation method, dubbed CLIP-DIY.
It exploits CLIP's classification abilities on patches of different sizes and aggregates the decisions into a single map.
We obtain state-of-the-art zero-shot semantic segmentation results on PASCAL VOC and perform on par with the best methods on COCO.
- Score: 12.15899043709721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emergence of CLIP has opened the way for open-world image perception. The
zero-shot classification capabilities of the model are impressive but are
harder to use for dense tasks such as image segmentation. Several methods have
proposed different modifications and learning schemes to produce dense output.
Instead, we propose in this work an open-vocabulary semantic segmentation
method, dubbed CLIP-DIY, which does not require any additional training or
annotations, but instead leverages existing unsupervised object localization
approaches. In particular, CLIP-DIY is a multi-scale approach that directly
exploits CLIP's classification abilities on patches of different sizes and
aggregates the decisions into a single map. We further guide the segmentation
using foreground/background scores obtained using unsupervised object
localization methods. With our method, we obtain state-of-the-art zero-shot
semantic segmentation results on PASCAL VOC and perform on par with the best
methods on COCO. The code is available at
http://github.com/wysoczanska/clip-diy
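To make the mechanism described above concrete, here is a minimal, hypothetical sketch of multi-scale patch classification with CLIP followed by foreground-guided aggregation. This is not the authors' implementation: the model interface (encode_image/encode_text, e.g. as in the openai/CLIP package), the grid scales, and the precomputed foreground map `fg_map` from an unsupervised object localizer are all assumptions made for illustration.

```python
# Hypothetical sketch (not the authors' code): multi-scale CLIP patch classification
# aggregated into a dense map and gated by an unsupervised foreground prior.
import torch
import torch.nn.functional as F

@torch.no_grad()
def clip_diy_like_segmentation(image, text_feats, clip_model, fg_map, scales=(1, 2, 4)):
    """image: (3, H, W) preprocessed tensor; text_feats: (C, D) L2-normalized class
    embeddings (e.g. from clip_model.encode_text on class-name prompts);
    fg_map: (H, W) foreground probabilities from an unsupervised object localizer."""
    _, H, W = image.shape
    num_classes = text_feats.shape[0]
    score_map = torch.zeros(num_classes, H, W)

    for s in scales:                                  # split the image into an s x s grid
        ph, pw = H // s, W // s
        for i in range(s):
            for j in range(s):
                patch = image[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
                patch = F.interpolate(patch[None], size=(224, 224), mode="bilinear")
                feat = F.normalize(clip_model.encode_image(patch), dim=-1)
                probs = (feat @ text_feats.T).softmax(dim=-1)[0]   # (C,) patch class scores
                # broadcast the patch-level decision to its pixels, accumulate over scales
                score_map[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] += probs[:, None, None]

    score_map /= len(scales)
    # keep class predictions only where the foreground prior is confident
    labels = score_map.argmax(dim=0)
    return torch.where(fg_map > 0.5, labels, torch.full_like(labels, -1))  # -1 = background
```

The exact scales, patch overlap, and choice of localizer differ in the paper; this sketch only illustrates the patch-level classification and aggregation logic the abstract describes.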
Related papers
- Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels [53.8817160001038]
We propose a novel method, PixelCLIP, to adapt the CLIP image encoder for pixel-level understanding.
To address the challenges of leveraging masks without semantic labels, we devise an online clustering algorithm.
PixelCLIP shows significant performance improvements over CLIP and competitive results compared to caption-supervised methods.
arXiv Detail & Related papers (2024-09-30T01:13:03Z)
- Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation [90.35249276717038]
We propose WeCLIP, a CLIP-based single-stage pipeline, for weakly supervised semantic segmentation.
Specifically, the frozen CLIP model is applied as the backbone for semantic feature extraction.
A new decoder is designed to interpret extracted semantic features for final prediction.
arXiv Detail & Related papers (2024-06-17T03:49:47Z)
- CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation [31.264574799748903]
We propose an open-vocabulary semantic segmentation method, which does not require any annotations.
We show that the self-supervised feature properties it relies on can be learnt directly from CLIP features.
Our method CLIP-DINOiser needs only a single forward pass of CLIP and two light convolutional layers at inference.
arXiv Detail & Related papers (2023-12-19T17:40:27Z)
- Side Adapter Network for Open-Vocabulary Semantic Segmentation [69.18441687386733]
This paper presents a new framework for open-vocabulary semantic segmentation with the pre-trained vision-language model, named Side Adapter Network (SAN).
A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias.
Our approach significantly outperforms other counterparts, with up to 18 times fewer trainable parameters and 19 times faster inference speed.
arXiv Detail & Related papers (2023-02-23T18:58:28Z)
- SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation [26.079055078561986]
We propose a CLIP-based model named SegCLIP for the topic of open-vocabulary segmentation.
The main idea is to gather patches with learnable centers into semantic regions through training on text-image pairs.
Experimental results show that our model achieves comparable or superior segmentation accuracy.
arXiv Detail & Related papers (2022-11-27T12:38:52Z)
- FreeSOLO: Learning to Segment Objects without Annotations [191.82134817449528]
We present FreeSOLO, a self-supervised instance segmentation framework built on top of the simple instance segmentation method SOLO.
Our method also presents a novel localization-aware pre-training framework, where objects can be discovered from complicated scenes in an unsupervised manner.
arXiv Detail & Related papers (2022-02-24T16:31:44Z)
- DenseCLIP: Extract Free Dense Labels from CLIP [130.3830819077699]
Contrastive Language-Image Pre-training (CLIP) has made a remarkable breakthrough in open-vocabulary zero-shot image recognition.
DenseCLIP+ surpasses SOTA transductive zero-shot semantic segmentation methods by large margins.
Our finding suggests that DenseCLIP can serve as a new reliable source of supervision for dense prediction tasks.
arXiv Detail & Related papers (2021-12-02T09:23:01Z)
- Sparse Object-level Supervision for Instance Segmentation with Pixel Embeddings [4.038011160363972]
Most state-of-the-art instance segmentation methods have to be trained on densely annotated images.
We propose a proposal-free segmentation approach based on non-spatial embeddings.
We evaluate the proposed method on challenging 2D and 3D segmentation problems in different microscopy modalities.
arXiv Detail & Related papers (2021-03-26T16:36:56Z)
- Exploring Cross-Image Pixel Contrast for Semantic Segmentation [130.22216825377618]
We propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.
The core idea is to enforce pixel embeddings belonging to the same semantic class to be more similar than embeddings from different classes (a minimal sketch of such a loss follows this list).
Our method can be effortlessly incorporated into existing segmentation frameworks without extra overhead during testing.
arXiv Detail & Related papers (2021-01-28T11:35:32Z)
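As referenced in the last entry above, the following is a minimal, hypothetical sketch of a pixel-wise supervised contrastive loss in that spirit. The pixel sampling strategy, temperature value, and any memory-bank machinery from the actual paper are omitted; all names here are illustrative.

```python
# Hypothetical sketch: supervised contrastive loss over sampled pixel embeddings.
# Pixels sharing a semantic label act as positives for each other; all other pixels are negatives.
import torch
import torch.nn.functional as F

def pixel_contrast_loss(embeddings, labels, temperature=0.1):
    """embeddings: (N, D) pixel features sampled across images; labels: (N,) class ids."""
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.T / temperature                                   # (N, N) pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask  # positives share a class

    # log-softmax over all non-self pairs, then average over each anchor's positives
    log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1)
    valid = pos_count > 0                                         # anchors with at least one positive
    loss = -(log_prob * pos_mask).sum(dim=1)[valid] / pos_count[valid]
    return loss.mean()
```

An objective of this kind is typically added on top of the standard cross-entropy segmentation loss during training and dropped at test time, consistent with the "no extra overhead during testing" claim in the entry above.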
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.