Related papers: Weakly-supervised segmentation of referring expressions

URL: http://arxiv.org/abs/2205.04725v2
Date: Thu, 12 May 2022 07:17:56 GMT
Title: Weakly-supervised segmentation of referring expressions
Authors: Robin Strudel, Ivan Laptev, Cordelia Schmid
Abstract summary: Text grounded semantic SEGmentation learns segmentation masks directly from image-level referring expressions without pixel-level annotations. Our approach demonstrates promising results for weakly-supervised referring expression segmentation on the PhraseCut and RefCOCO datasets.
Score: 81.73850439141374
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Visual grounding localizes regions (boxes or segments) in the image corresponding to given referring expressions. In this work we address image segmentation from referring expressions, a problem that has so far only been addressed in a fully-supervised setting. A fully-supervised setup, however, requires pixel-wise supervision and is hard to scale given the expense of manual annotation. We therefore introduce a new task of weakly-supervised image segmentation from referring expressions and propose Text grounded semantic SEGmentation (TSEG) that learns segmentation masks directly from image-level referring expressions without pixel-level annotations. Our transformer-based method computes patch-text similarities and guides the classification objective during training with a new multi-label patch assignment mechanism. The resulting visual grounding model segments image regions corresponding to given natural language expressions. Our approach TSEG demonstrates promising results for weakly-supervised referring expression segmentation on the challenging PhraseCut and RefCOCO datasets. TSEG also shows competitive performance when evaluated in a zero-shot setting for semantic segmentation on Pascal VOC.
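
The patch-text similarity idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the TSEG implementation: the abstract does not specify the multi-label patch assignment mechanism, so the example swaps in a generic softmax-weighted pooling to turn per-patch scores into one image-level score. All names (cosine, patch_text_similarity, image_level_score, patch_mask), the temperature, and the threshold are hypothetical choices made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def patch_text_similarity(patch_embeddings, text_embedding):
    """One similarity score per image patch against a single text embedding."""
    return [cosine(p, text_embedding) for p in patch_embeddings]

def image_level_score(similarities, temperature=0.1):
    """Softmax-weighted pooling of patch scores into one image-level score.

    Assumption: a generic stand-in for the paper's patch assignment step,
    chosen only so that an image-level label could supervise patch scores.
    """
    peak = max(similarities)
    weights = [math.exp((s - peak) / temperature) for s in similarities]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, similarities)) / total

def patch_mask(similarities, threshold=0.5):
    """Binary patch-level mask: 1 where a patch matches the expression."""
    return [1 if s >= threshold else 0 for s in similarities]

# Toy data: three 2-D patch embeddings and one text embedding (made up).
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [1.0, 0.0]

sims = patch_text_similarity(patches, text)
score = image_level_score(sims)
mask = patch_mask(sims)
```

In this toy setup only the pooled score would receive an image-level training signal, while the thresholded patch scores act as a coarse segmentation at inference time.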

Related papers

InvSeg: Test-Time Prompt Inversion for Semantic Segmentation [33.60580908728705]
InvSeg is a test-time prompt inversion method that tackles open-vocabulary semantic segmentation. We introduce Contrastive Soft Clustering (CSC) to align derived masks with the image's structure information. InvSeg learns context-rich text prompts in embedding space and achieves accurate semantic alignment across modalities.
arXiv Detail & Related papers (2024-10-15T10:20:31Z)
SemPLeS: Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation [36.41778553250247]
Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models using image data with only image-level supervision. We propose a Semantic Prompt Learning for WSSS (SemPLeS) framework, which learns to effectively prompt the CLIP latent space. SemPLeS can perform better semantic alignment between object regions and the associated class labels.
arXiv Detail & Related papers (2024-01-22T09:41:05Z)
Text Augmented Spatial-aware Zero-shot Referring Image Segmentation [60.84423786769453]
We introduce a Text Augmented Spatial-aware (TAS) zero-shot referring image segmentation framework. TAS incorporates a mask proposal network for instance-level mask extraction, a text-augmented visual-text matching score for mining the image-text correlation, and a spatial rectifier for mask post-processing. The proposed method clearly outperforms state-of-the-art zero-shot referring image segmentation methods.
arXiv Detail & Related papers (2023-10-27T10:52:50Z)
Shatter and Gather: Learning Referring Image Segmentation with Text Supervision [52.46081425504072]
We present a new model that discovers semantic entities in the input image and then combines the entities relevant to the text query to predict the mask of the referent. Our method was evaluated on four public benchmarks for referring image segmentation, where it clearly outperformed the existing method for the same task and recent open-vocabulary segmentation models on all the benchmarks.
arXiv Detail & Related papers (2023-08-29T15:39:15Z)
Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation [117.36746226803993]
We introduce self-supervised spatially-consistent grouping with text-supervised semantic segmentation. Considering the part-like grouped results, we further adapt a text-supervised model from image-level to region-level recognition. Our method achieves 59.2% mIoU and 32.4% mIoU on the Pascal VOC and Pascal Context benchmarks, respectively.
arXiv Detail & Related papers (2023-04-03T16:24:39Z)
Zero-shot Referring Image Segmentation with Global-Local Context Features [8.77461711080319]
Referring image segmentation (RIS) aims to find a segmentation mask given a referring expression grounded to a region of the input image. We propose a simple yet effective zero-shot referring image segmentation method by leveraging the pre-trained cross-modal knowledge from CLIP. In our experiments, the proposed method outperforms several zero-shot baselines of the task and even the weakly supervised referring expression segmentation method with substantial margins.
arXiv Detail & Related papers (2023-03-31T06:00:50Z)
Fully and Weakly Supervised Referring Expression Segmentation with End-to-End Learning [50.40482222266927]
Referring Expression Segmentation (RES) aims to localize and segment the target according to the given language expression. We propose a parallel position-kernel-segmentation pipeline to better isolate and then interact with the localization and segmentation steps. Our method is simple but surprisingly effective, outperforming all previous state-of-the-art RES methods on fully- and weakly-supervised settings.
arXiv Detail & Related papers (2022-12-17T08:29:33Z)
Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts. We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query. Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.