Associating Spatially-Consistent Grouping with Text-supervised Semantic
Segmentation
- URL: http://arxiv.org/abs/2304.01114v1
- Date: Mon, 3 Apr 2023 16:24:39 GMT
- Title: Associating Spatially-Consistent Grouping with Text-supervised Semantic
Segmentation
- Authors: Yabo Zhang, Zihao Wang, Jun Hao Liew, Jingjia Huang, Manyu Zhu, Jiashi
Feng, Wangmeng Zuo
- Abstract summary: We associate self-supervised spatially-consistent grouping with text-supervised semantic segmentation.
Considering the part-like grouped results, we further adapt a text-supervised model from image-level to region-level recognition.
Our method achieves 59.2% mIoU and 32.4% mIoU on the Pascal VOC and Pascal Context benchmarks, respectively.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we investigate performing semantic segmentation solely through
the training on image-sentence pairs. Due to the lack of dense annotations,
existing text-supervised methods can only learn to group an image into semantic
regions via pixel-insensitive feedback. As a result, their grouped results are
coarse and often contain small spurious regions, limiting the upper-bound
performance of segmentation. On the other hand, we observe that grouped results
from self-supervised models are more semantically consistent and break the
bottleneck of existing methods. Motivated by this, we propose to associate
self-supervised spatially-consistent grouping with text-supervised semantic
segmentation. Considering the part-like grouped results, we further adapt a
text-supervised model from image-level to region-level recognition with two
core designs. First, we encourage fine-grained alignment with a one-way
noun-to-region contrastive loss, which reduces the mismatched noun-region
pairs. Second, we adopt a contextually aware masking strategy to enable
simultaneous recognition of all grouped regions. Coupled with
spatially-consistent grouping and region-adapted recognition, our method
achieves 59.2% mIoU and 32.4% mIoU on the Pascal VOC and Pascal Context benchmarks, respectively,
significantly surpassing the state-of-the-art methods.
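The first core design above, a one-way noun-to-region contrastive loss, can be sketched as a minimal InfoNCE-style objective in which nouns attend to regions but not vice versa. This is an illustrative sketch, not the authors' implementation: the function name, the choice of positive pair (each noun's best-matching region), and the temperature value are all assumptions.

```python
import numpy as np

def noun_to_region_loss(noun_emb, region_emb, tau=0.07):
    """One-way (noun -> region) InfoNCE-style contrastive loss.

    Illustrative assumption: the positive region for each noun is its
    highest-similarity region; all other regions act as negatives.
    noun_emb: (N, D) noun embeddings; region_emb: (R, D) region embeddings.
    """
    # L2-normalize so dot products are cosine similarities.
    noun_emb = noun_emb / np.linalg.norm(noun_emb, axis=1, keepdims=True)
    region_emb = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    logits = noun_emb @ region_emb.T / tau         # (N, R) similarity logits
    pos = logits.max(axis=1)                       # positive logit per noun
    # -log softmax of the positive, averaged over nouns (stable log-sum-exp).
    lse = pos + np.log(np.exp(logits - pos[:, None]).sum(axis=1))
    return float(np.mean(lse - pos))
```

Because the loss runs in one direction only, regions with no matching noun incur no penalty, which is consistent with the stated goal of reducing mismatched noun-region pairs when a caption mentions only some of an image's regions.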
Related papers
- Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation [28.24883865053459]
This paper aims to learn a model capable of segmenting arbitrary visual concepts within images by using only image-text pairs without dense annotations.
Existing methods have demonstrated that contrastive learning on image-text pairs effectively aligns visual segments with the meanings of texts.
A text often consists of multiple semantic concepts, whereas semantic segmentation strives to create semantically homogeneous segments.
arXiv Detail & Related papers (2024-04-05T17:25:17Z)
- Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision [23.931443799102663]
We introduce a Multi-Grained Cross-modal Alignment (MGCA) framework to bridge the granularity gap without any dense annotations.
Specifically, MGCA constructs pseudo multi-granular semantic correspondences upon image-text pairs.
Our method achieves significant advancements over state-of-the-art methods, demonstrating its effectiveness and efficiency.
arXiv Detail & Related papers (2024-03-06T13:43:36Z)
- Progressive Feature Self-reinforcement for Weakly Supervised Semantic Segmentation [55.69128107473125]
We propose a single-stage approach for Weakly Supervised Semantic Segmentation (WSSS) with image-level labels.
We adaptively partition the image content into deterministic regions (e.g., confident foreground and background) and uncertain regions (e.g., object boundaries and misclassified categories) for separate processing.
Building upon this, we introduce a complementary self-enhancement method that constrains the semantic consistency between these confident regions and an augmented image with the same class labels.
arXiv Detail & Related papers (2023-12-14T13:21:52Z)
- Semantic Connectivity-Driven Pseudo-labeling for Cross-domain Segmentation [89.41179071022121]
Self-training is a prevailing approach in cross-domain semantic segmentation.
We propose a novel approach called Semantic Connectivity-driven pseudo-labeling.
This approach formulates pseudo-labels at the connectivity level and thus can facilitate learning structured and low-noise semantics.
arXiv Detail & Related papers (2023-12-11T12:29:51Z)
- Weakly-supervised segmentation of referring expressions [81.73850439141374]
Text grounded semantic SEGmentation (TSEG) learns segmentation masks directly from image-level referring expressions, without pixel-level annotations.
Our approach demonstrates promising results for weakly-supervised referring expression segmentation on the PhraseCut and RefCOCO datasets.
arXiv Detail & Related papers (2022-05-10T07:52:24Z)
- Region-level Contrastive and Consistency Learning for Semi-Supervised Semantic Segmentation [30.1884540364192]
We propose a novel region-level contrastive and consistency learning framework (RC2L) for semi-supervised semantic segmentation.
Specifically, we first propose a Region Mask Contrastive (RMC) loss and a Region Feature Contrastive (RFC) loss to accomplish region-level contrastive learning, and couple them with region-level consistency regularization.
arXiv Detail & Related papers (2022-04-28T07:22:47Z)
- A Unified Architecture of Semantic Segmentation and Hierarchical Generative Adversarial Networks for Expression Manipulation [52.911307452212256]
We develop a unified architecture of semantic segmentation and hierarchical GANs.
A unique advantage of our framework is that, on the forward pass, the semantic segmentation network conditions the generative model.
We evaluate our method on two challenging facial expression translation benchmarks, AffectNet and RaFD, and a semantic segmentation benchmark, CelebAMask-HQ.
arXiv Detail & Related papers (2021-12-08T22:06:31Z)
- Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation [19.55572909866489]
We propose a region-based active learning approach for semantic segmentation under a domain shift.
Our algorithm, Active Learning via Region Impurity and Prediction Uncertainty (AL-RIPU), introduces a novel acquisition strategy characterizing the spatial adjacency of image regions.
Our method requires very few annotations to nearly reach fully supervised performance, and it substantially outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-11-25T06:40:58Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets a new state of the art in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.