Semantics-Aware Dynamic Localization and Refinement for Referring Image
Segmentation
- URL: http://arxiv.org/abs/2303.06345v1
- Date: Sat, 11 Mar 2023 08:42:40 GMT
- Title: Semantics-Aware Dynamic Localization and Refinement for Referring Image
Segmentation
- Authors: Zhao Yang, Jiaqi Wang, Yansong Tang, Kai Chen, Hengshuang Zhao, Philip
H.S. Torr
- Abstract summary: Referring image segmentation segments an image from a language expression.
We develop an algorithm that gradually shifts from being localization-centric to segmentation-centric.
Compared to its counterparts, our method is more versatile while remaining effective.
- Score: 102.25240608024063
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Referring image segmentation segments an image from a language expression.
With the aim of producing high-quality masks, existing methods often adopt
iterative learning approaches that rely on RNNs or stacked attention layers to
refine vision-language features. Despite their complexity, RNN-based methods
are subject to specific encoder choices, while attention-based methods offer
limited gains. In this work, we introduce a simple yet effective alternative
for progressively learning discriminative multi-modal features. The core idea
of our approach is to leverage a continuously updated query as the
representation of the target object and at each iteration, strengthen
multi-modal features strongly correlated to the query while weakening less
related ones. As the query is initialized by language features and successively
updated by object features, our algorithm gradually shifts from being
localization-centric to segmentation-centric. This strategy enables the
incremental recovery of missing object parts and/or removal of extraneous parts
through iteration. Compared to its counterparts, our method is more
versatile: it can be plugged into prior arts straightforwardly
and consistently bring improvements. Experimental results on the challenging
datasets of RefCOCO, RefCOCO+, and G-Ref demonstrate its advantage with respect
to the state-of-the-art methods.
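The iterative strategy in the abstract can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the authors' implementation: a query vector (initialized from language features) scores each multi-modal feature, correlated features are strengthened while less related ones are weakened, and the query is then re-derived from the attended object features.

```python
import numpy as np

def refine(features, query, num_iters=3):
    """Hypothetical sketch of the query-driven refinement loop:
    features is an (N, C) array of multi-modal features over N locations,
    query is a (C,) vector initialized from language features."""
    for _ in range(num_iters):
        # Correlation of each spatial feature with the current query.
        scores = features @ query                        # (N,)
        s = scores - scores.max()                        # numerically stable softmax
        weights = np.exp(s) / np.exp(s).sum()            # (N,) attention over locations
        # Strengthen strongly correlated features, leave weak ones suppressed.
        features = features * (1.0 + weights[:, None])
        # Update the query from the attended object features, shifting it
        # from language-initialized (localization) to object-centric (segmentation).
        query = weights @ features
        query = query / (np.linalg.norm(query) + 1e-8)
    return features, query
```

The normalization keeps the query on the unit sphere so the correlation scores stay comparable across iterations; the specific reweighting scheme here is an assumption for illustration.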
Related papers
- MetaSeg: Content-Aware Meta-Net for Omni-Supervised Semantic
Segmentation [17.59676962334776]
Noisy labels, inevitably present in pseudo segmentation labels generated from weak object-level annotations, severely hamper model optimization for semantic segmentation.
Inspired by recent advances in meta learning, we argue that rather than struggling to tolerate noise hidden behind clean labels passively, a more feasible solution would be to find out the noisy regions actively.
We present a novel meta learning based semantic segmentation method, MetaSeg, that comprises a primary content-aware meta-net (CAM-Net) to serve as a noise indicator for an arbitrary segmentation model counterpart.
arXiv Detail & Related papers (2024-01-22T07:31:52Z) - EAVL: Explicitly Align Vision and Language for Referring Image Segmentation [27.351940191216343]
We introduce a Vision-Language Aligner that aligns features in the segmentation stage using dynamic convolution kernels based on the input image and sentence.
Our method harnesses the potential of the multi-modal features in the segmentation stage and aligns language features of different emphases with image features to achieve fine-grained text-to-pixel correlation.
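The dynamic-kernel idea in the EAVL blurb can be sketched as follows. This is a simplified stand-in, not the paper's architecture: a kernel generated from the sentence feature is applied as a 1x1 convolution over pixel features to produce a text-to-pixel correlation map; the kernel generator here (a plain `tanh`) is a placeholder for a learned module.

```python
import numpy as np

def dynamic_mask(pixel_feats, sent_feat):
    """Hypothetical sketch: pixel_feats is (H, W, C), sent_feat is (C,).
    A dynamic 1x1 kernel derived from the sentence scores every pixel."""
    kernel = np.tanh(sent_feat)           # (C,) stand-in for a learned kernel generator
    logits = pixel_feats @ kernel         # (H, W) text-to-pixel correlation map
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> per-pixel mask probabilities
```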
arXiv Detail & Related papers (2023-08-18T18:59:27Z) - Reflection Invariance Learning for Few-shot Semantic Segmentation [53.20466630330429]
Few-shot semantic segmentation (FSS) aims to segment objects of unseen classes in query images with only a few annotated support images.
This paper proposes a fresh few-shot segmentation framework to mine the reflection invariance in a multi-view matching manner.
Experiments on both PASCAL-$5^i$ and COCO-$20^i$ datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-01T15:14:58Z) - Object Representations as Fixed Points: Training Iterative Refinement
Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
arXiv Detail & Related papers (2022-07-02T10:00:35Z) - A Unified Architecture of Semantic Segmentation and Hierarchical
Generative Adversarial Networks for Expression Manipulation [52.911307452212256]
We develop a unified architecture of semantic segmentation and hierarchical GANs.
A unique advantage of our framework is that on the forward pass the semantic segmentation network conditions the generative model.
We evaluate our method on two challenging facial expression translation benchmarks, AffectNet and RaFD, and a semantic segmentation benchmark, CelebAMask-HQ.
arXiv Detail & Related papers (2021-12-08T22:06:31Z) - Anti-aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation [66.85202434812942]
We reformulate few-shot segmentation as a semantic reconstruction problem.
We convert base class features into a series of basis vectors which span a class-level semantic space for novel class reconstruction.
Our proposed approach, referred to as anti-aliasing semantic reconstruction (ASR), provides a systematic yet interpretable solution for few-shot learning problems.
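The semantic-reconstruction idea in the ASR blurb amounts to expressing a novel-class feature in the span of base-class basis vectors. A minimal sketch under that assumption (the actual method learns the basis end-to-end; here a least-squares projection stands in for it):

```python
import numpy as np

def reconstruct(novel_feat, basis):
    """Hypothetical sketch: basis is (K, C), K base-class basis vectors
    spanning a class-level semantic space; novel_feat is (C,)."""
    # Least-squares coefficients of novel_feat over the basis vectors.
    coef, *_ = np.linalg.lstsq(basis.T, novel_feat, rcond=None)
    # Reconstruction = orthogonal projection onto the base-class span.
    return basis.T @ coef
```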
arXiv Detail & Related papers (2021-06-01T02:17:36Z) - Spatially Consistent Representation Learning [12.120041613482558]
We propose a spatially consistent representation learning algorithm (SCRL) for multi-object and location-specific tasks.
We devise a novel self-supervised objective that tries to produce coherent spatial representations of a randomly cropped local region.
On various downstream localization tasks with benchmark datasets, the proposed SCRL shows significant performance improvements.
arXiv Detail & Related papers (2021-03-10T15:23:45Z) - Probing Linguistic Features of Sentence-Level Representations in Neural
Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE).
We use them to study representations learned by more than 40 different encoder architecture and linguistic feature combinations trained on two datasets.
We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
arXiv Detail & Related papers (2020-04-17T09:17:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.