Instance-Specific Feature Propagation for Referring Segmentation
- URL: http://arxiv.org/abs/2204.12109v1
- Date: Tue, 26 Apr 2022 07:08:14 GMT
- Title: Instance-Specific Feature Propagation for Referring Segmentation
- Authors: Chang Liu, Xudong Jiang, and Henghui Ding
- Abstract summary: Referring segmentation aims to generate a segmentation mask for the target instance indicated by a natural language expression.
We propose a novel framework that simultaneously detects the target-of-interest via feature propagation and generates a fine-grained segmentation mask.
- Score: 28.58551450280675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Referring segmentation aims to generate a segmentation mask for the target
instance indicated by a natural language expression. There are typically two
kinds of existing methods: one-stage methods that directly perform segmentation
on the fused vision and language features; and two-stage methods that first
utilize an instance segmentation model for instance proposal and then select
one of these instances via matching them with language features. In this work,
we propose a novel framework that simultaneously detects the target-of-interest
via feature propagation and generates a fine-grained segmentation mask. In our
framework, each instance is represented by an Instance-Specific Feature (ISF),
and the target-of-referring is identified by exchanging information among all
ISFs using our proposed Feature Propagation Module (FPM). Our instance-aware
approach learns the relationship among all objects, which helps to better
locate the target-of-interest than one-stage methods. Compared to two-stage
methods, our approach collaboratively and interactively utilizes both vision
and language information for synchronous identification and segmentation. In
the experimental tests, our method outperforms previous state-of-the-art
methods on all three RefCOCO series datasets.
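The abstract describes each instance carrying an Instance-Specific Feature (ISF) and the Feature Propagation Module (FPM) exchanging information among all ISFs to identify the referred target. The paper does not spell out the exchange mechanism here, so the following is a minimal NumPy sketch under the assumption that propagation is attention-style message passing among instance vectors, scored against a pooled language embedding; all function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def propagate_features(isfs, lang_feat):
    """One hypothetical round of information exchange among
    Instance-Specific Features (ISFs), scored against a language feature.

    isfs:      (N, D) array, one feature vector per detected instance
    lang_feat: (D,) array, pooled expression embedding
    Returns:   (N,) relevance scores and (N, D) updated ISFs.
    """
    # Pairwise affinities among instances (self-attention style) let
    # each ISF aggregate context from every other instance.
    attn = softmax(isfs @ isfs.T / np.sqrt(isfs.shape[1]), axis=-1)
    updated = attn @ isfs
    # Score each updated ISF against the language feature to pick
    # the target-of-referring.
    scores = softmax(updated @ lang_feat)
    return scores, updated

rng = np.random.default_rng(0)
isfs = rng.standard_normal((5, 16))   # 5 instances, 16-dim features
lang = rng.standard_normal(16)
scores, updated = propagate_features(isfs, lang)
target = int(np.argmax(scores))       # index of the referred instance
```

This is only an illustration of why relating all objects helps localization: each instance's representation is refined by context from the others before it is matched against the expression.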
Related papers
- Visual Prompt Selection for In-Context Learning Segmentation [77.15684360470152]
In this paper, we focus on rethinking and improving the example selection strategy.
We first demonstrate that ICL-based segmentation models are sensitive to different contexts.
Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation.
arXiv Detail & Related papers (2024-07-14T15:02:54Z) - RISAM: Referring Image Segmentation via Mutual-Aware Attention Features [13.64992652002458]
Referring image segmentation (RIS) aims to segment a particular region based on a language expression prompt.
Existing methods incorporate linguistic features into visual features and obtain multi-modal features for mask decoding.
We propose MARIS, a referring image segmentation method that leverages the Segment Anything Model (SAM) and introduces a mutual-aware attention mechanism.
arXiv Detail & Related papers (2023-11-27T11:24:25Z) - Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end trainable bottom-up approach to achieve instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has minimal run-time compared to other contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z) - SOLO: A Simple Framework for Instance Segmentation [84.00519148562606]
We introduce the notion of "instance categories", which assigns categories to each pixel within an instance according to the instance's location.
"SOLO" is a simple, direct, and fast framework for instance segmentation with strong performance.
Our approach achieves state-of-the-art results for instance segmentation in terms of both speed and accuracy.
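SOLO's "instance categories" idea maps an instance's location to a category: the image is divided into a grid, and every pixel of an instance takes the category of the grid cell containing the instance's center. A minimal sketch of that assignment, with a hypothetical function name and grid size:

```python
def location_category(cx, cy, img_w, img_h, S=5):
    """Map an instance's center (cx, cy), in pixel coordinates, to a
    cell index in an S x S grid; every pixel of that instance shares
    this 'instance category'. The clamping handles centers that fall
    exactly on the right/bottom border."""
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row * S + col

# An instance centered at (200, 50) in a 400x400 image falls in
# grid cell row 0, col 2 -> category 2 (with S=5).
category = location_category(200, 50, 400, 400, S=5)
```

This converts instance segmentation into a per-pixel classification problem, which is what lets SOLO skip region proposals entirely.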
arXiv Detail & Related papers (2021-06-30T09:56:54Z) - CMF: Cascaded Multi-model Fusion for Referring Image Segmentation [24.942658173937563]
We address the task of referring image segmentation (RIS), which aims at predicting a segmentation mask for the object described by a natural language expression.
We propose a simple yet effective Cascaded Multi-modal Fusion (CMF) module, which stacks multiple atrous convolutional layers in parallel.
Experimental results on four benchmark datasets demonstrate that our method outperforms most state-of-the-art methods.
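The CMF module is described as stacking multiple atrous (dilated) convolutional layers in parallel. As a rough illustration of the building block, here is a pure-NumPy 1-D sketch: each branch applies the same-size kernel at a different dilation rate, and the branch outputs are fused by summation. The fusion-by-sum choice and function names are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """'Same'-padded 1-D convolution with dilation `rate` (pure NumPy).
    A kernel of length k with dilation r covers a span of r*(k-1)+1
    inputs, enlarging the receptive field without extra parameters."""
    k = len(w)
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    return np.array([
        sum(w[j] * xp[i + j * rate] for j in range(k))
        for i in range(len(x))
    ])

def parallel_atrous_fusion(x, weights, rates=(1, 2, 4)):
    # Run several atrous branches in parallel and sum their outputs,
    # mixing small and large receptive fields at full resolution.
    return sum(atrous_conv1d(x, w, r) for w, r in zip(weights, rates))
```

The same pattern underlies ASPP-style heads in semantic segmentation: parallel dilation rates capture multi-scale context cheaply.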
arXiv Detail & Related papers (2021-06-16T08:18:39Z) - Comprehensive Multi-Modal Interactions for Referring Image Segmentation [7.064383217512461]
We investigate Referring Image Segmentation (RIS), which outputs a segmentation map corresponding to the given natural language description.
To solve RIS efficiently, we need to understand each word's relationship with other words, each region in the image to other regions, and cross-modal alignment between linguistic and visual domains.
We propose a Joint Reasoning Module (JRM) and a novel Cross-Modal Multi-Level Fusion (CMMLF) module for tackling this task.
arXiv Detail & Related papers (2021-04-21T08:45:09Z) - Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient spatio-temporal segmentation.
We evaluate the proposed approach on DAVIS-17 and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods in both segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z) - Locate then Segment: A Strong Pipeline for Referring Image Segmentation [73.19139431806853]
Referring image segmentation aims to segment the objects referred by a natural language expression.
Previous methods usually focus on designing an implicit and recurrent interaction mechanism to fuse the visual-linguistic features to directly generate the final segmentation mask.
We present a "Locate-Then-Segment" scheme to tackle these problems.
Our framework is simple but surprisingly effective.
arXiv Detail & Related papers (2021-03-30T12:25:27Z) - Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation [71.59275788106622]
We propose to learn the underlying class-agnostic commonalities that can be generalized from mask-annotated categories to novel categories.
Our model significantly outperforms the state-of-the-art methods on both partially supervised setting and few-shot setting for instance segmentation on COCO dataset.
arXiv Detail & Related papers (2020-07-24T07:23:44Z) - Joint Learning of Instance and Semantic Segmentation for Robotic Pick-and-Place with Heavy Occlusions in Clutter [28.45734662893933]
We present a joint learning of instance and semantic segmentation for visible and occluded region masks.
In the experiments, we evaluated the proposed joint learning comparing the instance-only learning on the test dataset.
We also applied the joint learning model to 2 different types of robotic pick-and-place tasks.
arXiv Detail & Related papers (2020-01-21T12:37:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.