GREC: Generalized Referring Expression Comprehension
- URL: http://arxiv.org/abs/2308.16182v2
- Date: Sun, 24 Dec 2023 15:13:10 GMT
- Title: GREC: Generalized Referring Expression Comprehension
- Authors: Shuting He, Henghui Ding, Chang Liu, Xudong Jiang
- Abstract summary: This study introduces a new benchmark termed Generalized Referring Expression Comprehension (GREC).
This benchmark extends the classic REC by permitting expressions to describe any number of target objects.
To achieve this goal, we have built the first large-scale GREC dataset named gRefCOCO.
- Score: 52.83101289813662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The objective of Classic Referring Expression Comprehension (REC) is to
produce a bounding box corresponding to the object mentioned in a given textual
description. Commonly, existing datasets and techniques in classic REC are
tailored for expressions that pertain to a single target, meaning a sole
expression is linked to one specific object. Expressions that refer to multiple
targets or involve no specific target have not been taken into account. This
constraint hinders the practical applicability of REC. This study introduces a
new benchmark termed as Generalized Referring Expression Comprehension (GREC).
This benchmark extends the classic REC by permitting expressions to describe
any number of target objects. To achieve this goal, we have built the first
large-scale GREC dataset named gRefCOCO. This dataset encompasses a range of
expressions: those referring to multiple targets, expressions with no specific
target, and the single-target expressions. The design of GREC and gRefCOCO
ensures smooth compatibility with classic REC. The proposed gRefCOCO dataset, a
GREC method implementation code, and GREC evaluation code are available at
https://github.com/henghuiding/gRefCOCO.
Related papers
- HDC: Hierarchical Semantic Decoding with Counting Assistance for Generalized Referring Expression Segmentation [33.40691116355158]
Generalized Referring Expression Segmentation (GRES) amplifies the formulation of classic RES by involving multiple/non-target scenarios.
We propose a Hierarchical Semantic Decoding with Counting Assistance framework (HDC).
We endow HDC with explicit counting capability to facilitate comprehensive object perception in multiple/single/non-target settings.
arXiv Detail & Related papers (2024-05-24T15:53:59Z) - Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation [18.806738617249426]
Generalized Referring Expression Segmentation (GRES) introduces new challenges by allowing expressions to describe multiple objects or lack specific object references.
Existing RES methods usually rely on sophisticated encoder-decoder and feature fusion modules.
We propose a novel Model with Adaptive Binding Prototypes (MABP) that adaptively binds queries to object features in the corresponding region.
arXiv Detail & Related papers (2024-05-24T03:07:38Z) - GSVA: Generalized Segmentation via Multimodal Large Language Models [72.57095903188922]
Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to refer to multiple objects in one expression or identify the empty targets absent in the image.
Current solutions to GRES remain unsatisfactory since segmentation MLLMs cannot correctly handle the cases where users might reference multiple subjects in a singular prompt.
We propose Generalized Segmentation Vision Assistant (GSVA) to address this gap.
arXiv Detail & Related papers (2023-12-15T02:54:31Z) - Continual Referring Expression Comprehension via Dual Modular Memorization [133.46886428655426]
Referring Expression Comprehension (REC) aims to localize an image region of a given object described by a natural-language expression.
Existing REC algorithms make a strong assumption that training data feeding into a model are given upfront, which degrades its practicality for real-world scenarios.
In this paper, we propose Continual Referring Expression Comprehension (CREC), a new setting for REC, where a model learns on a stream of incoming tasks.
To continuously improve the model on sequential tasks without forgetting previously learned knowledge and without repeatedly re-training from scratch, we propose an effective baseline method named Dual Modular Memorization.
arXiv Detail & Related papers (2023-11-25T02:58:51Z) - Whether you can locate or not? Interactive Referring Expression Generation [12.148963878497243]
We propose an Interactive REG (IREG) model that can interact with a real REC model.
IREG outperforms previous state-of-the-art methods on popular evaluation metrics.
arXiv Detail & Related papers (2023-08-19T10:53:32Z) - Referring Camouflaged Object Detection [97.90911862979355]
Ref-COD aims to segment specified camouflaged objects based on a small set of referring images with salient target objects.
We first assemble a large-scale dataset, called R2C7K, which consists of 7K images covering 64 object categories in real-world scenarios.
arXiv Detail & Related papers (2023-06-13T04:15:37Z) - Referring Expression Comprehension Using Language Adaptive Inference [15.09309604460633]
This paper explores the adaptation between expressions and REC models for dynamic inference.
We propose a framework named Language Adaptive Subnets (LADS), which can extract language-adaptive subnets from the REC model conditioned on the referring expressions.
Experiments on RefCOCO, RefCOCO+, RefCOCOg, and Referit show that the proposed method achieves faster inference speed and higher accuracy against state-of-the-art approaches.
arXiv Detail & Related papers (2023-06-06T07:58:59Z) - GRES: Generalized Referring Expression Segmentation [32.12725360752345]
We introduce a new benchmark called Generalized Referring Expression Segmentation (GRES).
GRES allows expressions to refer to an arbitrary number of target objects.
We construct the first large-scale GRES dataset called gRefCOCO that contains multi-target, no-target, and single-target expressions.
arXiv Detail & Related papers (2023-06-01T17:57:32Z) - Learning Non-target Knowledge for Few-shot Semantic Segmentation [160.69431034807437]
We propose a novel framework, namely the Non-Target Region Eliminating (NTRE) network, to explicitly mine and eliminate background (BG) and distracting object (DO) regions in the query.
A BG Mining Module (BGMM) is proposed to extract the BG region via learning a general BG prototype.
A BG Eliminating Module and a DO Eliminating Module are proposed to successively filter out the BG and DO information from the query feature.
arXiv Detail & Related papers (2022-05-10T13:52:48Z) - Locate then Segment: A Strong Pipeline for Referring Image Segmentation [73.19139431806853]
Referring image segmentation aims to segment the objects referred by a natural language expression.
Previous methods usually focus on designing an implicit and recurrent interaction mechanism to fuse the visual-linguistic features to directly generate the final segmentation mask.
We present a "Locate-Then-Segment" scheme to tackle these problems.
Our framework is simple but surprisingly effective.
arXiv Detail & Related papers (2021-03-30T12:25:27Z)