Related papers: GREC: Generalized Referring Expression Comprehension

URL: http://arxiv.org/abs/2308.16182v2
Date: Sun, 24 Dec 2023 15:13:10 GMT
Title: GREC: Generalized Referring Expression Comprehension
Authors: Shuting He, Henghui Ding, Chang Liu, Xudong Jiang
Abstract summary: This study introduces a new benchmark termed Generalized Referring Expression Comprehension (GREC). This benchmark extends the classic REC by permitting expressions to describe any number of target objects. To achieve this goal, we have built the first large-scale GREC dataset named gRefCOCO.
Score: 52.83101289813662
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The objective of Classic Referring Expression Comprehension (REC) is to produce a bounding box corresponding to the object mentioned in a given textual description. Commonly, existing datasets and techniques in classic REC are tailored for expressions that pertain to a single target, meaning a sole expression is linked to one specific object. Expressions that refer to multiple targets or involve no specific target have not been taken into account. This constraint hinders the practical applicability of REC. This study introduces a new benchmark termed as Generalized Referring Expression Comprehension (GREC). This benchmark extends the classic REC by permitting expressions to describe any number of target objects. To achieve this goal, we have built the first large-scale GREC dataset named gRefCOCO. This dataset encompasses a range of expressions: those referring to multiple targets, expressions with no specific target, and the single-target expressions. The design of GREC and gRefCOCO ensures smooth compatibility with classic REC. The proposed gRefCOCO dataset, a GREC method implementation code, and GREC evaluation code are available at https://github.com/henghuiding/gRefCOCO.

Related papers

GREx: Generalized Referring Expression Segmentation, Comprehension, and Generation [99.51887959226735]
This paper introduces three new benchmarks called Generalized Referring Expression Segmentation (GRES), Comprehension (GREC), and Generation (GREG). GREx extends the classic REx to allow expressions to identify an arbitrary number of objects. We construct the first large-scale GREx dataset gRefCOCO that contains multi-target, no-target, and single-target expressions.
arXiv Detail & Related papers (2026-01-08T18:59:30Z)
ZeroGR: A Generalizable and Scalable Framework for Zero-Shot Generative Retrieval [125.19156877994612]
Generative retrieval (GR) reformulates information retrieval (IR) by framing it as the generation of document identifiers (docids). We propose ZeroGR, a zero-shot generative retrieval framework that leverages natural language instructions to extend GR across a wide range of IR tasks. Specifically, ZeroGR is composed of three key components: (i) an LM-based docid generator that unifies heterogeneous documents into semantically meaningful docids; (ii) an instruction-tuned query generator that generates diverse types of queries from natural language task descriptions to enhance
arXiv Detail & Related papers (2025-10-12T03:04:24Z)
RecBase: Generative Foundation Model Pretraining for Zero-Shot Recommendation [78.01030342481246]
RecBase is a domain-agnostic foundational model pretrained with a recommendation-oriented objective. We introduce a unified item tokenizer that encodes items into hierarchical concept identifiers. Our model matches or surpasses the performance of LLM baselines up to 7B parameters in zero-shot and cross-domain recommendation tasks.
arXiv Detail & Related papers (2025-09-03T08:33:43Z)
Referring Expression Instance Retrieval and A Strong End-to-End Baseline [37.47466772169063]
Text-Image Retrieval retrieves a target image from a gallery based on an image-level description. Referring Expression Comprehension localizes a target object within a given image using an instance-level description. We introduce a new task called Referring Expression Instance Retrieval (REIR), which supports both instance-level retrieval and localization.
arXiv Detail & Related papers (2025-06-23T02:28:44Z)
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension [46.07415235144545]
We address the challenging task of Generalized Referring Expression Comprehension (GREC). Existing REC methods face challenges in handling the complex cases encountered in GREC. We propose a Hierarchical Alignment-enhanced Adaptive Grounding Network (HieA2G).
arXiv Detail & Related papers (2025-01-02T18:57:59Z)
RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation [72.95147072227998]
3D Referring Expression Segmentation aims to segment 3D objects by correlating referring expressions with point clouds. Traditional approaches frequently encounter issues like over-segmentation or mis-segmentation, due to insufficient emphasis on spatial information of instances. We introduce a Rule-Guided Spatial Awareness Network (RG-SAN) by utilizing solely the spatial information of the target instance for supervision.
arXiv Detail & Related papers (2024-12-03T11:50:16Z)
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation [37.96005100341482]
Generalized Referring Expression Segmentation (GRES) amplifies the formulation of classic RES by involving complex multiple/non-target scenarios. Recent approaches address GRES by directly extending the well-adopted RES frameworks with object-existence identification. We propose a Counting-Aware Hierarchical Decoding framework (CoHD) for GRES.
arXiv Detail & Related papers (2024-05-24T15:53:59Z)
Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation [18.806738617249426]
Generalized Referring Expression Segmentation introduces new challenges by allowing expressions to describe multiple objects or lack specific object references. Existing RES methods usually rely on sophisticated encoder-decoder and feature fusion modules. We propose a novel Model with Adaptive Binding Prototypes (MABP) that adaptively binds queries to object features in the corresponding region.
arXiv Detail & Related papers (2024-05-24T03:07:38Z)
GSVA: Generalized Segmentation via Multimodal Large Language Models [72.57095903188922]
Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to refer to multiple objects in one expression or identify the empty targets absent in the image. Current solutions to GRES remain unsatisfactory since segmentation MLLMs cannot correctly handle the cases where users might reference multiple subjects in a singular prompt. We propose Generalized Segmentation Vision Assistant (GSVA) to address this gap.
arXiv Detail & Related papers (2023-12-15T02:54:31Z)
Continual Referring Expression Comprehension via Dual Modular Memorization [133.46886428655426]
Referring Expression Comprehension (REC) aims to localize an image region of a given object described by a natural-language expression. Existing REC algorithms make a strong assumption that the training data fed into a model are given upfront, which degrades their practicality in real-world scenarios. In this paper, we propose Continual Referring Expression Comprehension (CREC), a new setting for REC, where a model learns on a stream of incoming tasks. In order to continuously improve the model on sequential tasks without forgetting previously learned knowledge and without repeatedly re-training from scratch, we propose an effective baseline method named Dual Modular Memorization.
arXiv Detail & Related papers (2023-11-25T02:58:51Z)
Whether you can locate or not? Interactive Referring Expression Generation [12.148963878497243]
We propose an Interactive REG (IREG) model that can interact with a real REC model. IREG outperforms previous state-of-the-art methods on popular evaluation metrics.
arXiv Detail & Related papers (2023-08-19T10:53:32Z)
Referring Camouflaged Object Detection [97.90911862979355]
Ref-COD aims to segment specified camouflaged objects based on a small set of referring images with salient target objects. We first assemble a large-scale dataset, called R2C7K, which consists of 7K images covering 64 object categories in real-world scenarios.
arXiv Detail & Related papers (2023-06-13T04:15:37Z)
Referring Expression Comprehension Using Language Adaptive Inference [15.09309604460633]
This paper explores the adaptation between expressions and REC models for dynamic inference. We propose a framework named Language Adaptive Subnets (LADS), which can extract language-adaptive subnets from the REC model conditioned on the referring expressions. Experiments on RefCOCO, RefCOCO+, RefCOCOg, and Referit show that the proposed method achieves faster inference speed and higher accuracy than state-of-the-art approaches.
arXiv Detail & Related papers (2023-06-06T07:58:59Z)
GRES: Generalized Referring Expression Segmentation [32.12725360752345]
We introduce a new benchmark called Generalized Referring Expression Segmentation (GRES). GRES allows expressions to refer to an arbitrary number of target objects. We construct the first large-scale GRES dataset called gRefCOCO that contains multi-target, no-target, and single-target expressions.
arXiv Detail & Related papers (2023-06-01T17:57:32Z)
Learning Non-target Knowledge for Few-shot Semantic Segmentation [160.69431034807437]
We propose a novel framework, namely the Non-Target Region Eliminating (NTRE) network, to explicitly mine and eliminate background (BG) and distracting object (DO) regions in the query. A BG Mining Module (BGMM) is proposed to extract the BG region via learning a general BG prototype. A BG Eliminating Module and a DO Eliminating Module are proposed to successively filter out the BG and DO information from the query feature.
arXiv Detail & Related papers (2022-05-10T13:52:48Z)
Locate then Segment: A Strong Pipeline for Referring Image Segmentation [73.19139431806853]
Referring image segmentation aims to segment the objects referred to by a natural language expression. Previous methods usually focus on designing an implicit and recurrent interaction mechanism to fuse the visual-linguistic features to directly generate the final segmentation mask. We present a "Locate-Then-Segment" scheme to tackle these problems. Our framework is simple but surprisingly effective.
arXiv Detail & Related papers (2021-03-30T12:25:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.