Multi-task Collaborative Network for Joint Referring Expression
Comprehension and Segmentation
- URL: http://arxiv.org/abs/2003.08813v1
- Date: Thu, 19 Mar 2020 14:25:18 GMT
- Title: Multi-task Collaborative Network for Joint Referring Expression
Comprehension and Segmentation
- Authors: Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Chenglin Wu, Cheng
Deng and Rongrong Ji
- Abstract summary: We propose a novel Multi-task Collaborative Network (MCN) to achieve joint learning of referring expression comprehension (REC) and segmentation (RES).
In MCN, RES can help REC to achieve better language-vision alignment, while REC can help RES to better locate the referent.
We address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs, namely Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).
- Score: 135.67558811281984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Referring expression comprehension (REC) and segmentation (RES) are two
highly-related tasks, which both aim at identifying the referent according to a
natural language expression. In this paper, we propose a novel Multi-task
Collaborative Network (MCN) to achieve a joint learning of REC and RES for the
first time. In MCN, RES can help REC to achieve better language-vision
alignment, while REC can help RES to better locate the referent. In addition,
we address a key challenge in this multi-task setup, i.e., the prediction
conflict, with two innovative designs, namely Consistency Energy Maximization
(CEM) and Adaptive Soft Non-Located Suppression (ASNLS). Specifically, CEM
enables REC and RES to focus on similar visual regions by maximizing the
consistency energy between two tasks. ASNLS suppresses the response of unrelated
regions in RES based on the prediction of REC. To validate our model, we
conduct extensive experiments on three benchmark datasets of REC and RES, i.e.,
RefCOCO, RefCOCO+ and RefCOCOg. The experimental results show significant
performance gains of MCN over all existing methods, i.e., up to +7.13% for REC
and +11.50% for RES over the SOTA, which confirms the validity of our model for
joint REC and RES learning.
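The two designs above can be sketched in code. The snippet below is a minimal illustration, not the paper's exact formulation: the consistency energy is approximated here as a normalized correlation between the two branches' spatial attention maps, and ASNLS is shown as a soft re-weighting of RES responses by the REC-predicted box; the weight values `in_w` and `out_w` are hypothetical.

```python
import numpy as np

def consistency_energy(rec_map, res_map, eps=1e-8):
    """Illustrative consistency energy: normalized correlation between the
    REC and RES attention maps. Maximizing this during training encourages
    both tasks to focus on the same visual regions. (The paper's exact
    energy formulation may differ.)"""
    a = rec_map.ravel()
    b = res_map.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def asnls(mask_scores, box, in_w=1.0, out_w=0.1):
    """Adaptive Soft Non-Located Suppression (sketch): softly down-weight
    RES responses outside the box predicted by REC, rather than hard-zeroing
    them. The weights in_w / out_w are illustrative values."""
    h, w = mask_scores.shape
    weight = np.full((h, w), out_w, dtype=float)
    x1, y1, x2, y2 = box  # box in pixel coordinates (x1, y1, x2, y2)
    weight[y1:y2, x1:x2] = in_w
    return mask_scores * weight
```

In this sketch, identical attention maps yield an energy near 1.0, and responses outside the predicted box are scaled down rather than removed, which tolerates imprecise boxes.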
Related papers
- Multi-branch Collaborative Learning Network for 3D Visual Grounding [66.67647903507927]
3D referring expression comprehension (3DREC) and segmentation (3DRES) have overlapping objectives, indicating their potential for collaboration.
We argue that employing separate branches for 3DREC and 3DRES tasks enhances the model's capacity to learn specific information for each task.
arXiv Detail & Related papers (2024-07-07T13:27:14Z) - Continual Referring Expression Comprehension via Dual Modular
Memorization [133.46886428655426]
Referring Expression (REC) aims to localize an image region of a given object described by a natural-language expression.
Existing REC algorithms make a strong assumption that all training data fed into a model are given upfront, which degrades their practicality in real-world scenarios.
In this paper, we propose Continual Referring Expression (CREC), a new setting for REC, where a model is learning on a stream of incoming tasks.
In order to continuously improve the model on sequential tasks without forgetting prior learned knowledge and without repeatedly re-training from scratch, we propose an effective baseline method named Dual Modular Memorization.
arXiv Detail & Related papers (2023-11-25T02:58:51Z) - Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce a simple, yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs)
Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process.
We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
arXiv Detail & Related papers (2023-09-12T14:36:23Z) - Whether you can locate or not? Interactive Referring Expression
Generation [12.148963878497243]
We propose an Interactive REG (IREG) model that can interact with a real REC model.
IREG outperforms previous state-of-the-art methods on popular evaluation metrics.
arXiv Detail & Related papers (2023-08-19T10:53:32Z) - A Comprehensive Survey on Relation Extraction: Recent Advances and New Frontiers [76.51245425667845]
Relation extraction (RE) involves identifying the relations between entities from underlying content.
Deep neural networks have dominated the field of RE and made noticeable progress.
This survey is expected to facilitate researchers' collaborative efforts to address the challenges of real-world RE systems.
arXiv Detail & Related papers (2023-06-03T08:39:25Z) - Towards Unifying Reference Expression Generation and Comprehension [22.72363956296498]
We propose a unified model for REG and REC, named UniRef.
It unifies these two tasks with the carefully-designed Image-Region-Text Fusion layer (IRTF), which fuses the image, region and text via the image cross-attention and region cross-attention.
We further propose Vision-conditioned Masked Language Modeling (VMLM) and Text-Conditioned Region Prediction (TRP) to pre-train UniRef model on multi-granular corpora.
arXiv Detail & Related papers (2022-10-24T09:53:41Z) - Summarization as Indirect Supervision for Relation Extraction [23.98136192661566]
We present SuRE, which converts Relation extraction (RE) into a summarization formulation.
We develop sentence and relation conversion techniques that essentially bridge the formulation of summarization and RE tasks.
Experiments on three datasets demonstrate the effectiveness of SuRE in both full-dataset and low-resource settings.
arXiv Detail & Related papers (2022-05-19T20:25:29Z) - A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of
One-Stage Referring Expression Comprehension [81.57558029858954]
We build a one-stage referring expression comprehension network called SimREC.
We conduct over 100 experimental trials on three benchmark datasets of REC.
With much less training overhead and parameters, SimREC can still achieve better performance than a set of large-scale pre-trained models.
arXiv Detail & Related papers (2022-04-17T03:04:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.