A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of
One-Stage Referring Expression Comprehension
- URL: http://arxiv.org/abs/2204.07913v2
- Date: Thu, 14 Sep 2023 13:33:43 GMT
- Title: A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of
One-Stage Referring Expression Comprehension
- Authors: Gen Luo, Yiyi Zhou, Jiamu Sun, Xiaoshuai Sun, Rongrong Ji
- Abstract summary: We build a one-stage referring expression comprehension network called SimREC.
We conduct over 100 experimental trials on three benchmark datasets of REC.
With much less training overhead and far fewer parameters, SimREC still achieves better performance than a set of large-scale pre-trained models.
- Score: 81.57558029858954
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Most of the existing work in one-stage referring expression comprehension
(REC) mainly focuses on multi-modal fusion and reasoning, while the influence
of other factors on this task remains under-explored. To fill this gap, we
conduct an empirical study in this paper. Concretely, we first build a very
simple REC network called SimREC, and ablate 42 candidate designs/settings,
which cover the entire process of one-stage REC from network design to model
training. Afterwards, we conduct over 100 experimental trials on three
benchmark datasets of REC. The extensive experimental results not only show the
key factors that affect REC performance in addition to multi-modal fusion,
e.g., multi-scale features and data augmentation, but also yield some findings
that run counter to conventional understanding. For example, as a vision and
language (V&L) task, REC is less impacted by the language prior. In addition,
with a proper combination of these findings, we can improve the performance of
SimREC by a large margin, e.g., +27.12% on RefCOCO+, which outperforms all
existing REC methods. But the most encouraging finding is that, with much less
training overhead and far fewer parameters, SimREC still achieves better performance
than a set of large-scale pre-trained models, e.g., UNITER and VILLA,
highlighting the special role of REC in existing V&L research.
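The abstract identifies multi-scale features and data augmentation as key factors beyond multi-modal fusion. As a purely illustrative aid, the PyTorch-style sketch below shows one plausible way a one-stage REC head could fuse multi-scale visual features with a sentence embedding and regress a single box; all module names, dimensions, and the fusion scheme are assumptions for illustration, not the released SimREC code.

```python
# Illustrative sketch only (not the SimREC implementation): a one-stage REC
# head that conditions multi-scale visual features on a language embedding
# and regresses one normalized box per image.
import torch.nn as nn
import torch.nn.functional as F

class OneStageRECSketch(nn.Module):
    def __init__(self, vis_dims=(256, 512, 1024), lang_dim=512, hidden=256):
        super().__init__()
        # Project each visual scale and the language feature into a shared space.
        self.vis_proj = nn.ModuleList([nn.Conv2d(d, hidden, 1) for d in vis_dims])
        self.lang_proj = nn.Linear(lang_dim, hidden)
        # Simple box head: predicts (cx, cy, w, h) in [0, 1].
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4), nn.Sigmoid(),
        )

    def forward(self, feats, lang_feat):
        # feats: list of [B, C_i, H_i, W_i] maps from a visual backbone (e.g. FPN levels).
        # lang_feat: [B, lang_dim] sentence embedding of the referring expression.
        lang = self.lang_proj(lang_feat)[:, :, None, None]        # [B, hidden, 1, 1]
        fused = [proj(f) * lang for proj, f in zip(self.vis_proj, feats)]
        # Multi-scale fusion: resize every scale to the last map's resolution and sum.
        target = fused[-1].shape[-2:]
        pooled = sum(F.adaptive_avg_pool2d(x, target) for x in fused)
        return self.head(pooled)                                   # [B, 4] normalized box
```

A real one-stage head would use anchor- or center-based predictions, training losses, and data augmentation in place of the plain box regressor; the sketch only marks where multi-scale, language-conditioned fusion enters the pipeline.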
Related papers
- An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking [50.81324768683995]
FIRST is a novel approach that integrates a learning-to-rank objective and leverages the logits of only the first generated token.
We extend the evaluation of FIRST to the TREC Deep Learning datasets (DL19-22), validating its robustness across diverse domains.
Our experiments confirm that fast reranking with single-token logits does not compromise out-of-domain reranking quality.
arXiv Detail & Related papers (2024-11-08T12:08:17Z)
- V-RECS, a Low-Cost LLM4VIS Recommender with Explanations, Captioning and Suggestions [3.3235895997314726]
We present V-RECS, the first Visual Recommender augmented with explanations (E), captioning (C), and suggestions (S) for further data exploration.
V-RECS' visualization narratives facilitate both response verification and data exploration by non-expert users.
arXiv Detail & Related papers (2024-06-21T15:50:10Z)
- Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction [11.535892987373947]
Relation extraction (RE) aims to identify relations between entities mentioned in texts.
Large language models (LLMs) have demonstrated impressive in-context learning abilities in various tasks.
However, LLMs still underperform most supervised fine-tuned RE methods.
arXiv Detail & Related papers (2024-04-27T07:12:52Z)
- Continual Referring Expression Comprehension via Dual Modular Memorization [133.46886428655426]
Referring Expression Comprehension (REC) aims to localize an image region of a given object described by a natural-language expression.
Existing REC algorithms make the strong assumption that the training data fed to a model are given upfront, which degrades their practicality in real-world scenarios.
In this paper, we propose Continual Referring Expression Comprehension (CREC), a new setting for REC, where a model learns from a stream of incoming tasks.
In order to continuously improve the model on sequential tasks without forgetting previously learned knowledge and without repeatedly re-training from scratch, we propose an effective baseline method named Dual Modular Memorization.
arXiv Detail & Related papers (2023-11-25T02:58:51Z)
- Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for in-batch negatives unless given well-constructed hard negatives; a minimal sketch of the in-batch-negatives objective follows this list.
arXiv Detail & Related papers (2023-11-16T10:42:58Z)
- Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z)
- Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation [135.67558811281984]
We propose a novel Multi-task Collaborative Network (MCN) to achieve joint learning of referring expression comprehension (REC) and referring expression segmentation (RES).
In MCN, RES can help REC to achieve better language-vision alignment, while REC can help RES to better locate the referent.
We address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs, namely Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).
arXiv Detail & Related papers (2020-03-19T14:25:18Z)
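As referenced in the dense-encoder entry above, training with in-batch negatives can be written down in a few lines. The sketch below is an illustrative InfoNCE-style loss in which every other passage in the batch serves as a negative for a given query; it is not code from the cited paper, and the parameter-efficient (e.g. LoRA) wrapping of the encoder is omitted.

```python
# Illustrative sketch of contrastive training with in-batch negatives.
# The encoder itself (and any LoRA adapters applied to it) is assumed to be
# defined elsewhere; this function only computes the loss.
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(query_emb: torch.Tensor,
                            passage_emb: torch.Tensor,
                            temperature: float = 0.05) -> torch.Tensor:
    """query_emb: [B, D] query embeddings; passage_emb: [B, D] embeddings of
    each query's positive passage. Row i's positive is column i; all other
    columns act as in-batch negatives."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature                      # [B, B] similarity matrix
    labels = torch.arange(q.size(0), device=q.device)   # diagonal entries are positives
    return F.cross_entropy(logits, labels)
```

Well-constructed hard negatives, when available, would simply be appended as extra columns of the logits matrix.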