Grounding Language Models for Visual Entity Recognition
- URL: http://arxiv.org/abs/2402.18695v2
- Date: Fri, 26 Jul 2024 06:34:15 GMT
- Title: Grounding Language Models for Visual Entity Recognition
- Authors: Zilin Xiao, Ming Gong, Paola Cascante-Bonilla, Xingyao Zhang, Jie Wu, Vicente Ordonez
- Abstract summary: AutoVER is an autoregressive model for Visual Entity Recognition.
It mitigates low performance on out-of-domain entities while excelling in queries that require visually-situated reasoning.
It achieves significant improvements across different dataset splits in the recently proposed Oven-Wiki benchmark.
- Score: 27.783314622717448
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce AutoVER, an Autoregressive model for Visual Entity Recognition. Our model extends an autoregressive Multi-modal Large Language Model by employing retrieval augmented constrained generation. It mitigates low performance on out-of-domain entities while excelling in queries that require visually-situated reasoning. Our method learns to distinguish similar entities within a vast label space by contrastively training on hard negative pairs in parallel with a sequence-to-sequence objective without an external retriever. During inference, a list of retrieved candidate answers explicitly guides language generation by removing invalid decoding paths. The proposed method achieves significant improvements across different dataset splits in the recently proposed Oven-Wiki benchmark. Accuracy on the Entity seen split rises from 32.7% to 61.5%. It also demonstrates superior performance on the unseen and query splits by a substantial double-digit margin.
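To make the constrained-generation step concrete, below is a minimal sketch of the idea: the retrieved candidate entity names are loaded into a prefix trie, and decoding may only follow token paths that stay inside it, which is how invalid decoding paths get removed. The trie class and greedy loop are illustrative simplifications, not AutoVER's actual implementation (beam search and candidates that are prefixes of other candidates are ignored here).

```python
# Sketch of trie-constrained decoding over retrieved candidate entities.
# Interfaces are illustrative; AutoVER's real implementation may differ.

class PrefixTrie:
    """Prefix trie over the token-ID sequences of candidate entity names."""

    def __init__(self, sequences):
        self.root = {}
        for seq in sequences:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def allowed_next(self, prefix):
        """Return the set of token IDs that validly extend `prefix`."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return set()
            node = node[tok]
        return set(node.keys())


def constrained_greedy_decode(step_scores_fn, trie, max_len=32):
    """Greedy decoding restricted to the candidate trie.

    `step_scores_fn(prefix)` is any callable returning a {token_id: score}
    dict for the next position (e.g. a wrapper around an LLM forward pass).
    """
    prefix = []
    for _ in range(max_len):
        allowed = trie.allowed_next(prefix)
        if not allowed:  # reached the end of a complete candidate name
            break
        scores = step_scores_fn(prefix)
        # Invalid paths are masked out: argmax only over allowed tokens.
        prefix.append(max(allowed, key=lambda t: scores.get(t, float("-inf"))))
    return prefix
```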
Related papers
- COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
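A toy sketch of the order-agnostic refinement loop this summary describes, assuming a hypothetical `rescore(tokens, i, window)` call that returns a (best_token, confidence) pair for position i given its two-sided context window; it stands in for COrAL's actual model.

```python
import random

def iterative_refine(tokens, rescore, window=8, threshold=0.5, rounds=3):
    """Toy order-agnostic iterative refinement over a draft sequence."""
    tokens = list(tokens)
    for _ in range(rounds):
        order = list(range(len(tokens)))
        random.shuffle(order)  # order-agnostic: no fixed left-to-right pass
        changed = False
        for i in order:
            best, conf = rescore(tokens, i, window)
            if best != tokens[i] and conf > threshold:
                tokens[i] = best  # replace a low-quality token in place
                changed = True
        if not changed:  # fixed point: no position wants to change
            break
    return tokens
```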
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
- Diffusion Guided Language Modeling [28.819061884362792]
For many applications it is desirable to control attributes, such as sentiment, of the generated language.
For auto-regressive language models, existing guidance methods are prone to decoding errors that cascade during generation and degrade performance.
In this paper we use a guided diffusion model to produce a latent proposal that steers an auto-regressive language model to generate text with desired properties.
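One simple way to picture the steering step: the diffusion model emits a latent proposal vector, and the LM's next-token logits are biased toward tokens whose embeddings align with it. This is a loose sketch of the idea, not the paper's actual interface; all names and shapes are assumptions.

```python
import numpy as np

def guided_step(lm_logits, token_embeddings, latent, alpha=2.0):
    """Bias next-token logits toward a diffusion-produced latent proposal.

    lm_logits: (V,) raw logits from the autoregressive LM
    token_embeddings: (V, d) token embedding table
    latent: (d,) latent proposal from the guided diffusion model
    Shapes and the additive-bias scheme are illustrative assumptions.
    """
    sim = token_embeddings @ latent                 # per-token alignment
    sim = (sim - sim.mean()) / (sim.std() + 1e-6)   # normalize the bias
    return lm_logits + alpha * sim                  # steered logits
```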
arXiv Detail & Related papers (2024-08-08T05:06:22Z)
- Revisiting Sparse Retrieval for Few-shot Entity Linking [33.15662306409253]
We propose an ELECTRA-based keyword extractor to denoise the mention context and construct a better query expression.
For training the extractor, we propose a distant supervision method to automatically generate training data based on overlapping tokens between mention contexts and entity descriptions.
Experimental results on the ZESHEL dataset demonstrate that the proposed method outperforms state-of-the-art models by a significant margin across all test domains.
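The distant-supervision step is easy to picture: context tokens that also appear in the entity description are labeled as keywords, yielding silver training data for the extractor. A minimal sketch, with tokenization and stopword handling simplified:

```python
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "at"}

def keyword_labels(mention_context, entity_description):
    """Label a context token 1 if it also occurs in the entity description."""
    desc_tokens = {t.lower() for t in entity_description.split()} - STOPWORDS
    return [1 if tok.lower() in desc_tokens else 0
            for tok in mention_context.split()]

# Example: overlapping tokens become positive keyword labels.
ctx = "He joined the Night's Watch at Castle Black"
desc = "Castle Black is the main stronghold of the Night's Watch"
print(keyword_labels(ctx, desc))  # [0, 0, 0, 1, 1, 0, 1, 1]
```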
arXiv Detail & Related papers (2023-10-19T03:51:10Z)
- Making Retrieval-Augmented Language Models Robust to Irrelevant Context [55.564789967211844]
An important desideratum of RALMs is that retrieved information improves model performance when it is relevant.
Recent work has shown that retrieval augmentation can sometimes have a negative effect on performance.
arXiv Detail & Related papers (2023-10-02T18:52:35Z)
- Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models [6.425088990363101]
We examine the relationship between fluency and attribution in Large Language Models prompted with retrieved evidence.
We show that larger models tend to do much better in both fluency and attribution.
We propose a recipe that could allow smaller models to both close the gap with larger models and preserve the benefits of top-k retrieval.
arXiv Detail & Related papers (2023-02-11T02:43:34Z)
- DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix, which is deterministic and easier for the model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
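A small sketch of the "symbolic and ordered sequence" idea: walk the entity-pair relation matrix in a fixed row-major order and emit one symbol per non-empty cell, so the seq2seq target is deterministic for a given document. The tagging format below is an illustrative assumption.

```python
def serialize_relation_matrix(entities, rel_matrix, no_rel="NONE"):
    """Turn an (N x N) relation matrix into a deterministic, ordered
    sequence of <head> <relation> <tail> symbols for seq2seq training.

    rel_matrix[i][j] holds the relation label between entity i and entity j
    (or `no_rel`). Row-major traversal fixes the generation order.
    """
    symbols = []
    n = len(entities)
    for i in range(n):
        for j in range(n):
            if i != j and rel_matrix[i][j] != no_rel:
                symbols.append(
                    f"<{entities[i]}> <{rel_matrix[i][j]}> <{entities[j]}>")
    return " ".join(symbols)

entities = ["Marie_Curie", "Warsaw", "Pierre_Curie"]
rels = [["NONE", "born_in", "spouse"],
        ["NONE", "NONE", "NONE"],
        ["spouse", "NONE", "NONE"]]
print(serialize_relation_matrix(entities, rels))
# <Marie_Curie> <born_in> <Warsaw> <Marie_Curie> <spouse> <Pierre_Curie> ...
```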
arXiv Detail & Related papers (2022-10-28T11:18:10Z)
- Regularized Contrastive Learning of Semantic Search [0.0]
Transformer-based models are widely used as retrieval models due to their excellent ability to learn semantic representations.
We propose a new regularization method: Regularized Contrastive Learning.
It augments every sentence with several different semantic representations, then feeds them into the contrastive objective as regularizers.
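A hedged PyTorch sketch of that objective: each sentence gets several augmented views (here via re-encoding with dropout active, a common augmentation trick; the paper's own augmentations may differ), and agreement among the extra views is added to a standard in-batch InfoNCE loss as a regularizer. The weighting `lam` and temperature `tau` are illustrative.

```python
import torch
import torch.nn.functional as F

def regularized_contrastive_loss(encode, sentences, n_views=3,
                                 tau=0.05, lam=0.1):
    """InfoNCE over in-batch pairs plus a view-agreement regularizer.

    `encode` re-encodes a batch with dropout active, so each call yields
    a different (B, d) "view" of the same sentences.
    """
    views = [F.normalize(encode(sentences), dim=-1) for _ in range(n_views)]
    z1, z2 = views[0], views[1]
    sim = z1 @ z2.T / tau                            # (B, B) similarities
    targets = torch.arange(z1.size(0), device=sim.device)
    loss = F.cross_entropy(sim, targets)             # contrastive term
    # Regularizer: extra views of the same sentence should also agree.
    reg = 0.0
    for v in views[2:]:
        reg = reg + (1 - (z1 * v).sum(dim=-1)).mean()
    return loss + lam * reg
```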
arXiv Detail & Related papers (2022-09-27T08:25:19Z)
- SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in neural-network-based approaches (called SUN).
Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z)
- BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing [55.058258437125524]
We introduce BenchCLAMP, a Benchmark to evaluate Constrained LAnguage Model Parsing.
We benchmark eight language models, including two GPT-3 variants available only through an API.
Our experiments show that encoder-decoder pretrained language models can achieve similar performance or surpass state-of-the-art methods for syntactic and semantic parsing when the model output is constrained to be valid.
arXiv Detail & Related papers (2022-06-21T18:34:11Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
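A minimal sketch of the dual-representation idea: one encoder yields both a pooled dense vector and non-negative per-term lexicon weights, and the retrieval score combines the two. The interpolation scheme and shapes are assumptions, not UnifieR's exact formulation.

```python
import torch

def unified_score(q_dense, q_lex, d_dense, d_lex, beta=0.5):
    """Combine dense-vector and lexicon-based relevance in one score.

    q_dense, d_dense: (d,) pooled dense embeddings
    q_lex, d_lex: (V,) non-negative per-term weights over the vocabulary
    (e.g. from an MLM head). Shapes and weighting are illustrative.
    """
    dense = torch.dot(q_dense, d_dense)    # semantic match
    lexical = torch.dot(q_lex, d_lex)      # exact-term match
    return beta * dense + (1 - beta) * lexical
```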
arXiv Detail & Related papers (2022-05-23T11:01:59Z)