Robust Candidate Generation for Entity Linking on Short Social Media Texts
- URL: http://arxiv.org/abs/2210.07472v1
- Date: Fri, 14 Oct 2022 02:47:31 GMT
- Title: Robust Candidate Generation for Entity Linking on Short Social Media Texts
- Authors: Liam Hebert and Raheleh Makki and Shubhanshu Mishra and Hamidreza Saghir and Anusha Kamath and Yuval Merhav
- Abstract summary: We show that in the domain of Tweets, such methods suffer as users often include informal spelling, limited context, and lack of specificity.
We demonstrate a hybrid solution using long contextual representations from Wikipedia, achieving 0.93 recall.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Entity Linking (EL) is the gateway into Knowledge Bases. Recent advances in EL utilize dense retrieval approaches for Candidate Generation, which address some of the shortcomings of the lookup-based approach of matching NER mentions against pre-computed dictionaries. In this work, we show that in the domain of Tweets, such methods suffer as users often include informal spelling, limited context, and lack of specificity, among other issues. We investigate these challenges on a large and recent Tweets benchmark for EL, empirically evaluate lookup and dense retrieval approaches, and demonstrate that a hybrid solution using long contextual representations from Wikipedia is necessary to achieve considerable gains over previous work, achieving 0.93 recall.
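The abstract does not spell out the hybrid pipeline, but a minimal sketch helps fix the idea: dictionary lookup over normalized aliases, unioned with dense nearest-neighbor retrieval over long Wikipedia contexts. Everything below (the alias table, the toy encoder, the entity vectors) is hypothetical stand-in data, not the authors' implementation.

```python
import numpy as np

# Toy stand-ins: a real system would use a trained bi-encoder over long
# Wikipedia contexts and an ANN index; random vectors keep the sketch runnable.
rng = np.random.default_rng(0)

alias_dict = {                              # precomputed lookup: alias -> entity ids
    "nyc": ["New_York_City"],
    "new york": ["New_York_City", "New_York_(state)"],
}
entity_ids = ["New_York_City", "New_York_(state)", "New_York_Knicks"]
entity_vecs = rng.normal(size=(len(entity_ids), 64))

def encode(text: str) -> np.ndarray:
    """Hypothetical mention-in-context encoder (deterministic toy embedding)."""
    local = np.random.default_rng(abs(hash(text)) % (2**32))
    return local.normal(size=64)

def hybrid_candidates(mention: str, tweet: str, k: int = 2) -> list[str]:
    # 1) Lookup: normalize the (possibly informal) surface form, hit the dictionary.
    lookup = alias_dict.get(mention.lower().strip(), [])
    # 2) Dense: embed the mention with its tweet context, take nearest entities.
    q = encode(f"{mention} [SEP] {tweet}")
    dense = [entity_ids[i] for i in np.argsort(-(entity_vecs @ q))[:k]]
    # 3) Union: dictionary precision plus dense-retrieval recall.
    return list(dict.fromkeys(lookup + dense))

print(hybrid_candidates("nyc", "cant wait for the knicks game tonight"))
```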
Related papers
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
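As a rough illustration of the granularity choice, the sketch below indexes the same passage as one passage-level unit versus several fine-grained units. Real proposition extraction uses a trained model; the sentence splitter here is only a stand-in to keep the example self-contained.

```python
import re

passage = ("Dense retrieval indexes a corpus in fixed units. "
           "The unit choice affects what a single vector must represent. "
           "Smaller units carry roughly one fact each.")

def passage_units(text: str) -> list[str]:
    return [text]                       # one vector represents the whole passage

def fine_grained_units(text: str) -> list[str]:
    # Crude stand-in for proposition extraction: split into sentences.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

# Each fine-grained unit is indexed separately but maps back to its passage,
# so retrieval can return either the unit itself or its source document.
index = [(unit, "passage_0") for unit in fine_grained_units(passage)]
for unit, source in index:
    print(source, "->", unit)
```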
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- Attention Sorting Combats Recency Bias In Long Context Language Models [69.06809365227504]
Current language models often fail to incorporate long contexts efficiently during generation.
We show that a major contributor to this issue is attention priors that are likely learned during pre-training.
We leverage this fact to introduce "attention sorting": perform one step of decoding, sort documents by the attention they receive, repeat the process, and generate the answer with the newly sorted context.
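The loop reads compactly but is worth spelling out. A minimal sketch, assuming a hypothetical decode_one_step that returns the next token plus per-document attention mass (in the paper this comes from the model's own attention over the prompt):

```python
def attention_sort(question, documents, decode_one_step, n_rounds=3):
    """Sketch of attention sorting: repeatedly decode one token, then reorder
    the context so highly attended documents move toward the end, the region
    long-context models use most reliably."""
    docs = list(documents)
    for _ in range(n_rounds):
        # decode_one_step is assumed to return (next_token, {doc: attention_mass}).
        _token, attn = decode_one_step(question, docs)
        docs.sort(key=lambda d: attn[d])   # least-attended first, most-attended last
    return docs  # generate the final answer with this reordered context

def fake_decode(question, docs):
    # Toy attention: more word overlap with the question means more attention.
    q = set(question.lower().split())
    return "_", {d: len(q & set(d.lower().split())) for d in docs}

docs = ["the sky is blue",
        "entity linking maps mentions to a knowledge base",
        "cats sleep a lot"]
print(attention_sort("what is entity linking", docs, fake_decode))
```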
arXiv Detail & Related papers (2023-09-28T05:19:06Z)
- Lexically-Accelerated Dense Retrieval [29.327878974130055]
LADR (Lexically-Accelerated Dense Retrieval) is a simple yet effective approach that improves the efficiency of existing dense retrieval models.
LADR consistently achieves both precision and recall that are on par with an exhaustive search on standard benchmarks.
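A minimal sketch of the core idea, under stated assumptions: cheap lexical (e.g., BM25) results seed a candidate pool, a precomputed document-proximity graph expands it, and only that small pool is scored exactly with the dense model. The graph, vectors, and seeds below are toy data.

```python
import numpy as np

rng = np.random.default_rng(1)
doc_vecs = np.asarray(rng.normal(size=(6, 32)))                # dense document vectors
neighbors = {i: [(i + 1) % 6, (i + 2) % 6] for i in range(6)}  # toy doc-doc kNN graph

def ladr(query_vec, lexical_seeds, depth=1, k=3):
    """Sketch of lexically-accelerated dense retrieval: start from lexical
    seeds, expand through the proximity graph, score only the visited pool."""
    pool = set(lexical_seeds)
    frontier = set(lexical_seeds)
    for _ in range(depth):
        frontier = {n for d in frontier for n in neighbors[d]} - pool
        pool |= frontier
    # Exact dense scores, but only over the small expanded pool.
    scores = {d: float(doc_vecs[d] @ query_vec) for d in pool}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(ladr(rng.normal(size=32), lexical_seeds=[0, 3]))
```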
arXiv Detail & Related papers (2023-07-31T15:44:26Z)
- Integrity and Junkiness Failure Handling for Embedding-based Retrieval: A Case Study in Social Network Search [26.705196461992845]
Embedding-based retrieval is used in a variety of search applications, such as e-commerce and social network search.
In this paper, we conduct an analysis of embedding-based retrieval launched in early 2021 on our social network search engine.
We define two main categories of failures introduced by it, integrity and junkiness.
arXiv Detail & Related papers (2023-04-18T20:53:47Z)
- A Coarse-to-Fine Place Recognition Approach using Attention-guided Descriptors and Overlap Estimation [13.018093610656507]
We present a novel coarse-to-fine approach to place recognition.
In the coarse stage, our approach utilizes an attention-guided network to generate attention-guided descriptors.
We then employ a fast affinity-based candidate selection process to identify the Top-K most similar candidates.
In the fine stage, we estimate pairwise overlap among the narrowed-down place candidates to determine the final match.
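A compact sketch of the two stages, with a random-number placeholder standing in for the learned overlap estimator and toy vectors in place of the attention-guided network's descriptors:

```python
import numpy as np

rng = np.random.default_rng(2)
db_descriptors = rng.normal(size=(100, 16))   # one descriptor per known place

def estimate_overlap(query_id: int, candidate_id: int) -> float:
    """Placeholder for the learned pairwise overlap estimator."""
    return float(rng.random())

def recognize(query_desc: np.ndarray, query_id: int, top_k: int = 5) -> int:
    # Coarse stage: fast affinity (cosine similarity) against all descriptors.
    sims = db_descriptors @ query_desc / (
        np.linalg.norm(db_descriptors, axis=1) * np.linalg.norm(query_desc))
    candidates = np.argsort(-sims)[:top_k]    # Top-K most similar candidates
    # Fine stage: estimate pairwise overlap only for the narrowed-down set.
    return int(max(candidates, key=lambda c: estimate_overlap(query_id, int(c))))

print(recognize(rng.normal(size=16), query_id=0))
```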
arXiv Detail & Related papers (2023-03-13T05:56:36Z)
- Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical-matching-based approach achieves similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
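One plausible reading of the filter-then-fuse step, sketched below: drop low-probability generated clues, retrieve once per surviving clue, and fuse document scores weighted by each clue's generation probability. A reciprocal-rank-style fusion is used here for simplicity; the paper's exact fusion may differ.

```python
import math
from collections import defaultdict

# Hypothetical sampled clues: (generated context, LM log-probability).
clues = [("capital city of france paris", -2.1),
         ("paris is in france", -2.5),
         ("paris hilton biography", -6.0)]

def filter_clues(clues, logp_floor=-5.0):
    # Simplified filtering strategy: drop low-probability generations.
    return [(c, lp) for c, lp in clues if lp >= logp_floor]

def fuse(retrieve, query, clues, k=3):
    """Retrieve once per (query + clue) and fuse document scores, weighting
    each run by the clue's generation probability."""
    scores = defaultdict(float)
    for clue, lp in filter_clues(clues):
        weight = math.exp(lp)
        for rank, doc in enumerate(retrieve(f"{query} {clue}")):
            scores[doc] += weight / (rank + 1)   # reciprocal-rank style fusion
    return sorted(scores, key=scores.get, reverse=True)[:k]

def toy_retrieve(q):                             # stand-in for a lexical retriever
    corpus = ["paris france doc", "hilton hotels doc", "capital cities doc"]
    return sorted(corpus, key=lambda d: -len(set(q.split()) & set(d.split())))

print(fuse(toy_retrieve, "what is the capital of france", clues))
```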
arXiv Detail & Related papers (2022-10-13T15:18:04Z)
- Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performance in Entity Disambiguation (ED).
Previous works limited their studies to using only each candidate's Wikipedia title as its textual representation.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns.
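A toy illustration of the representation choice under study: scoring candidates by their title alone versus title plus definition. The token-overlap scorer below stands in for a trained ED model; the candidate texts are invented examples.

```python
candidates = {
    "Paris": "Capital and largest city of France.",
    "Paris (mythology)": "Trojan prince who abducted Helen in Greek myth.",
}

def represent(title: str, definition: str | None) -> str:
    # Title-only vs. title-plus-definition candidate representations.
    return title if definition is None else f"{title}: {definition}"

def score(context: str, candidate_text: str) -> int:
    # Toy relevance: token overlap; a real ED model would use a trained scorer.
    return len(set(context.lower().split()) & set(candidate_text.lower().split()))

mention_context = "He visited Paris, the capital of France, last summer."
for use_defs in (False, True):
    best = max(candidates,
               key=lambda t: score(mention_context,
                                   represent(t, candidates[t] if use_defs else None)))
    print("with definitions" if use_defs else "title only", "->", best)
```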
arXiv Detail & Related papers (2022-10-11T17:46:28Z)
- Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model [0.0]
We release contextual biasing lists to accompany the Earnings21 dataset.
We show results for shallow fusion contextual biasing applied to two different decoding algorithms.
We propose an alternate spelling prediction model that improves recall of rare words by 34.7% relative.
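A simplified sketch of how the two pieces combine: real shallow fusion adds the biasing bonus per token during beam search, but n-best rescoring, as below, shows the same scoring idea. The biasing list and alternate-spelling table are invented examples, not the released lists.

```python
import math

biasing_list = {"EBITDA", "Q3"}
# Stand-in for the alternate spelling prediction model: maps likely ASR
# spellings back to the canonical rare word.
alt_spellings = {"ebit da": "EBITDA", "ebitda": "EBITDA", "q three": "Q3"}

def shallow_fusion_rescore(hypotheses, lam=2.0):
    """Sketch of shallow-fusion contextual biasing at rescoring time: add lam
    to a hypothesis's log-probability for each biasing-list hit, counting
    alternate spellings as hits for their canonical form."""
    rescored = []
    for text, logp in hypotheses:
        bonus = sum(lam for phrase, canonical in alt_spellings.items()
                    if phrase in text.lower() and canonical in biasing_list)
        rescored.append((text, logp + bonus))
    return max(rescored, key=lambda h: h[1])[0]

hyps = [("revenue grew but ebit da fell", math.log(0.40)),
        ("revenue grew but a bit the fell", math.log(0.45))]
print(shallow_fusion_rescore(hyps))
```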
arXiv Detail & Related papers (2022-09-02T19:30:16Z)
- Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
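A minimal sketch of the paradigm, with a hard-coded string standing in for the generated profile and a token-overlap ranker standing in for the indexed search engine. The final union with the anchor-text dictionary mirrors how the two approaches complement each other; all data below is invented.

```python
anchor_dict = {"big apple": ["New_York_City"]}      # Wikipedia anchor-text dictionary

def generate_profile(mention: str, context: str) -> str:
    """Placeholder for the profile generator (a description of the entity)."""
    return "largest city in the united states located in new york state"

def search_engine(query: str, k: int = 2) -> list[str]:
    # Stand-in for an indexed search engine over entity descriptions.
    docs = {"New_York_City": "largest city in the united states new york",
            "New_York_(state)": "state in the northeastern united states",
            "Big_Apple_(nightclub)": "former nightclub"}
    q = set(query.split())
    return sorted(docs, key=lambda e: -len(q & set(docs[e].split())))[:k]

def candidates(mention: str, context: str) -> list[str]:
    profile = generate_profile(mention, context)
    retrieved = search_engine(profile)              # profile-based retrieval
    lookup = anchor_dict.get(mention.lower(), [])   # traditional dictionary lookup
    return list(dict.fromkeys(lookup + retrieved))  # complementary candidate sets

print(candidates("Big Apple", "I love visiting the Big Apple every summer."))
```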
arXiv Detail & Related papers (2022-02-27T17:38:53Z)
- Phrase Retrieval Learns Passage Retrieval, Too [77.57208968326422]
We study whether phrase retrieval can serve as the basis for coarse-level retrieval including passages and documents.
We show that a dense phrase-retrieval system, without any retraining, already achieves better passage retrieval accuracy than dedicated passage retrievers.
We also show that phrase filtering and vector quantization can reduce the size of our index by 4-10x.
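The aggregation that enables this is easy to sketch: each indexed phrase remembers its source passage, and a passage inherits the score of its best phrase. Toy vectors below; a real system would use trained phrase encoders and an ANN index.

```python
import numpy as np

rng = np.random.default_rng(3)

# Each indexed phrase vector remembers which passage it came from.
phrases = [("in 1969", "passage_a"), ("Neil Armstrong", "passage_a"),
           ("the Eiffel Tower", "passage_b"), ("in Paris", "passage_b")]
phrase_vecs = rng.normal(size=(len(phrases), 32))

def passage_scores(query_vec):
    """Phrase retrieval as passage retrieval: score every phrase, then let
    each passage inherit its best phrase's score (max aggregation)."""
    scores = {}
    for (phrase, pid), vec in zip(phrases, phrase_vecs):
        s = float(vec @ query_vec)
        scores[pid] = max(s, scores.get(pid, float("-inf")))
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(passage_scores(rng.normal(size=32)))
```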
arXiv Detail & Related papers (2021-09-16T17:42:45Z)