Robust Candidate Generation for Entity Linking on Short Social Media Texts
- URL: http://arxiv.org/abs/2210.07472v1
- Date: Fri, 14 Oct 2022 02:47:31 GMT
- Title: Robust Candidate Generation for Entity Linking on Short Social Media Texts
- Authors: Liam Hebert and Raheleh Makki and Shubhanshu Mishra and Hamidreza Saghir and Anusha Kamath and Yuval Merhav
- Abstract summary: We show that in the domain of Tweets, such methods suffer as users often include informal spelling, limited context, and lack of specificity.
We demonstrate a hybrid solution using long contextual representations from Wikipedia, achieving 0.93 recall.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Entity Linking (EL) is the gateway into Knowledge Bases. Recent advances in EL utilize dense retrieval approaches for Candidate Generation, which address some of the shortcomings of the lookup-based approach of matching NER mentions against pre-computed dictionaries. In this work, we show that in the domain of Tweets, such methods suffer as users often include informal spelling, limited context, and lack of specificity, among other issues. We investigate these challenges on a large and recent Tweets benchmark for EL, empirically evaluate lookup and dense retrieval approaches, and demonstrate that a hybrid solution using long contextual representations from Wikipedia is necessary to achieve considerable gains over previous work, achieving 0.93 recall.
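The abstract does not spell out the hybrid pipeline, but a minimal sketch helps fix the idea: dictionary lookup over normalized aliases, unioned with dense nearest-neighbor retrieval over long Wikipedia contexts. Everything below (the alias table, the toy encoder, the entity vectors) is hypothetical stand-in data, not the authors' implementation.

```python
import numpy as np

# Toy stand-ins: a real system would use a trained bi-encoder over long
# Wikipedia contexts and an ANN index; random vectors keep the sketch runnable.
rng = np.random.default_rng(0)

alias_dict = {                              # precomputed lookup: alias -> entity ids
    "nyc": ["New_York_City"],
    "new york": ["New_York_City", "New_York_(state)"],
}
entity_ids = ["New_York_City", "New_York_(state)", "New_York_Knicks"]
entity_vecs = rng.normal(size=(len(entity_ids), 64))

def encode(text: str) -> np.ndarray:
    """Hypothetical mention-in-context encoder (deterministic toy embedding)."""
    local = np.random.default_rng(abs(hash(text)) % (2**32))
    return local.normal(size=64)

def hybrid_candidates(mention: str, tweet: str, k: int = 2) -> list[str]:
    # 1) Lookup: normalize the (possibly informal) surface form, hit the dictionary.
    lookup = alias_dict.get(mention.lower().strip(), [])
    # 2) Dense: embed the mention with its tweet context, take nearest entities.
    q = encode(f"{mention} [SEP] {tweet}")
    dense = [entity_ids[i] for i in np.argsort(-(entity_vecs @ q))[:k]]
    # 3) Union: dictionary precision plus dense-retrieval recall.
    return list(dict.fromkeys(lookup + dense))

print(hybrid_candidates("nyc", "cant wait for the knicks game tonight"))
```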
Related papers
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
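As a rough illustration of the granularity choice, the sketch below indexes the same passage as one passage-level unit versus several fine-grained units. Real proposition extraction uses a trained model; the sentence splitter here is only a stand-in to keep the example self-contained.

```python
import re

passage = ("Dense retrieval indexes a corpus in fixed units. "
           "The unit choice affects what a single vector must represent. "
           "Smaller units carry roughly one fact each.")

def passage_units(text: str) -> list[str]:
    return [text]                       # one vector represents the whole passage

def fine_grained_units(text: str) -> list[str]:
    # Crude stand-in for proposition extraction: split into sentences.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

# Each fine-grained unit is indexed separately but maps back to its passage,
# so retrieval can return either the unit itself or its source document.
index = [(unit, "passage_0") for unit in fine_grained_units(passage)]
for unit, source in index:
    print(source, "->", unit)
```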
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- Attention Sorting Combats Recency Bias In Long Context Language Models [69.06809365227504]
Current language models often fail to incorporate long contexts efficiently during generation.
We show that a major contributor to this issue is attention priors that are likely learned during pre-training.
We leverage this fact to introduce "attention sorting": perform one step of decoding, sort documents by the attention they receive, repeat the process, and generate the answer with the newly sorted context.
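The loop reads compactly but is worth spelling out. A minimal sketch, assuming a hypothetical decode_one_step that returns the next token plus per-document attention mass (in the paper this comes from the model's own attention over the prompt):

```python
def attention_sort(question, documents, decode_one_step, n_rounds=3):
    """Sketch of attention sorting: repeatedly decode one token, then reorder
    the context so highly attended documents move toward the end, the region
    long-context models use most reliably."""
    docs = list(documents)
    for _ in range(n_rounds):
        # decode_one_step is assumed to return (next_token, {doc: attention_mass}).
        _token, attn = decode_one_step(question, docs)
        docs.sort(key=lambda d: attn[d])   # least-attended first, most-attended last
    return docs  # generate the final answer with this reordered context

def fake_decode(question, docs):
    # Toy attention: more word overlap with the question means more attention.
    q = set(question.lower().split())
    return "_", {d: len(q & set(d.lower().split())) for d in docs}

docs = ["the sky is blue",
        "entity linking maps mentions to a knowledge base",
        "cats sleep a lot"]
print(attention_sort("what is entity linking", docs, fake_decode))
```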
arXiv Detail & Related papers (2023-09-28T05:19:06Z)
- Lexically-Accelerated Dense Retrieval [29.327878974130055]
LADR (Lexically-Accelerated Dense Retrieval) is a simple yet effective approach that improves the efficiency of existing dense retrieval models.
LADR consistently achieves both precision and recall that are on par with an exhaustive search on standard benchmarks.
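A minimal sketch of the core idea, under stated assumptions: cheap lexical (e.g., BM25) results seed a candidate pool, a precomputed document-proximity graph expands it, and only that small pool is scored exactly with the dense model. The graph, vectors, and seeds below are toy data.

```python
import numpy as np

rng = np.random.default_rng(1)
doc_vecs = np.asarray(rng.normal(size=(6, 32)))                # dense document vectors
neighbors = {i: [(i + 1) % 6, (i + 2) % 6] for i in range(6)}  # toy doc-doc kNN graph

def ladr(query_vec, lexical_seeds, depth=1, k=3):
    """Sketch of lexically-accelerated dense retrieval: start from lexical
    seeds, expand through the proximity graph, score only the visited pool."""
    pool = set(lexical_seeds)
    frontier = set(lexical_seeds)
    for _ in range(depth):
        frontier = {n for d in frontier for n in neighbors[d]} - pool
        pool |= frontier
    # Exact dense scores, but only over the small expanded pool.
    scores = {d: float(doc_vecs[d] @ query_vec) for d in pool}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(ladr(rng.normal(size=32), lexical_seeds=[0, 3]))
```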
arXiv Detail & Related papers (2023-07-31T15:44:26Z)
- Integrity and Junkiness Failure Handling for Embedding-based Retrieval: A Case Study in Social Network Search [26.705196461992845]
Embedding-based retrieval is used in a variety of search applications, such as e-commerce and social network search.
In this paper, we conduct an analysis of embedding-based retrieval launched in early 2021 on our social network search engine.
We define two main categories of failures introduced by it, integrity and junkiness.
arXiv Detail & Related papers (2023-04-18T20:53:47Z)
- A Coarse-to-Fine Place Recognition Approach using Attention-guided Descriptors and Overlap Estimation [13.018093610656507]
We present a novel coarse-to-fine approach to place recognition.
In the coarse stage, our approach utilizes an attention-guided network to generate attention-guided descriptors.
We then employ a fast affinity-based candidate selection process to identify the Top-K most similar candidates.
In the fine stage, we estimate pairwise overlap among the narrowed-down place candidates to determine the final match.
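A compact sketch of the two stages, with a random-number placeholder standing in for the learned overlap estimator and toy vectors in place of the attention-guided network's descriptors:

```python
import numpy as np

rng = np.random.default_rng(2)
db_descriptors = rng.normal(size=(100, 16))   # one descriptor per known place

def estimate_overlap(query_id: int, candidate_id: int) -> float:
    """Placeholder for the learned pairwise overlap estimator."""
    return float(rng.random())

def recognize(query_desc: np.ndarray, query_id: int, top_k: int = 5) -> int:
    # Coarse stage: fast affinity (cosine similarity) against all descriptors.
    sims = db_descriptors @ query_desc / (
        np.linalg.norm(db_descriptors, axis=1) * np.linalg.norm(query_desc))
    candidates = np.argsort(-sims)[:top_k]    # Top-K most similar candidates
    # Fine stage: estimate pairwise overlap only for the narrowed-down set.
    return int(max(candidates, key=lambda c: estimate_overlap(query_id, int(c))))

print(recognize(rng.normal(size=16), query_id=0))
```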
arXiv Detail & Related papers (2023-03-13T05:56:36Z)
- Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical-matching-based approach achieves similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
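One plausible reading of the filter-then-fuse step, sketched below: drop low-probability generated clues, retrieve once per surviving clue, and fuse document scores weighted by each clue's generation probability. A reciprocal-rank-style fusion is used here for simplicity; the paper's exact fusion may differ.

```python
import math
from collections import defaultdict

# Hypothetical sampled clues: (generated context, LM log-probability).
clues = [("capital city of france paris", -2.1),
         ("paris is in france", -2.5),
         ("paris hilton biography", -6.0)]

def filter_clues(clues, logp_floor=-5.0):
    # Simplified filtering strategy: drop low-probability generations.
    return [(c, lp) for c, lp in clues if lp >= logp_floor]

def fuse(retrieve, query, clues, k=3):
    """Retrieve once per (query + clue) and fuse document scores, weighting
    each run by the clue's generation probability."""
    scores = defaultdict(float)
    for clue, lp in filter_clues(clues):
        weight = math.exp(lp)
        for rank, doc in enumerate(retrieve(f"{query} {clue}")):
            scores[doc] += weight / (rank + 1)   # reciprocal-rank style fusion
    return sorted(scores, key=scores.get, reverse=True)[:k]

def toy_retrieve(q):                             # stand-in for a lexical retriever
    corpus = ["paris france doc", "hilton hotels doc", "capital cities doc"]
    return sorted(corpus, key=lambda d: -len(set(q.split()) & set(d.split())))

print(fuse(toy_retrieve, "what is the capital of france", clues))
```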
arXiv Detail & Related papers (2022-10-13T15:18:04Z)
- Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performance in Entity Disambiguation (ED).
Previous works limited their studies to using only each candidate's Wikipedia title as its textual representation.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns.
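A toy illustration of the representation choice under study: scoring candidates by their title alone versus title plus definition. The token-overlap scorer below stands in for a trained ED model; the candidate texts are invented examples.

```python
candidates = {
    "Paris": "Capital and largest city of France.",
    "Paris (mythology)": "Trojan prince who abducted Helen in Greek myth.",
}

def represent(title: str, definition: str | None) -> str:
    # Title-only vs. title-plus-definition candidate representations.
    return title if definition is None else f"{title}: {definition}"

def score(context: str, candidate_text: str) -> int:
    # Toy relevance: token overlap; a real ED model would use a trained scorer.
    return len(set(context.lower().split()) & set(candidate_text.lower().split()))

mention_context = "He visited Paris, the capital of France, last summer."
for use_defs in (False, True):
    best = max(candidates,
               key=lambda t: score(mention_context,
                                   represent(t, candidates[t] if use_defs else None)))
    print("with definitions" if use_defs else "title only", "->", best)
```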
arXiv Detail & Related papers (2022-10-11T17:46:28Z)
- Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model [0.0]
We release contextual biasing lists to accompany the Earnings21 dataset.
We show results for shallow fusion contextual biasing applied to two different decoding algorithms.
We propose an alternate spelling prediction model that improves recall of rare words by 34.7% relative.
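A simplified sketch of how the two pieces combine: real shallow fusion adds the biasing bonus per token during beam search, but n-best rescoring, as below, shows the same scoring idea. The biasing list and alternate-spelling table are invented examples, not the released lists.

```python
import math

biasing_list = {"EBITDA", "Q3"}
# Stand-in for the alternate spelling prediction model: maps likely ASR
# spellings back to the canonical rare word.
alt_spellings = {"ebit da": "EBITDA", "ebitda": "EBITDA", "q three": "Q3"}

def shallow_fusion_rescore(hypotheses, lam=2.0):
    """Sketch of shallow-fusion contextual biasing at rescoring time: add lam
    to a hypothesis's log-probability for each biasing-list hit, counting
    alternate spellings as hits for their canonical form."""
    rescored = []
    for text, logp in hypotheses:
        bonus = sum(lam for phrase, canonical in alt_spellings.items()
                    if phrase in text.lower() and canonical in biasing_list)
        rescored.append((text, logp + bonus))
    return max(rescored, key=lambda h: h[1])[0]

hyps = [("revenue grew but ebit da fell", math.log(0.40)),
        ("revenue grew but a bit the fell", math.log(0.45))]
print(shallow_fusion_rescore(hyps))
```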
arXiv Detail & Related papers (2022-09-02T19:30:16Z)
- Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
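A minimal sketch of the paradigm, with a hard-coded string standing in for the generated profile and a token-overlap ranker standing in for the indexed search engine. The final union with the anchor-text dictionary mirrors how the two approaches complement each other; all data below is invented.

```python
anchor_dict = {"big apple": ["New_York_City"]}      # Wikipedia anchor-text dictionary

def generate_profile(mention: str, context: str) -> str:
    """Placeholder for the profile generator (a description of the entity)."""
    return "largest city in the united states located in new york state"

def search_engine(query: str, k: int = 2) -> list[str]:
    # Stand-in for an indexed search engine over entity descriptions.
    docs = {"New_York_City": "largest city in the united states new york",
            "New_York_(state)": "state in the northeastern united states",
            "Big_Apple_(nightclub)": "former nightclub"}
    q = set(query.split())
    return sorted(docs, key=lambda e: -len(q & set(docs[e].split())))[:k]

def candidates(mention: str, context: str) -> list[str]:
    profile = generate_profile(mention, context)
    retrieved = search_engine(profile)              # profile-based retrieval
    lookup = anchor_dict.get(mention.lower(), [])   # traditional dictionary lookup
    return list(dict.fromkeys(lookup + retrieved))  # complementary candidate sets

print(candidates("Big Apple", "I love visiting the Big Apple every summer."))
```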
arXiv Detail & Related papers (2022-02-27T17:38:53Z)
- Phrase Retrieval Learns Passage Retrieval, Too [77.57208968326422]
We study whether phrase retrieval can serve as the basis for coarse-level retrieval including passages and documents.
We show that a dense phrase-retrieval system, without any retraining, already achieves better passage retrieval accuracy than dedicated passage retrievers.
We also show that phrase filtering and vector quantization can reduce the size of our index by 4-10x.
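The aggregation that enables this is easy to sketch: each indexed phrase remembers its source passage, and a passage inherits the score of its best phrase. Toy vectors below; a real system would use trained phrase encoders and an ANN index.

```python
import numpy as np

rng = np.random.default_rng(3)

# Each indexed phrase vector remembers which passage it came from.
phrases = [("in 1969", "passage_a"), ("Neil Armstrong", "passage_a"),
           ("the Eiffel Tower", "passage_b"), ("in Paris", "passage_b")]
phrase_vecs = rng.normal(size=(len(phrases), 32))

def passage_scores(query_vec):
    """Phrase retrieval as passage retrieval: score every phrase, then let
    each passage inherit its best phrase's score (max aggregation)."""
    scores = {}
    for (phrase, pid), vec in zip(phrases, phrase_vecs):
        s = float(vec @ query_vec)
        scores[pid] = max(s, scores.get(pid, float("-inf")))
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(passage_scores(rng.normal(size=32)))
```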
arXiv Detail & Related papers (2021-09-16T17:42:45Z)