Design Challenges in Low-resource Cross-lingual Entity Linking
- URL: http://arxiv.org/abs/2005.00692v2
- Date: Wed, 7 Oct 2020 06:27:49 GMT
- Title: Design Challenges in Low-resource Cross-lingual Entity Linking
- Authors: Xingyu Fu, Weijia Shi, Xiaodong Yu, Zian Zhao, Dan Roth
- Abstract summary: Cross-lingual Entity Linking (XEL) is the problem of grounding mentions of entities in a foreign language text into an English knowledge base such as Wikipedia.
This paper focuses on the key step of identifying candidate English Wikipedia titles that correspond to a given foreign language mention.
We present a simple yet effective zero-shot XEL system, QuEL, that utilizes search engine query logs.
- Score: 56.18957576362098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-lingual Entity Linking (XEL), the problem of grounding mentions of
entities in a foreign language text into an English knowledge base such as
Wikipedia, has seen a lot of research in recent years, with a range of
promising techniques. However, current techniques do not rise to the challenges
introduced by text in low-resource languages (LRL) and, surprisingly, fail to
generalize to text not taken from Wikipedia, on which they are usually trained.
This paper provides a thorough analysis of low-resource XEL techniques,
focusing on the key step of identifying candidate English Wikipedia titles that
correspond to a given foreign language mention. Our analysis indicates that
current methods are limited by their reliance on Wikipedia's interlanguage
links and thus suffer when the foreign language's Wikipedia is small. We
conclude that the LRL setting requires the use of outside-Wikipedia
cross-lingual resources and present a simple yet effective zero-shot XEL
system, QuEL, that utilizes search engine query logs. With experiments on 25
languages, QuEL shows an average increase of 25% in gold candidate recall and
of 13% in end-to-end linking accuracy over state-of-the-art baselines.
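The candidate-generation step described in the abstract can be illustrated with a toy sketch: a mention-to-title index built from search engine query logs, where foreign-language queries are paired with the English Wikipedia titles users clicked. This is a hypothetical simplification for illustration, not the actual QuEL implementation; the function names and the tiny log are invented.

```python
from collections import defaultdict

def build_candidate_index(query_log):
    """Build a mention -> English Wikipedia title index from
    (query, clicked_title) pairs, counting click frequency.
    A hypothetical simplification of query-log-based candidate generation."""
    index = defaultdict(lambda: defaultdict(int))
    for query, title in query_log:
        index[query.lower()][title] += 1
    return index

def candidates(index, mention, k=3):
    """Return up to k English titles for a foreign-language mention,
    ranked by click count."""
    counts = index.get(mention.lower(), {})
    return [t for t, _ in sorted(counts.items(), key=lambda x: -x[1])[:k]]

# Toy query log: foreign-language queries paired with clicked English titles.
log = [
    ("münchen", "Munich"),
    ("münchen", "Munich"),
    ("münchen", "FC Bayern Munich"),
    ("köln", "Cologne"),
]
idx = build_candidate_index(log)
print(candidates(idx, "München"))  # → ['Munich', 'FC Bayern Munich']
```

Because the index requires no interlanguage links from the foreign-language Wikipedia, a lookup like this stays zero-shot and remains usable even when that Wikipedia edition is small, which is the failure mode the paper identifies for prior methods.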
Related papers
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- An Open Multilingual System for Scoring Readability of Wikipedia [3.992677070507323]
We develop a multilingual model to score the readability of Wikipedia articles.
We create a novel multilingual dataset spanning 14 languages, by matching articles from Wikipedia to simplified Wikipedia and online children's encyclopedias.
We show that our model performs well in a zero-shot scenario, yielding a ranking accuracy of more than 80% across 14 languages.
arXiv Detail & Related papers (2024-06-03T23:07:18Z)
- Cross-Lingual Knowledge Editing in Large Language Models [73.12622532088564]
Knowledge editing has been shown to adapt large language models to new knowledge without retraining from scratch.
However, the effect of editing in a source language on a different target language is still unknown.
We first collect a large-scale cross-lingual synthetic dataset by translating ZsRE from English to Chinese.
arXiv Detail & Related papers (2023-09-16T11:07:52Z)
- XWikiGen: Cross-lingual Summarization for Encyclopedic Text Generation in Low Resource Languages [11.581072296148031]
Existing work on Wikipedia text generation has focused only on English, where English reference articles are summarized to generate English Wikipedia pages.
We propose XWikiGen, the task of cross-lingual multi-document summarization of text from multiple reference articles, written in various languages, to generate Wikipedia-style text.
arXiv Detail & Related papers (2023-03-22T04:52:43Z)
- One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval [39.061900747689094]
CORA is a Cross-lingual Open-Retrieval Answer Generation model.
It can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable.
arXiv Detail & Related papers (2021-07-26T06:02:54Z)
- X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models [103.75890012041366]
Language models (LMs) have proven surprisingly successful at capturing factual knowledge.
However, studies on LMs' factual representation ability have almost invariably been performed on English.
We create a benchmark of cloze-style probes for 23 typologically diverse languages.
arXiv Detail & Related papers (2020-10-13T05:29:56Z)
- Crosslingual Topic Modeling with WikiPDA [15.198979978589476]
We present Wikipedia-based Polyglot Dirichlet Allocation (WikiPDA)
It learns to represent Wikipedia articles written in any language as distributions over a common set of language-independent topics.
We show its utility in two applications: a study of topical biases in 28 Wikipedia editions, and crosslingual supervised classification.
arXiv Detail & Related papers (2020-09-23T15:19:27Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
- Improving Candidate Generation for Low-resource Cross-lingual Entity Linking [81.41804263432684]
Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts.
In this paper, we propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios.
arXiv Detail & Related papers (2020-03-03T05:32:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.