Improving Candidate Generation for Low-resource Cross-lingual Entity
Linking
- URL: http://arxiv.org/abs/2003.01343v1
- Date: Tue, 3 Mar 2020 05:32:09 GMT
- Title: Improving Candidate Generation for Low-resource Cross-lingual Entity
Linking
- Authors: Shuyan Zhou and Shruti Rijhwani and John Wieting and Jaime Carbonell
and Graham Neubig
- Abstract summary: Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts.
In this paper, we propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios.
- Score: 81.41804263432684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-lingual entity linking (XEL) is the task of finding referents in a
target-language knowledge base (KB) for mentions extracted from source-language
texts. The first step of (X)EL is candidate generation, which retrieves a list
of plausible candidate entities from the target-language KB for each mention.
Approaches based on resources from Wikipedia have proven successful in the
realm of relatively high-resource languages (HRL), but these do not extend well
to low-resource languages (LRL) with few, if any, Wikipedia pages. Recently,
transfer learning methods have been shown to reduce the demand for resources in
the LRL by utilizing resources in closely-related languages, but performance
still lags far behind that of their high-resource counterparts. In this
paper, we first assess the problems faced by current entity candidate
generation methods for low-resource XEL, then propose three improvements that
(1) reduce the disconnect between entity mentions and KB entries, and (2)
improve the robustness of the model to low-resource scenarios. The methods are
simple, but effective: we experiment with our approach on seven XEL datasets
and find that they yield an average gain of 16.9% in Top-30 gold candidate
recall, compared to state-of-the-art baselines. Our improved model also yields
an average gain of 7.9% in in-KB accuracy of end-to-end XEL.
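The page carries no code, but the two core ideas above (retrieving a ranked candidate list per mention, and scoring it with Top-k gold candidate recall) are easy to illustrate. Below is a minimal sketch, not the authors' method: the character-trigram Dice scorer, all function names, and the toy KB are illustrative assumptions.

```python
from collections import Counter

def char_ngrams(s, n=3):
    """Character n-grams of a lowercased, boundary-marked string."""
    s = f"#{s.lower()}#"
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def similarity(mention, entity_name):
    """Dice coefficient over character trigrams; one simple choice of scorer."""
    a, b = char_ngrams(mention), char_ngrams(entity_name)
    overlap = sum((a & b).values())
    total = sum(a.values()) + sum(b.values())
    return 2.0 * overlap / total if total else 0.0

def generate_candidates(mention, kb_entities, k=30):
    """Candidate generation: the k KB entries most similar to the mention."""
    return sorted(kb_entities, key=lambda e: similarity(mention, e), reverse=True)[:k]

def top_k_gold_recall(mentions, gold, kb_entities, k=30):
    """Fraction of mentions whose gold entity appears among the top-k candidates."""
    hits = sum(gold[m] in generate_candidates(m, kb_entities, k) for m in mentions)
    return hits / len(mentions)

# Toy usage with a hypothetical four-entry KB:
kb = ["Barack Obama", "Michelle Obama", "Osaka", "Oman"]
print(generate_candidates("Obama", kb, k=2))
print(top_k_gold_recall(["Obama"], {"Obama": "Barack Obama"}, kb, k=2))
```

In practice the scorer would be learned (e.g., from transliteration or embedding similarity); string trigrams merely stand in for any mention-to-entry scoring function.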
Related papers
- UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages [2.66269503676104]
Large language models (LLMs) under-perform on low-resource languages.
We present a method to efficiently collect text data for low-resource languages.
Our approach, UnifiedCrawl, filters and extracts common crawl using minimal compute resources.
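The summary does not specify the pipeline, so the following is only a loose sketch of the general idea: stream a Common Crawl WET archive record by record and keep documents accepted by a language-ID function. It assumes the `warcio` package; `lang_id` is a caller-supplied placeholder, not part of UnifiedCrawl.

```python
from warcio.archiveiterator import ArchiveIterator

def extract_language(wet_path, lang_id, target="amh"):
    """Stream a Common Crawl WET file, yielding plain-text records in one language.

    lang_id is a hypothetical callable text -> ISO code; processing one record
    at a time keeps memory and compute requirements minimal.
    """
    with open(wet_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "conversion":  # WET plain-text records
                continue
            text = record.content_stream().read().decode("utf-8", errors="ignore")
            if lang_id(text) == target:
                yield text
```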
arXiv Detail & Related papers (2024-11-21T17:41:08Z)
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary of a source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even with few-shot settings.
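The blurb only gestures at the method; the paper's title suggests a generate-then-verify loop. Here is a schematic sketch of that control flow, where `llm` is a placeholder for any text-generation callable; the two-step loop is a guess at the shape of "meta-generation", not the authors' pipeline.

```python
def meta_generate_summary(llm, source_text, target_lang, max_rounds=2):
    """Draft a cross-lingual summary, then ask the model to check and revise it.

    llm is a hypothetical callable prompt -> str; the check-and-rewrite loop
    is an illustrative assumption.
    """
    draft = llm(f"Summarize the following text in {target_lang}:\n{source_text}")
    for _ in range(max_rounds):
        verdict = llm(
            f"Source:\n{source_text}\n\nSummary ({target_lang}):\n{draft}\n\n"
            "Is this summary faithful and fluent? Answer OK, or rewrite it."
        )
        if verdict.strip() == "OK":
            break
        draft = verdict  # treat the rewrite as the new draft
    return draft
```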
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
- LLMs Are Few-Shot In-Context Low-Resource Language Learners [59.74451570590808]
In-context learning (ICL) empowers large language models (LLMs) to perform diverse tasks in underrepresented languages.
We extensively study ICL and its cross-lingual variation (X-ICL) on 25 low-resource and 7 relatively higher-resource languages.
Our study confirms the significance of few-shot in-context information for enhancing LLMs' understanding of low-resource languages.
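Cross-lingual in-context learning amounts to prompt construction: labeled exemplars from a higher-resource language followed by a low-resource query. A minimal sketch, with an illustrative template rather than the exact format studied in the paper:

```python
def build_xicl_prompt(exemplars, query, task="Label the sentiment"):
    """Assemble a cross-lingual in-context prompt: labeled exemplars from a
    higher-resource language, then an unlabeled low-resource query."""
    lines = [f"{task}."]
    for text, label in exemplars:
        lines.append(f"Input: {text}\nLabel: {label}")
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

# Hypothetical usage: Indonesian exemplars supporting a Minangkabau query.
prompt = build_xicl_prompt(
    [("Filmnya bagus sekali", "positive"), ("Pelayanannya buruk", "negative")],
    "Filmnyo rancak bana",
)
print(prompt)
```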
arXiv Detail & Related papers (2024-03-25T07:55:29Z)
- GlotLID: Language Identification for Low-Resource Languages [51.38634652914054]
GlotLID-M is an LID model that satisfies the desiderata of wide coverage, reliability and efficiency.
It identifies 1665 languages, a large increase in coverage compared to prior work.
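GlotLID-M is distributed as a fastText model, so usage is a few lines. A sketch follows; the Hugging Face repo id and filename are quoted from the project's public instructions as best recalled, so verify them before relying on this.

```python
import fasttext
from huggingface_hub import hf_hub_download

# Download the released model (repo id per the GlotLID project; verify).
model_path = hf_hub_download(repo_id="cis-lmu/glotlid", filename="model.bin")
model = fasttext.load_model(model_path)

# predict() returns labels like "__label__eng_Latn" with probabilities.
labels, probs = model.predict("Sistema honek hizkuntza asko ezagutzen ditu", k=3)
print(list(zip(labels, probs)))
```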
arXiv Detail & Related papers (2023-10-24T23:45:57Z)
- Low Resource Summarization using Pre-trained Language Models [1.26404863283601]
We propose a methodology for adapting self-attentive transformer-based architecture models (mBERT, mT5) for low-resource summarization.
Our adapted summarization model, urT5, effectively captures contextual information of a low-resource language, with evaluation scores (up to 46.35 ROUGE-1, 77 BERTScore) on par with state-of-the-art models for the high-resource language English.
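For readers unfamiliar with the two reported metrics, this is how they are commonly computed with the rouge-score and bert-score packages; this is standard library usage, not the paper's evaluation script, and the example sentences are invented.

```python
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "heavy rain flooded several streets in the city"
hypothesis = "heavy rain flooded streets across the city"

# ROUGE-1: unigram overlap between hypothesis and reference.
scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
print(scorer.score(reference, hypothesis)["rouge1"].fmeasure)

# BERTScore: token-level similarity in contextual embedding space.
P, R, F1 = bert_score([hypothesis], [reference], lang="en")
print(F1.mean().item())
```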
arXiv Detail & Related papers (2023-10-04T13:09:39Z)
- Strategies for improving low resource speech to text translation relying on pre-trained ASR models [59.90106959717875]
This paper presents techniques and findings for improving the performance of low-resource speech-to-text translation (ST).
We conducted experiments on both simulated and real low-resource setups, on the language pairs English-Portuguese and Tamasheq-French, respectively.
arXiv Detail & Related papers (2023-05-31T21:58:07Z)
- MetaXLR -- Mixed Language Meta Representation Transformation for Low-resource Cross-lingual Learning based on Multi-Armed Bandit [0.0]
We propose an enhanced approach which uses multiple source languages chosen in a data driven manner.
We achieve state-of-the-art results on the NER task for extremely low-resource languages while using the same amount of data.
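The blurb names a multi-armed bandit but not the algorithm. A generic epsilon-greedy sketch of data-driven source-language selection follows; the reward definition and the update rule are illustrative assumptions, not MetaXLR's specifics.

```python
import random

class SourceLanguageBandit:
    """Epsilon-greedy bandit over candidate source languages.

    A pull means training on a batch from the chosen language; the reward
    could be the resulting gain on target-language dev F1 (assumed here).
    """
    def __init__(self, languages, epsilon=0.1):
        self.languages = languages
        self.epsilon = epsilon
        self.counts = {lang: 0 for lang in languages}
        self.values = {lang: 0.0 for lang in languages}

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(self.languages)  # explore
        return max(self.languages, key=self.values.get)  # exploit

    def update(self, lang, reward):
        self.counts[lang] += 1
        # Incremental mean of observed rewards for this arm.
        self.values[lang] += (reward - self.values[lang]) / self.counts[lang]

# Hypothetical usage:
bandit = SourceLanguageBandit(["hin", "ben", "urd"])
lang = bandit.select()
bandit.update(lang, reward=0.02)  # dev-set improvement after one step
```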
arXiv Detail & Related papers (2023-05-31T18:22:33Z)
- Efficient Entity Candidate Generation for Low-Resource Languages [13.789451365205665]
Candidate generation is a crucial module in entity linking.
It also plays a key role in multiple NLP tasks that have been shown to benefit from leveraging knowledge bases.
This paper constitutes an in-depth analysis of the candidate generation problem in the context of cross-lingual entity linking.
arXiv Detail & Related papers (2022-06-30T09:49:53Z)
- Isomorphic Cross-lingual Embeddings for Low-Resource Languages [1.5076964620370268]
Cross-Lingual Word Embeddings (CLWEs) are a key component to transfer linguistic information learnt from higher-resource settings into lower-resource ones.
We introduce a framework to learn CLWEs, without assuming isometry, for low-resource pairs via joint exploitation of a related higher-resource language.
We show consistent gains over current methods in both quality and degree of isomorphism, as measured by bilingual lexicon induction (BLI) and eigenvalue similarity respectively.
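Eigenvalue similarity is usually computed as in Søgaard et al. (2018): compare the Laplacian spectra of nearest-neighbor graphs built from each embedding space. A numpy sketch of that common formulation follows; details such as the neighborhood size k and the spectrum truncation rule vary across papers.

```python
import numpy as np

def laplacian_eigenvalues(emb, k=10):
    """Eigenvalues of the graph Laplacian of a k-NN similarity graph."""
    sim = emb @ emb.T  # cosine similarity if rows are L2-normalized
    adj = np.zeros_like(sim)
    for i, row in enumerate(sim):
        for j in np.argsort(row)[-k - 1:]:  # k neighbors plus self
            adj[i, j] = adj[j, i] = 1.0
    np.fill_diagonal(adj, 0.0)  # drop self-loops
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.sort(np.linalg.eigvalsh(lap))

def eigenvalue_similarity(emb_a, emb_b, k=10):
    """Sum of squared differences of the Laplacian eigenvalues; a lower value
    means the two embedding spaces are closer to isomorphic."""
    ev_a, ev_b = laplacian_eigenvalues(emb_a, k), laplacian_eigenvalues(emb_b, k)
    m = min(len(ev_a), len(ev_b))
    return float(np.sum((ev_a[:m] - ev_b[:m]) ** 2))
```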
arXiv Detail & Related papers (2022-03-28T10:39:07Z)
- Design Challenges in Low-resource Cross-lingual Entity Linking [56.18957576362098]
Cross-lingual Entity Linking (XEL) is the problem of grounding mentions of entities in a foreign language text into an English knowledge base such as Wikipedia.
This paper focuses on the key step of identifying candidate English Wikipedia titles that correspond to a given foreign language mention.
We present a simple yet effective zero-shot XEL system, QuEL, that utilizes search engine query logs.
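QuEL's core resource is query logs, and the underlying idea reduces to aggregating (foreign query, clicked English Wikipedia title) pairs into a ranked candidate map. A toy sketch; the log entries below are invented purely for illustration, and frequency stands in for whatever relevance signal the real system uses.

```python
from collections import Counter, defaultdict

def build_candidate_map(click_log):
    """Aggregate (foreign query, clicked English Wikipedia title) pairs into a
    mention -> ranked-title map, ordered by click frequency."""
    counts = defaultdict(Counter)
    for query, title in click_log:
        counts[query][title] += 1
    return {q: [t for t, _ in c.most_common()] for q, c in counts.items()}

# Toy log (invented pairs): Georgian queries for "Obama".
log = [("ობამა", "Barack Obama"), ("ობამა", "Obama, Fukui"), ("ობამა", "Barack Obama")]
candidates = build_candidate_map(log)
print(candidates["ობამა"])  # ['Barack Obama', 'Obama, Fukui']
```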
arXiv Detail & Related papers (2020-05-02T04:00:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.