Improving Candidate Generation for Low-resource Cross-lingual Entity
Linking
- URL: http://arxiv.org/abs/2003.01343v1
- Date: Tue, 3 Mar 2020 05:32:09 GMT
- Title: Improving Candidate Generation for Low-resource Cross-lingual Entity
Linking
- Authors: Shuyan Zhou and Shruti Rijhwani and John Wieting and Jaime Carbonell
and Graham Neubig
- Abstract summary: Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts.
In this paper, we propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios.
- Score: 81.41804263432684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-lingual entity linking (XEL) is the task of finding referents in a
target-language knowledge base (KB) for mentions extracted from source-language
texts. The first step of (X)EL is candidate generation, which retrieves a list
of plausible candidate entities from the target-language KB for each mention.
Approaches based on resources from Wikipedia have proven successful in the
realm of relatively high-resource languages (HRL), but these do not extend well
to low-resource languages (LRL) with few, if any, Wikipedia pages. Recently,
transfer learning methods have been shown to reduce the demand for resources in
LRLs by utilizing resources in closely-related languages, but their
performance still lags far behind that of high-resource counterparts. In this
paper, we first assess the problems faced by current entity candidate
generation methods for low-resource XEL, then propose three improvements that
(1) reduce the disconnect between entity mentions and KB entries, and (2)
improve the robustness of the model to low-resource scenarios. The methods are
simple, but effective: we experiment with our approach on seven XEL datasets
and find that they yield an average gain of 16.9% in Top-30 gold candidate
recall, compared to state-of-the-art baselines. Our improved model also yields
an average gain of 7.9% in in-KB accuracy of end-to-end XEL.
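The candidate generation step described above can be illustrated with a minimal sketch. This is not the paper's actual method (which involves several improvements over string-matching baselines); it is a simple character n-gram similarity retriever, a common baseline for this step. All function names here are illustrative, and `kb_entries` is assumed to be a list of target-language KB entry titles. The Top-30 list it returns is the kind of candidate set whose gold-candidate recall the paper measures.

```python
from collections import Counter

def char_ngrams(s, n=3):
    """Character n-grams of a lowercased string, padded at the edges."""
    s = f"#{s.lower()}#"
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def ngram_similarity(a, b):
    """Cosine-style overlap between two character n-gram count vectors."""
    ca, cb = char_ngrams(a), char_ngrams(b)
    shared = sum((ca & cb).values())  # multiset intersection size
    denom = (sum(ca.values()) * sum(cb.values())) ** 0.5
    return shared / denom if denom else 0.0

def generate_candidates(mention, kb_entries, top_k=30):
    """Rank KB entry titles by string similarity to the mention;
    return the top_k as the candidate list."""
    scored = sorted(
        ((ngram_similarity(mention, e), e) for e in kb_entries),
        reverse=True,
    )
    return [e for _, e in scored[:top_k]]
```

A retriever like this works tolerably when mentions transliterate closely to KB titles, but breaks down exactly in the low-resource settings the paper targets, where surface forms in the source language diverge from target-language KB entries.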
Related papers
- Enhancing Code Generation for Low-Resource Languages: No Silver Bullet [55.39571645315926]
Large Language Models (LLMs) rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages.
For low-resource languages, the limited availability of such data hampers the models' ability to generalize effectively.
We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages.
arXiv Detail & Related papers (2025-01-31T12:23:28Z)
- Revisiting Projection-based Data Transfer for Cross-Lingual Named Entity Recognition in Low-Resource Languages [8.612181075294327]
We show that data-based cross-lingual transfer is an effective technique for cross-lingual NER.
We present a novel formalized projection approach of matching source entities with extracted target candidates.
These findings highlight the robustness of projection-based data transfer as an alternative to model-based methods for cross-lingual named entity recognition in low-resource languages.
arXiv Detail & Related papers (2025-01-30T21:00:47Z)
- UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages [2.66269503676104]
Large language models (LLMs) under-perform on low-resource languages.
We present a method to efficiently collect text data for low-resource languages.
Our approach, UnifiedCrawl, filters and extracts common crawl using minimal compute resources.
arXiv Detail & Related papers (2024-11-21T17:41:08Z)
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary of a source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even with few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
- LLMs Are Few-Shot In-Context Low-Resource Language Learners [59.74451570590808]
In-context learning (ICL) empowers large language models (LLMs) to perform diverse tasks in underrepresented languages.
We extensively study ICL and its cross-lingual variation (X-ICL) on 25 low-resource and 7 relatively higher-resource languages.
Our study concludes the significance of few-shot in-context information on enhancing the low-resource understanding quality of LLMs.
arXiv Detail & Related papers (2024-03-25T07:55:29Z)
- GlotLID: Language Identification for Low-Resource Languages [51.38634652914054]
GlotLID-M is an LID model that satisfies the desiderata of wide coverage, reliability and efficiency.
It identifies 1665 languages, a large increase in coverage compared to prior work.
arXiv Detail & Related papers (2023-10-24T23:45:57Z)
- Low Resource Summarization using Pre-trained Language Models [1.26404863283601]
We propose a methodology for adapting self-attentive transformer-based architecture models (mBERT, mT5) for low-resource summarization.
Our adapted summarization model urT5 captures contextual information of low-resource languages effectively, with evaluation scores (up to 46.35 ROUGE-1, 77 BERTScore) on par with state-of-the-art models for the high-resource language English.
arXiv Detail & Related papers (2023-10-04T13:09:39Z)
- Efficient Entity Candidate Generation for Low-Resource Languages [13.789451365205665]
Candidate generation is a crucial module in entity linking.
It plays a key role in multiple NLP tasks that have been proven to beneficially leverage knowledge bases.
This paper constitutes an in-depth analysis of the candidate generation problem in the context of cross-lingual entity linking.
arXiv Detail & Related papers (2022-06-30T09:49:53Z)
- Isomorphic Cross-lingual Embeddings for Low-Resource Languages [1.5076964620370268]
Cross-Lingual Word Embeddings (CLWEs) are a key component to transfer linguistic information learnt from higher-resource settings into lower-resource ones.
We introduce a framework to learn CLWEs, without assuming isometry, for low-resource pairs via joint exploitation of a related higher-resource language.
We show consistent gains over current methods in both quality and degree of isomorphism, as measured by bilingual lexicon induction (BLI) and eigenvalue similarity respectively.
arXiv Detail & Related papers (2022-03-28T10:39:07Z)
- Design Challenges in Low-resource Cross-lingual Entity Linking [56.18957576362098]
Cross-lingual Entity Linking (XEL) is the problem of grounding mentions of entities in a foreign language text into an English knowledge base such as Wikipedia.
This paper focuses on the key step of identifying candidate English Wikipedia titles that correspond to a given foreign language mention.
We present a simple yet effective zero-shot XEL system, QuEL, that utilizes search engine query logs.
arXiv Detail & Related papers (2020-05-02T04:00:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.