Efficient Entity Candidate Generation for Low-Resource Languages
- URL: http://arxiv.org/abs/2206.15163v1
- Date: Thu, 30 Jun 2022 09:49:53 GMT
- Title: Efficient Entity Candidate Generation for Low-Resource Languages
- Authors: Alberto García-Durán, Akhil Arora, Robert West
- Abstract summary: Candidate generation is a crucial module in entity linking.
It plays a key role in multiple NLP tasks that have been proven to beneficially leverage knowledge bases.
This paper constitutes an in-depth analysis of the candidate generation problem in the context of cross-lingual entity linking.
- Score: 13.789451365205665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Candidate generation is a crucial module in entity linking. It also plays a key role in multiple NLP tasks that have been proven to beneficially leverage knowledge bases. Nevertheless, it has often been overlooked in the monolingual English entity linking literature, as naive approaches obtain very good performance. Unfortunately, the existing approaches for English cannot be successfully transferred to poorly resourced languages. This paper constitutes an in-depth analysis of the candidate generation problem in the context of cross-lingual entity linking with a focus on low-resource languages. Among other contributions, we point out limitations in the evaluation conducted in previous works. We introduce a characterization of queries into types based on their difficulty, which improves the interpretability of the performance of different methods. We also propose a light-weight and simple solution based on the construction of indexes whose design is motivated by more complex transfer learning based neural approaches. A thorough empirical analysis on 9 real-world datasets under 2 evaluation settings shows that our simple solution outperforms the state-of-the-art approach in terms of both quality and efficiency for almost all datasets and query types.
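The abstract says only that the solution builds indexes over the knowledge base, without specifying the index design. The sketch below is therefore a generic illustration, not the authors' actual method: a minimal character n-gram inverted index over entity names, a common lightweight way to retrieve candidates for noisy or transliterated mentions. The NgramIndex class, its parameters, and the toy entity IDs are all hypothetical.

```python
from collections import defaultdict

def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased string."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

class NgramIndex:
    """Hypothetical sketch: inverted index from character n-grams to entity IDs."""

    def __init__(self, n=3):
        self.n = n
        self.index = defaultdict(set)   # n-gram -> {entity_id, ...}
        self.names = {}                 # entity_id -> canonical entity name

    def add(self, entity_id, name):
        """Index one knowledge-base entity under all n-grams of its name."""
        self.names[entity_id] = name
        for gram in char_ngrams(name, self.n):
            self.index[gram].add(entity_id)

    def candidates(self, mention, k=10):
        """Rank entities by Jaccard overlap between n-gram sets of mention and name."""
        query = char_ngrams(mention, self.n)
        overlap = defaultdict(int)
        for gram in query:
            for eid in self.index[gram]:
                overlap[eid] += 1
        scored = []
        for eid, inter in overlap.items():
            union = len(query | char_ngrams(self.names[eid], self.n))
            scored.append((inter / union, eid))
        scored.sort(reverse=True)
        return [eid for _, eid in scored[:k]]

# Toy usage: index two entities, then retrieve candidates for a mention.
idx = NgramIndex(n=3)
idx.add("Q90", "Paris")
idx.add("Q142", "France")
print(idx.candidates("paris"))  # -> ['Q90']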
Related papers
- Evaluating and explaining training strategies for zero-shot cross-lingual news sentiment analysis [8.770572911942635]
We introduce novel evaluation datasets in several less-resourced languages.
We experiment with a range of approaches including the use of machine translation.
We show that language similarity is not in itself sufficient for predicting the success of cross-lingual transfer.
arXiv Detail & Related papers (2024-09-30T07:59:41Z)
- ConVerSum: A Contrastive Learning-based Approach for Data-Scarce Solution of Cross-Lingual Summarization Beyond Direct Equivalents [4.029675201787349]
Cross-lingual summarization (CLS) is a sophisticated branch of Natural Language Processing.
There is no feasible solution for CLS when no high-quality CLS data is available.
We propose a novel data-efficient approach, ConVerSum, for CLS leveraging the power of contrastive learning.
arXiv Detail & Related papers (2024-08-17T19:03:53Z)
- Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking [1.3716808114696444]
Large Language Models (LLMs) are becoming crucial across various fields, emphasizing the urgency for high-quality models in underrepresented languages.
This study explores the unique challenges faced by low-resource languages, such as data scarcity, model selection, evaluation, and computational limitations.
arXiv Detail & Related papers (2024-05-07T21:58:45Z)
- Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT), and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on six high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z)
- FRASIMED: a Clinical French Annotated Resource Produced through Crosslingual BERT-Based Annotation Projection [0.6116681488656472]
This research article introduces a methodology for generating translated versions of annotated datasets through crosslingual annotation projection.
We present the creation of the French Annotated Resource with Semantic Information for Medical Detection (FRASIMED), an annotated corpus comprising 2,051 synthetic clinical cases in French.
arXiv Detail & Related papers (2023-09-19T17:17:28Z)
- IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z)
- Visualizing the Relationship Between Encoded Linguistic Information and Task Performance [53.223789395577796]
We study the dynamic relationship between the encoded linguistic information and task performance from the viewpoint of Pareto Optimality.
We conduct experiments on two popular NLP tasks, i.e., machine translation and language modeling, and investigate the relationship between several kinds of linguistic information and task performance.
Our empirical findings suggest that some syntactic information is helpful for NLP tasks whereas encoding more syntactic information does not necessarily lead to better performance.
arXiv Detail & Related papers (2022-03-29T19:03:10Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- Incorporating Linguistic Knowledge for Abstractive Multi-document Summarization [20.572283625521784]
We develop a neural network based abstractive multi-document summarization (MDS) model.
We incorporate the dependency information into the linguistic-guided attention mechanism.
With the help of linguistic signals, sentence-level relations can be correctly captured.
arXiv Detail & Related papers (2021-09-23T08:13:35Z)
- Building Low-Resource NER Models Using Non-Speaker Annotation [58.78968578460793]
Cross-lingual methods have had notable success in addressing the challenges of low-resource named entity recognition (NER).
We propose a complementary approach to building low-resource NER models using "non-speaker" (NS) annotations.
We show that use of NS annotators produces results that are consistently on par or better than cross-lingual methods built on modern contextual representations.
arXiv Detail & Related papers (2020-06-17T03:24:38Z)
- Improving Candidate Generation for Low-resource Cross-lingual Entity Linking [81.41804263432684]
Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts.
In this paper, we propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios.
arXiv Detail & Related papers (2020-03-03T05:32:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.