Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction
- URL: http://arxiv.org/abs/2305.13582v3
- Date: Thu, 20 Jun 2024 14:42:34 GMT
- Title: Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction
- Authors: Yang Chen, Vedaant Shah, Alan Ritter
- Abstract summary: We propose TransFusion, a framework in which models are fine-tuned to use English translations of low-resource language data.
GoLLIE-TF, a cross-lingual instruction-tuned LLM for IE tasks, is designed to close the performance gap between high and low-resource languages.
- Score: 18.926993352330797
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large language models (LLMs) combined with instruction tuning have shown significant progress in information extraction (IE) tasks, exhibiting strong generalization capabilities to unseen datasets by following annotation guidelines. However, their applicability to low-resource languages remains limited due to a lack of both labeled data for fine-tuning and unlabeled text for pre-training. In this paper, we propose TransFusion, a framework in which models are fine-tuned to use English translations of low-resource language data, enabling more precise predictions through annotation fusion. Based on TransFusion, we introduce GoLLIE-TF, a cross-lingual instruction-tuned LLM for IE tasks, designed to close the performance gap between high and low-resource languages. Our experiments across twelve multilingual IE datasets spanning 50 languages demonstrate that GoLLIE-TF achieves better zero-shot cross-lingual transfer than the base model. In addition, we show that TransFusion significantly improves low-resource language named entity recognition when applied to proprietary models such as GPT-4 (+5 F1) with a prompting approach, or fine-tuning different language models including decoder-only (+14 F1) and encoder-only (+13 F1) architectures.
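The sketch below is a minimal, illustrative rendering of the translate-annotate-fuse flow the abstract describes. The translate() and annotate() helpers are hypothetical stand-ins for a machine-translation backend and an instruction-tuned IE model such as GoLLIE-TF, and the naive union in the fusion step is only a placeholder for the learned annotation fusion proposed in the paper.

```python
# Illustrative sketch of a TransFusion-style inference pass (not the paper's implementation).
from dataclasses import dataclass


@dataclass
class Entity:
    text: str
    label: str


def translate(sentence: str, target_lang: str = "en") -> str:
    """Hypothetical MT call; swap in any translation backend."""
    return sentence  # stub: identity translation for the sketch


def annotate(sentence: str, guidelines: str) -> list[Entity]:
    """Hypothetical call to an instruction-tuned IE model that follows
    the given annotation guidelines and returns entity mentions."""
    return []  # stub: returns no entities


def transfusion_predict(src_sentence: str, guidelines: str) -> list[Entity]:
    # 1) Translate the low-resource-language input into English.
    english = translate(src_sentence, target_lang="en")
    # 2) Annotate both the original sentence and its English translation.
    src_entities = annotate(src_sentence, guidelines)
    en_entities = annotate(english, guidelines)
    # 3) Fuse the two annotation sets. Here: keep source-language spans and
    #    add English-side predictions not already covered. The paper learns
    #    this fusion during fine-tuning; this union is just a placeholder.
    fused = {e.text: e for e in src_entities}
    for e in en_entities:
        fused.setdefault(e.text, e)
    return list(fused.values())


if __name__ == "__main__":
    guidelines = "Mark person (PER), location (LOC), and organization (ORG) mentions."
    print(transfusion_predict("...", guidelines))
```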
Related papers
- Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data [64.4458540273004]
We propose a self-play framework that leverages only monolingual data and the intrinsic multilingual knowledge of Large Language Models (LLMs).
Experiments demonstrate that this approach not only matches the performance of models trained on large-scale parallel data but also excels in non-English translation directions.
arXiv Detail & Related papers (2025-04-20T16:20:30Z)
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
- UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset [69.33424532827608]
Open-source large language models (LLMs) have gained significant strength across diverse fields.
In this work, we construct an open-source multilingual supervised fine-tuning dataset.
The resulting UltraLink dataset comprises approximately 1 million samples across five languages.
arXiv Detail & Related papers (2024-02-07T05:05:53Z)
- Machine Translation for Ge'ez Language [0.0]
Machine translation for low-resource languages such as Ge'ez faces challenges such as out-of-vocabulary words, domain mismatches, and lack of labeled training data.
We develop a multilingual neural machine translation (MNMT) model based on language relatedness.
We also experiment with using GPT-3.5, a state-of-the-art LLM, for few-shot translation with fuzzy matches.
arXiv Detail & Related papers (2023-11-24T14:55:23Z)
- MT4CrossOIE: Multi-stage Tuning for Cross-lingual Open Information Extraction [38.88339164947934]
Cross-lingual open information extraction aims to extract structured information from raw text across multiple languages.
Previous work uses a shared cross-lingual pre-trained model to handle the different languages but underuses the potential of the language-specific representation.
We propose an effective multi-stage tuning framework called MT4CrossIE, designed for enhancing cross-lingual open information extraction.
arXiv Detail & Related papers (2023-08-12T12:38:10Z)
- Bootstrapping Multilingual Semantic Parsers using Large Language Models [28.257114724384806]
The translate-train paradigm of transferring English datasets across multiple languages remains the key ingredient for training task-specific multilingual models.
We consider the task of multilingual semantic parsing and demonstrate the effectiveness and flexibility offered by large language models (LLMs) for translating English datasets into several languages via few-shot prompting.
arXiv Detail & Related papers (2022-10-13T19:34:14Z)
- CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z)
- Feature Aggregation in Zero-Shot Cross-Lingual Transfer Using Multilingual BERT [16.22182090626537]
Multilingual BERT (mBERT), a language model pre-trained on large multilingual corpora, has impressive zero-shot cross-lingual transfer capabilities.
In this work, we explore how the lower layers of mBERT complement its last transformer layer.
A feature aggregation module based on an attention mechanism is proposed to fuse the information in different layers of mBERT.
arXiv Detail & Related papers (2022-05-17T17:12:19Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both the representation and gradient levels.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
Since gold labels are unavailable for the translated text, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.