Bootstrapping Multilingual AMR with Contextual Word Alignments
- URL: http://arxiv.org/abs/2102.02189v1
- Date: Wed, 3 Feb 2021 18:35:55 GMT
- Title: Bootstrapping Multilingual AMR with Contextual Word Alignments
- Authors: Janaki Sheth and Young-Suk Lee and Ramon Fernandez Astudillo and
Tahira Naseem and Radu Florian and Salim Roukos and Todd Ward
- Abstract summary: We develop a novel technique for foreign-text-to-English AMR alignment, using the contextual word alignment between English and foreign language tokens.
This word alignment is weakly supervised and relies on the contextualized XLM-R word embeddings.
We achieve a highly competitive performance that surpasses the best published results for German, Italian, Spanish and Chinese.
- Score: 15.588190959488538
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We develop high performance multilingual Abstract Meaning
Representation (AMR) systems by projecting English AMR annotations to other
languages with weak supervision. We achieve this goal by bootstrapping
transformer-based multilingual word embeddings, in particular those from
cross-lingual RoBERTa (XLM-R large). We develop a novel technique for
foreign-text-to-English AMR alignment, using the contextual word alignment
between English and foreign language tokens. This word alignment is weakly
supervised and relies on the contextualized XLM-R word embeddings. We achieve
a highly competitive performance that surpasses the best published results
for German, Italian, Spanish and Chinese.
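The core alignment idea lends itself to a short illustration. The sketch below, assuming Hugging Face transformers and the public xlm-roberta-large checkpoint, derives a greedy word alignment from cosine similarities between contextual embeddings; the paper's weakly supervised procedure is more involved, so treat this as the general idea rather than the authors' implementation.

```python
# Minimal sketch: greedy foreign-to-English token alignment from
# contextual XLM-R embeddings via cosine similarity. Illustrative only.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModel.from_pretrained("xlm-roberta-large")
model.eval()

def embed(sentence):
    """Contextual embeddings per subword token, special tokens stripped."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]       # (seq_len, dim)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return tokens[1:-1], hidden[1:-1]                       # drop <s>, </s>

def align(english, foreign):
    """Map each foreign token to its most similar English token."""
    en_toks, en_emb = embed(english)
    fo_toks, fo_emb = embed(foreign)
    en_norm = torch.nn.functional.normalize(en_emb, dim=-1)
    fo_norm = torch.nn.functional.normalize(fo_emb, dim=-1)
    sim = fo_norm @ en_norm.T                               # cosine matrix
    return [(f, en_toks[j])
            for f, j in zip(fo_toks, sim.argmax(dim=-1).tolist())]

print(align("The boy wants to go", "Der Junge will gehen"))
```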
Related papers
- Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models [52.00446751692225]
We present a novel and simple yet effective method called Dictionary Insertion Prompting (DIP).
When providing a non-English prompt, DIP looks up a word dictionary and inserts words' English counterparts into the prompt for LLMs.
It then enables better translation into English and better reasoning steps in English, which leads to noticeably better results.
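A minimal sketch of the dictionary-insertion idea, with a toy Spanish-English dictionary and insertion format invented for illustration; the paper's dictionaries and prompt templates differ.

```python
# Toy illustration of Dictionary Insertion Prompting (DIP): look up
# non-English words and insert English counterparts into the prompt.
# TOY_DICT and the "word (gloss)" format are hypothetical.
TOY_DICT = {"gato": "cat", "duerme": "sleeps"}

def dip_prompt(non_english_prompt: str) -> str:
    annotated = []
    for word in non_english_prompt.split():
        key = word.lower().strip(".,!?")
        gloss = TOY_DICT.get(key)
        annotated.append(f"{word} ({gloss})" if gloss else word)
    return " ".join(annotated)

print(dip_prompt("El gato duerme."))   # -> El gato (cat) duerme. (sleeps)
```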
arXiv Detail & Related papers (2024-11-02T05:10:50Z)
- RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization [17.46921734622369]
Romanization reduces token fertility by 2x-4x.
Romanized text matches or outperforms native script representation across various NLU, NLG, and MT tasks.
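Token fertility (subword tokens per word) is straightforward to measure. A rough sketch follows, using GPT-2's English-centric tokenizer purely for illustration; RomanSetu evaluates LLaMA-style models on real native-script corpora.

```python
# Rough sketch: compare token fertility of native-script vs. romanized
# text. GPT-2's tokenizer is an illustrative stand-in, not the paper's.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def fertility(text: str) -> float:
    """Average number of subword tokens per whitespace-separated word."""
    words = text.split()
    return len(tokenizer.tokenize(text)) / max(len(words), 1)

native = "नमस्ते दुनिया"        # Hindi, Devanagari script
romanized = "namaste duniya"   # same text, romanized

print(f"native: {fertility(native):.1f}, romanized: {fertility(romanized):.1f}")
```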
arXiv Detail & Related papers (2024-01-25T16:11:41Z)
- Machine Translation with Large Language Models: Prompt Engineering for Persian, English, and Russian Directions [0.0]
Generative large language models (LLMs) have demonstrated exceptional proficiency in various natural language processing (NLP) tasks.
We conducted an investigation into two popular prompting methods and their combination, focusing on cross-language combinations of Persian, English, and Russian.
arXiv Detail & Related papers (2024-01-16T15:16:34Z)
- Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models [67.19567060894563]
Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and can be finetuned to perform well on diverse tasks.
We present a new study investigating how well PLMs capture cross-lingual word sense with Contextual Word-Level Translation (C-WLT).
We find that as the model size increases, PLMs encode more cross-lingual word sense knowledge and better use context to improve WLT performance.
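A hedged sketch of a C-WLT-style probe using a fill-mask prompt with a public XLM-R checkpoint; the template and scoring below are an invented approximation, not the paper's exact setup.

```python
# Sketch: probe a PLM for the in-context translation of a target word by
# masking the translation slot. Prompt wording is a hypothetical example.
from transformers import pipeline

fill = pipeline("fill-mask", model="xlm-roberta-base")

context = "The bank raised interest rates."
word = "bank"
prompt = f'{context} In this sentence, "{word}" is <mask> in French.'

for cand in fill(prompt, top_k=5):
    print(cand["token_str"], round(cand["score"], 3))
```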
arXiv Detail & Related papers (2023-04-26T19:55:52Z)
- Romanian Multiword Expression Detection Using Multilingual Adversarial Training and Lateral Inhibition [0.17188280334580194]
This paper describes our improvements in automatically identifying Romanian multiword expressions on the corpus released for the PARSEME v1.2 shared task.
Our approach adopts a multilingual perspective, combining the recently introduced lateral inhibition layer with adversarial training to boost the performance of the employed multilingual language models.
arXiv Detail & Related papers (2023-04-22T09:10:49Z)
- Romanization-based Large-scale Adaptation of Multilingual Language Models [124.57923286144515]
Large multilingual pretrained language models (mPLMs) have become the de facto state of the art for cross-lingual transfer in NLP.
We study and compare a plethora of data- and parameter-efficient strategies for adapting the mPLMs to romanized and non-romanized corpora of 14 diverse low-resource languages.
Our results reveal that UROMAN-based transliteration can offer strong performance for many languages, with particular gains achieved in the most challenging setups.
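A sketch of UROMAN-based transliteration as a preprocessing step, assuming the uroman.pl script (https://github.com/isi-nlp/uroman) is available locally and follows its documented stdin-to-stdout usage.

```python
# Sketch: romanize text with UROMAN before feeding it to an mPLM.
# The path below is an assumption about where the repo was cloned.
import subprocess

def uromanize(text: str, uroman_path: str = "uroman/bin/uroman.pl") -> str:
    result = subprocess.run(
        ["perl", uroman_path],
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8").strip()

print(uromanize("Ελληνικά"))   # expected output: a Latin-script rendering
```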
arXiv Detail & Related papers (2023-04-18T09:58:34Z)
- Retrofitting Multilingual Sentence Embeddings with Abstract Meaning Representation [70.58243648754507]
We introduce a new method to improve existing multilingual sentence embeddings with Abstract Meaning Representation (AMR).
Compared with the original textual input, AMR is a structured semantic representation that presents the core concepts and relations in a sentence explicitly and unambiguously.
Experiment results show that retrofitting multilingual sentence embeddings with AMR leads to state-of-the-art performance on both semantic similarity and transfer tasks.
arXiv Detail & Related papers (2022-10-18T11:37:36Z)
- Massively Multilingual Lexical Specialization of Multilingual Transformers [18.766379322798837]
We show that massively multilingual lexical specialization brings substantial gains in two standard cross-lingual lexical tasks.
We observe gains for languages unseen in specialization, indicating that multilingual lexical specialization enables generalization to languages with no lexical constraints.
arXiv Detail & Related papers (2022-08-01T17:47:03Z)
- Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders [85.80950708769923]
We probe multilingual language models for the amount of cross-lingual lexical knowledge stored in their parameters, and compare them against the original multilingual LMs.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
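A skeletal PyTorch rendering of the summarized architecture: an LSTM encoder shared by a translation decoder and a reconstruction decoder, with the encoder states serving as contextualised cross-lingual embeddings. The dimensions and the two-decoder layout are assumptions for illustration.

```python
# Skeleton: encoder-decoder that translates and reconstructs; the encoder's
# hidden states are extracted as contextualised embeddings after training.
import torch
import torch.nn as nn

class TranslateReconstruct(nn.Module):
    def __init__(self, src_vocab: int, tgt_vocab: int, dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.trans_dec = nn.LSTM(2 * dim, dim, batch_first=True)
        self.recon_dec = nn.LSTM(2 * dim, dim, batch_first=True)
        self.trans_out = nn.Linear(dim, tgt_vocab)   # translation head
        self.recon_out = nn.Linear(dim, src_vocab)   # reconstruction head

    def forward(self, src_ids):
        ctx, _ = self.encoder(self.embed(src_ids))   # (B, T, 2*dim) embeddings
        trans_h, _ = self.trans_dec(ctx)
        recon_h, _ = self.recon_dec(ctx)
        return self.trans_out(trans_h), self.recon_out(recon_h), ctx

model = TranslateReconstruct(src_vocab=1000, tgt_vocab=1000)
_, _, embeddings = model(torch.randint(0, 1000, (2, 7)))
print(embeddings.shape)   # torch.Size([2, 7, 512])
```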
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.