Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation
- URL: http://arxiv.org/abs/2109.03808v1
- Date: Wed, 8 Sep 2021 17:55:46 GMT
- Title: Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation
- Authors: Leonardo F. R. Ribeiro, Jonas Pfeiffer, Yue Zhang and Iryna Gurevych
- Abstract summary: We study different techniques for automatically generating AMR annotations.
Our models trained on gold AMR with silver (machine translated) sentences outperform approaches which leverage generated silver AMR.
Our models surpass the previous state of the art for German, Italian, Spanish, and Chinese by a large margin.
- Score: 55.117031558677674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work on multilingual AMR-to-text generation has exclusively focused on
data augmentation strategies that utilize silver AMR. However, this assumes a
high quality of generated AMRs, potentially limiting the transferability to the
target task. In this paper, we investigate different techniques for
automatically generating AMR annotations, where we aim to study which source of
information yields better multilingual results. Our models trained on gold AMR
with silver (machine translated) sentences outperform approaches which leverage
generated silver AMR. We find that combining both complementary sources of
information further improves multilingual AMR-to-text generation. Our models
surpass the previous state of the art for German, Italian, Spanish, and Chinese
by a large margin.
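As a concrete illustration of the two data sources contrasted in the abstract, here is a minimal sketch in Python. The `translate_to_target` and `parse_to_amr` helpers are hypothetical stand-ins for an NMT system and an English AMR parser; this is an assumption-laden sketch, not the authors' actual pipeline.

```python
# Minimal sketch of the two silver-data strategies contrasted above.
# `translate_to_target` and `parse_to_amr` are hypothetical stand-ins
# (assumptions, not the paper's actual components).
from typing import List, Tuple

def translate_to_target(english_sentence: str, lang: str) -> str:
    """Stand-in for machine translation into the target language."""
    return f"<{lang}> " + english_sentence  # placeholder output

def parse_to_amr(english_sentence: str) -> str:
    """Stand-in for an English AMR parser (linearized graph output)."""
    return f'(s / sentence :snt "{english_sentence}")'  # placeholder graph

def gold_amr_silver_sent(gold: List[Tuple[str, str]],
                         lang: str) -> List[Tuple[str, str]]:
    """Keep the human-annotated (gold) AMR; pair it with a
    machine-translated (silver) target-language sentence."""
    return [(amr, translate_to_target(sent, lang)) for amr, sent in gold]

def silver_amr_gold_sent(bitext: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the human-written target-language sentence; derive a silver
    AMR by parsing its English reference translation."""
    return [(parse_to_amr(en), tgt) for en, tgt in bitext]

# "Combining both complementary sources" then amounts to concatenating
# the two training sets before fine-tuning an AMR-to-text model:
gold = [("(w / want-01 :ARG0 (b / boy))", "The boy wants it.")]
bitext = [("The boy wants it.", "Der Junge will es.")]
train = gold_amr_silver_sent(gold, "de") + silver_amr_gold_sent(bitext)
```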
Related papers
- Meta-Whisper: Speech-Based Meta-ICL for ASR on Low-Resource Languages [51.12146889808824]
Meta-Whisper is a novel approach to improve automatic speech recognition for low-resource languages.
It enhances Whisper's ability to recognize speech in unfamiliar languages without extensive fine-tuning.
arXiv Detail & Related papers (2024-09-16T16:04:16Z)
- High-Quality Data Augmentation for Low-Resource NMT: Combining a Translation Memory, a GAN Generator, and Filtering [1.8843687952462742]
This paper proposes a novel way of utilizing a monolingual corpus on the source side to assist Neural Machine Translation (NMT) in low-resource settings.
We realize this concept by employing a Generative Adversarial Network (GAN), which augments the training data for the discriminator while mitigating the interference of low-quality synthetic monolingual translations with the generator.
arXiv Detail & Related papers (2024-08-22T02:35:47Z)
- (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [52.18246881218829]
We introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TransAgents.
To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP).
arXiv Detail & Related papers (2024-05-20T05:55:08Z)
- Extrapolating Multilingual Understanding Models as Multilingual Generators [82.1355802012414]
This paper explores methods to endow multilingual understanding models with generation abilities, yielding a unified model.
We propose a Semantic-Guided Alignment-then-Denoising (SGA) approach to adapt an encoder into a multilingual generator with a small number of new parameters.
arXiv Detail & Related papers (2023-05-22T15:33:21Z)
- A Survey: Neural Networks for AMR-to-Text [2.3924114046608627]
AMR-to-Text is one of the key techniques in the NLP community, aiming to generate sentences from Abstract Meaning Representation (AMR) graphs.
Since AMR was proposed in 2013, the study of AMR-to-Text has become increasingly prevalent as an essential branch of structured-data-to-text generation.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Maximum Bayes Smatch Ensemble Distillation for AMR Parsing [15.344108027018006]
We show that it is possible to overcome the diminishing returns of silver data by combining Smatch-based ensembling techniques with ensemble distillation.
We attain a new state-of-the-art for cross-lingual AMR parsing for Chinese, German, Italian and Spanish.
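As a rough sketch of what Smatch-based ensembling can look like, here is a minimum-Bayes-risk selection over candidate parses; the `smatch_score` argument is assumed to be supplied by an external Smatch implementation, and this is not necessarily the paper's exact procedure.

```python
from typing import Callable, List

def mbr_select(candidates: List[str],
               smatch_score: Callable[[str, str], float]) -> str:
    """Pick the candidate AMR with the highest average Smatch agreement
    with all other ensemble members (maximum expected Smatch)."""
    def expected_smatch(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(smatch_score(candidates[i], o) for o in others) / len(others)
    best = max(range(len(candidates)), key=expected_smatch)
    return candidates[best]
```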
arXiv Detail & Related papers (2021-12-14T23:29:37Z)
- Multilingual AMR Parsing with Noisy Knowledge Distillation [68.01173640691094]
We study multilingual AMR parsing from the perspective of knowledge distillation, where the aim is to learn and improve a multilingual AMR parser by using an existing English parser as its teacher.
We identify that noisy input and precise output are the key to successful distillation.
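For reference, here is a generic soft-target distillation loss of the kind such setups build on, sketched in PyTorch; the exact objective and noise schedule in the paper may differ.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: float = 2.0) -> torch.Tensor:
    """Soft-target distillation: the student (fed the noisy, e.g.
    machine-translated, input) matches the teacher's distribution
    (produced from the precise English input)."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)
```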
arXiv Detail & Related papers (2021-09-30T15:13:48Z)
- Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval [51.004601358498135]
Mr. TyDi is a benchmark dataset for mono-lingual retrieval in eleven typologically diverse languages.
The goal of this resource is to spur research in dense retrieval techniques in non-English languages.
arXiv Detail & Related papers (2021-08-19T16:53:43Z)
- Translate, then Parse! A strong baseline for Cross-Lingual AMR Parsing [10.495114898741205]
We develop models that project sentences from various languages onto their AMRs to capture their essential semantic structures.
In this paper, we revisit a simple two-step baseline (T+P: translate, then parse) and enhance it with a strong NMT system and a strong AMR parser.
Our experiments show that T+P outperforms a recent state-of-the-art system across all tested languages.
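The two-step baseline itself is straightforward to express; here is a minimal sketch, where `translate_to_english` and `english_amr_parse` are assumed, hypothetical components (e.g., an off-the-shelf NMT model and an English AMR parser).

```python
from typing import Callable

def translate_then_parse(sentence: str,
                         translate_to_english: Callable[[str], str],
                         english_amr_parse: Callable[[str], str]) -> str:
    """T+P baseline: map a non-English sentence to English with MT,
    then parse the translation with an English AMR parser."""
    english = translate_to_english(sentence)
    return english_amr_parse(english)
```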
arXiv Detail & Related papers (2021-06-08T17:52:48Z)
- Pushing the Limits of AMR Parsing with Self-Learning [24.998016423211375]
We show how trained models can themselves be applied, through self-learning, to improve AMR parsing performance.
Without any additional human annotations, these techniques improve an already performant parser and achieve state-of-the-art results.
arXiv Detail & Related papers (2020-10-20T23:45:04Z)