Maximum Bayes Smatch Ensemble Distillation for AMR Parsing
- URL: http://arxiv.org/abs/2112.07790v1
- Date: Tue, 14 Dec 2021 23:29:37 GMT
- Title: Maximum Bayes Smatch Ensemble Distillation for AMR Parsing
- Authors: Young-Suk Lee, Ramon Fernandez Astudillo, Thanh Lam Hoang, Tahira
Naseem, Radu Florian, Salim Roukos
- Abstract summary: We show that it is possible to overcome these diminishing returns of silver data by combining Smatch-based ensembling techniques with ensemble distillation.
We attain a new state-of-the-art for cross-lingual AMR parsing for Chinese, German, Italian and Spanish.
- Score: 15.344108027018006
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: AMR parsing has experienced an unprecedented increase in performance in the
last three years, due to a mixture of effects including architecture
improvements and transfer learning. Self-learning techniques have also played a
role in pushing performance forward. However, for the most recent high-performing
parsers, the effect of self-learning and silver data generation seems to be
fading. In this paper we show that it is possible to overcome these diminishing
returns of silver data by combining Smatch-based ensembling techniques with
ensemble distillation. In an extensive experimental setup, we push single model
English parser performance above 85 Smatch for the first time and return to
substantial gains. We also attain a new state-of-the-art for cross-lingual AMR
parsing for Chinese, German, Italian and Spanish. Finally we explore the impact
of the proposed distillation technique on domain adaptation, and show that it
can produce gains rivaling those of human annotated data for QALD-9 and achieve
a new state-of-the-art for BioAMR.
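
The recipe described in the abstract can be read as two steps: (1) Maximum Bayes Risk (MBR) selection, where each sentence's silver parse is the ensemble candidate with the highest average Smatch against the other candidates, and (2) distillation, where a single student parser is trained on the selected silver parses together with the gold AMR data. The sketch below is a minimal illustration under those assumptions, not the authors' released implementation; `smatch_f1`, `mbr_select`, and `build_silver_corpus` are illustrative names, and `smatch_f1` stands in for any pairwise Smatch scorer (e.g. a wrapper around the `smatch` package).

```python
# Minimal sketch (not the authors' code) of Smatch-based MBR selection plus
# ensemble distillation. Assumes a pairwise scorer smatch_f1(a, b) -> float
# and per-sentence candidate parses produced by several ensemble members.
from typing import Callable, Dict, List, Tuple


def mbr_select(candidates: List[str],
               smatch_f1: Callable[[str, str], float]) -> str:
    """Return the candidate AMR with the highest average Smatch against
    all other candidates, i.e. the maximum-consensus (MBR) parse."""
    best_parse, best_score = candidates[0], float("-inf")
    for i, cand in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return cand
        score = sum(smatch_f1(cand, o) for o in others) / len(others)
        if score > best_score:
            best_parse, best_score = cand, score
    return best_parse


def build_silver_corpus(ensemble_parses: Dict[str, List[str]],
                        smatch_f1: Callable[[str, str], float]
                        ) -> List[Tuple[str, str]]:
    """Map each unlabeled sentence to its ensemble-consensus parse.
    A single student parser is then trained on this silver corpus
    together with the gold AMR data (the distillation step)."""
    return [(sent, mbr_select(cands, smatch_f1))
            for sent, cands in ensemble_parses.items()]
```

Candidate parses could come, for instance, from models trained with different random seeds; the point of the selection criterion is that the quality of the silver data is tied to Smatch consensus rather than to any single model's likelihood.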
Related papers
- Retrosynthesis prediction enhanced by in-silico reaction data
augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation.
On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
arXiv Detail & Related papers (2024-01-31T07:40:37Z)
- Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis [4.771833920251869]
We evaluate the effectiveness of three different FT methods in conjunction with back-translation across an array of 7 diverse NLP tasks.
Our findings reveal that continued pre-training on augmented data can effectively improve the FT performance of the downstream tasks.
Our findings highlight the potential of data augmentation (DA) as a powerful tool for bolstering LMs' performance.
arXiv Detail & Related papers (2023-06-13T10:14:58Z)
- Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations, we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
arXiv Detail & Related papers (2022-12-01T17:31:42Z)
- Weighted Ensemble Self-Supervised Learning [67.24482854208783]
Ensembling has proven to be a powerful technique for boosting model performance.
We develop a framework that permits data-dependent weighted cross-entropy losses.
Our method outperforms both in multiple evaluation metrics on ImageNet-1K.
arXiv Detail & Related papers (2022-11-18T02:00:17Z)
- A Self-Paced Mixed Distillation Method for Non-Autoregressive Generation [135.84684279852098]
Non-autoregressive (NAR) models significantly underperform autoregressive (AR) models on various language generation tasks.
Among NAR models, BANG is the first large-scale model pre-trained on an unlabeled English raw-text corpus.
We propose a novel self-paced mixed distillation method to further improve the generation quality of BANG.
arXiv Detail & Related papers (2022-05-23T09:54:53Z)
- From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective [15.542082655342476]
We build on SPLADE -- a sparse expansion-based retriever -- and show to what extent it can benefit from the same training improvements as dense models.
We study the link between effectiveness and efficiency in both in-domain and zero-shot settings.
arXiv Detail & Related papers (2022-05-10T08:08:43Z)
- Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation [55.117031558677674]
We study different techniques for automatically generating AMR annotations.
Our models trained on gold AMR with silver (machine translated) sentences outperform approaches which leverage generated silver AMR.
Our models surpass the previous state of the art for German, Italian, Spanish, and Chinese by a large margin.
arXiv Detail & Related papers (2021-09-08T17:55:46Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the joint submission of the University of Sydney and JD to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- Pushing the Limits of AMR Parsing with Self-Learning [24.998016423211375]
We show how trained models can be applied to improve AMR parsing performance.
We show that, without any additional human annotations, these techniques improve an already performant parser and achieve state-of-the-art results.
arXiv Detail & Related papers (2020-10-20T23:45:04Z)
- Improving AMR Parsing with Sequence-to-Sequence Pre-training [39.33133978535497]
In this paper, we focus on sequence-to-sequence (seq2seq) AMR parsing.
We propose a seq2seq pre-training approach to build pre-trained models in both single and joint fashion.
Experiments show that both the single and joint pre-trained models significantly improve the performance.
arXiv Detail & Related papers (2020-10-05T04:32:47Z)