How Does Pretraining Improve Discourse-Aware Translation?
- URL: http://arxiv.org/abs/2305.19847v1
- Date: Wed, 31 May 2023 13:36:51 GMT
- Title: How Does Pretraining Improve Discourse-Aware Translation?
- Authors: Zhihong Huang, Longyue Wang, Siyou Liu, Derek F. Wong
- Abstract summary: We introduce a probing task to interpret the ability of pretrained language models to capture discourse relation knowledge.
We validate three state-of-the-art PLMs, covering encoder-, decoder-, and encoder-decoder-based architectures.
Our findings shed light on how and when discourse knowledge in PLMs benefits downstream tasks.
- Score: 41.20896077662125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained language models (PLMs) have produced substantial improvements in
discourse-aware neural machine translation (NMT), for example, improved
coherence in spoken language translation. However, the underlying reasons for
their strong performance have not been well explained. To bridge this gap, we
introduce a probing task to interpret the ability of PLMs to capture discourse
relation knowledge. We validate three state-of-the-art PLMs, covering encoder-,
decoder-, and encoder-decoder-based architectures. The analysis shows that (1)
the ability of PLMs to model discourse varies across architectures and layers,
and (2) different discourse elements in a text pose different learning
difficulties for PLMs. In addition, we investigate the effects of different
PLMs on spoken language translation. Through experiments on the IWSLT2017
Chinese-English dataset, we empirically reveal that NMT models initialized from
different layers of PLMs exhibit the same trends as in the probing task. Our
findings shed light on how and when discourse knowledge in PLMs benefits
downstream tasks.
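The abstract does not spell out the probing setup, so the following is only a minimal sketch of a layer-wise probe in the spirit described above: freeze a PLM, mean-pool each layer's hidden states over labelled examples, and fit a lightweight classifier that predicts the discourse relation. The model name, label set, and toy data are placeholder assumptions, not the paper's actual configuration.

```python
# Minimal layer-wise probing sketch (assumptions: model name, labels, data).
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

MODEL_NAME = "bert-base-chinese"  # placeholder encoder-based PLM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def layer_features(sentences, layer):
    """Mean-pooled hidden states of one layer for a batch of sentences."""
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    hidden = out.hidden_states[layer]           # (batch, seq_len, dim)
    mask = enc["attention_mask"].unsqueeze(-1)  # ignore padding positions
    pooled = (hidden * mask).sum(1) / mask.sum(1)
    return pooled.numpy()

def probe_layer(train_x, train_y, test_x, test_y, layer):
    """Train a linear probe on one layer's features and report accuracy."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(layer_features(train_x, layer), train_y)
    pred = clf.predict(layer_features(test_x, layer))
    return accuracy_score(test_y, pred)

# Hypothetical probing data: adjacent sentences annotated with a discourse
# relation label (e.g. contrast vs. cause); a real setup would use a corpus.
train_x = ["他很累。但是他继续工作。", "下雨了。所以比赛取消了。"]
train_y = ["contrast", "cause"]
test_x, test_y = train_x, train_y  # placeholder split for illustration

for layer in range(1, model.config.num_hidden_layers + 1):
    acc = probe_layer(train_x, train_y, test_x, test_y, layer)
    print(f"layer {layer:2d}: probe accuracy = {acc:.2f}")
```

Comparing probe accuracy across layers is what would expose the layer-dependent trends the abstract reports; the same layer indices could then be used when initializing an NMT model for the IWSLT2017 experiments.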
Related papers
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z)
- Adapting Large Language Models for Document-Level Machine Translation [46.370862171452444]
Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks.
Recent research indicates that moderately-sized LLMs often outperform larger ones after task-specific fine-tuning.
This study focuses on adapting LLMs for document-level machine translation (DocMT) for specific language pairs.
arXiv Detail & Related papers (2024-01-12T09:29:13Z)
- Speech Translation with Large Language Models: An Industrial Practice [64.5419534101104]
We introduce LLM-ST, a novel and effective speech translation model built on a pretrained large language model (LLM).
By integrating the LLM with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations.
Through rigorous experimentation on English and Chinese datasets, we showcase the exceptional performance of LLM-ST.
arXiv Detail & Related papers (2023-12-21T05:32:49Z)
- Let Models Speak Ciphers: Multiagent Debate through Embeddings [84.20336971784495]
We introduce CIPHER (Communicative Inter-Model Protocol Through Embedding Representation) to address this issue.
By deviating from natural language, CIPHER offers an advantage of encoding a broader spectrum of information without any modification to the model weights.
This showcases the superiority and robustness of embeddings as an alternative "language" for communication among LLMs.
arXiv Detail & Related papers (2023-10-10T03:06:38Z)
- From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning [63.63840740526497]
We investigate how instruction tuning adjusts pre-trained models with a focus on intrinsic changes.
The impact of instruction tuning is then studied by comparing the explanations derived from the pre-trained and instruction-tuned models.
Our findings reveal three significant impacts of instruction tuning.
arXiv Detail & Related papers (2023-09-30T21:16:05Z)
- A Comparative Analysis of Pretrained Language Models for Text-to-Speech [13.962029761484022]
State-of-the-art text-to-speech (TTS) systems have utilized pretrained language models (PLMs) to enhance prosody and create more natural-sounding speech.
While PLMs have been extensively researched for natural language understanding (NLU), their impact on TTS has been overlooked.
This is the first study of its kind to investigate the impact of different PLMs on TTS.
arXiv Detail & Related papers (2023-09-04T13:02:27Z)
- Towards Versatile and Efficient Visual Knowledge Integration into Pre-trained Language Models with Cross-Modal Adapters [16.44174900423759]
We propose a new plug-and-play module, X-adapter, to leverage the aligned visual and textual knowledge learned in pre-trained vision-language models.
Our method can significantly improve the performance on object-color reasoning and natural language understanding tasks.
arXiv Detail & Related papers (2023-05-12T10:08:46Z)
- Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)