Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation
- URL: http://arxiv.org/abs/2304.10295v1
- Date: Thu, 20 Apr 2023 13:20:03 GMT
- Title: Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation
- Authors: Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Zhen Li
- Abstract summary: We propose Decoupled Non-parametric Knowledge Distillation (DNKD) from a data perspective to improve data efficiency.
Our method follows the knowledge distillation paradigm. However, instead of obtaining the teacher distribution from a sophisticated MT model, we construct it from a non-parametric datastore via k-Nearest-Neighbor (kNN) retrieval.
Experiments on the MuST-C corpus show that the proposed method achieves consistent improvements over a strong baseline without requiring any transcription.
- Score: 5.973321003365441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing techniques often attempt to transfer knowledge from a powerful machine translation (MT) model to a speech translation (ST) model with elaborate techniques, which often require transcriptions as extra input during training. However, transcriptions are not always available, and how to improve ST model performance without transcriptions, i.e., data efficiency, has rarely been studied in the literature. In this paper, we propose Decoupled Non-parametric Knowledge Distillation (DNKD) from a data perspective to improve data efficiency. Our method follows the knowledge distillation paradigm. However, instead of obtaining the teacher distribution from a sophisticated MT model, we construct it from a non-parametric datastore via k-Nearest-Neighbor (kNN) retrieval, which removes the dependence on transcriptions and the MT model. We then decouple the classic knowledge distillation loss into target and non-target distillation to enhance the effect of the knowledge among non-target logits, which is the prominent "dark knowledge". Experiments on the MuST-C corpus show that the proposed method achieves consistent improvements over a strong baseline without requiring any transcription.
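To make the abstract's two ingredients concrete, the sketch below shows, in plain PyTorch, (i) how a teacher distribution could be built from a non-parametric datastore via kNN retrieval and (ii) a knowledge distillation loss decoupled into a target and a non-target term, in the spirit of decoupled KD. This is a minimal illustration under assumed choices (datastore layout, k, temperature, the alpha/beta weights); it is not the authors' implementation.

import torch
import torch.nn.functional as F


def knn_teacher_distribution(query, keys, values, vocab_size, k=8, temperature=10.0):
    # query:  (d,)   decoder hidden state for the current step
    # keys:   (N, d) datastore keys (hidden states collected offline)
    # values: (N,)   datastore values (target-token ids aligned with the keys)
    dists = torch.cdist(query.unsqueeze(0), keys).squeeze(0)      # (N,) L2 distances
    knn_dists, knn_idx = torch.topk(dists, k, largest=False)      # k nearest neighbours
    weights = F.softmax(-knn_dists / temperature, dim=-1)         # closer -> larger weight
    teacher = torch.zeros(vocab_size)
    teacher.scatter_add_(0, values[knn_idx], weights)             # aggregate weight per token id
    return teacher                                                # a distribution over the vocabulary


def decoupled_kd_loss(student_logits, teacher_probs, target_id, alpha=1.0, beta=1.0):
    # Split KD into a target term (probability mass on the gold token) and a
    # non-target term (distribution over the remaining tokens, the "dark knowledge").
    eps = 1e-8
    student_probs = F.softmax(student_logits, dim=-1)
    s_t, t_t = student_probs[target_id], teacher_probs[target_id]

    # Target part: KL between the binary splits {gold, rest} of teacher and student.
    target_kd = t_t * (torch.log(t_t + eps) - torch.log(s_t + eps)) \
        + (1 - t_t) * (torch.log(1 - t_t + eps) - torch.log(1 - s_t + eps))

    # Non-target part: KL between distributions renormalised over the non-gold tokens.
    mask = torch.ones_like(teacher_probs, dtype=torch.bool)
    mask[target_id] = False
    s_nt = student_probs[mask] / (1 - s_t + eps)
    t_nt = teacher_probs[mask] / (1 - t_t + eps)
    non_target_kd = torch.sum(t_nt * (torch.log(t_nt + eps) - torch.log(s_nt + eps)))

    return alpha * target_kd + beta * non_target_kd


# Toy usage with a random datastore and a 100-token vocabulary.
torch.manual_seed(0)
vocab, d, N = 100, 16, 500
keys, values = torch.randn(N, d), torch.randint(0, vocab, (N,))
teacher = knn_teacher_distribution(torch.randn(d), keys, values, vocab)
loss = decoupled_kd_loss(torch.randn(vocab), teacher, target_id=int(values[0]))

In practice the datastore would presumably hold decoder states and aligned target tokens collected from the ST training data, and the loss would be accumulated over decoding steps alongside the usual training objective; those details are assumptions here, not claims about the paper.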
Related papers
- Multilingual Non-Autoregressive Machine Translation without Knowledge Distillation [55.525158411296474]
We propose an approach to non-autoregressive multilingual machine translation.
Our system leverages the recently proposed directed acyclic Transformer.
We also propose a pivot back-translation approach to improve the generalization to unseen translation directions.
arXiv Detail & Related papers (2025-02-06T22:16:28Z)
- Efficient Technical Term Translation: A Knowledge Distillation Approach for Parenthetical Terminology Translation [0.0]
This paper addresses the challenge of accurately translating technical terms, which are crucial for clear communication in specialized fields.
We introduce the Parenthetical Terminology Translation (PTT) task, designed to mitigate potential inaccuracies by displaying the original term in parentheses alongside its translation.
We developed a novel evaluation metric to assess both overall translation accuracy and the correct parenthetical presentation of terms.
arXiv Detail & Related papers (2024-10-01T13:40:28Z)
- Multi-Teacher Knowledge Distillation For Text Image Machine Translation [40.62692548291319]
We propose a novel Multi-Teacher Knowledge Distillation (MTKD) method to effectively distill knowledge from the pipeline model into the end-to-end TIMT model.
Our proposed MTKD effectively improves the text image translation performance and outperforms existing end-to-end and pipeline models.
arXiv Detail & Related papers (2023-05-09T07:41:17Z)
- Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt [52.6946016535059]
Data-free knowledge distillation (DFKD) conducts knowledge distillation while eliminating the dependence on the original training data.
We propose a prompt-based method, termed PromptDFD, that allows us to take advantage of learned language priors.
As shown in our experiments, the proposed method substantially improves the synthesis quality and achieves considerable improvements on distillation performance.
arXiv Detail & Related papers (2022-05-16T08:56:53Z)
- Alternated Training with Synthetic and Authentic Data for Neural Machine Translation [49.35605028467887]
We propose alternated training with synthetic and authentic data for neural machine translation (NMT).
Compared with previous work, we introduce authentic data as guidance to prevent the training of NMT models from being disturbed by noisy synthetic data.
Experiments on Chinese-English and German-English translation tasks show that our approach improves the performance over several strong baselines.
arXiv Detail & Related papers (2021-06-16T07:13:16Z)
- Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation [98.11249019844281]
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models.
We propose reverse KD to rejuvenate more alignments for low-frequency target words.
Results demonstrate that the proposed approach can significantly and universally improve translation quality.
arXiv Detail & Related papers (2021-06-02T02:41:40Z)
- Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation [88.78138830698173]
We focus on sequence-level knowledge distillation (SeqKD) from external text-based NMT models.
We train a bilingual E2E-ST model to predict paraphrased transcriptions as an auxiliary task with a single decoder.
arXiv Detail & Related papers (2021-04-13T19:00:51Z)
- Better Neural Machine Translation by Extracting Linguistic Information from BERT [4.353029347463806]
Work on adding linguistic information to neural machine translation (NMT) has mostly focused on using point estimates from pre-trained models.
We augment NMT by extracting dense fine-tuned vector-based linguistic information from BERT instead of using point estimates.
arXiv Detail & Related papers (2021-04-07T00:03:51Z)
- Textual Supervision for Visually Grounded Spoken Language Understanding [51.93744335044475]
Visually-grounded models of spoken language understanding extract semantic information directly from speech.
This is useful for low-resource languages, where transcriptions can be expensive or impossible to obtain.
Recent work showed that these models can be improved if transcriptions are available at training time.
arXiv Detail & Related papers (2020-10-06T15:16:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.