Multi-Teacher Knowledge Distillation For Text Image Machine Translation
- URL: http://arxiv.org/abs/2305.05226v2
- Date: Wed, 10 May 2023 02:31:05 GMT
- Title: Multi-Teacher Knowledge Distillation For Text Image Machine Translation
- Authors: Cong Ma, Yaping Zhang, Mei Tu, Yang Zhao, Yu Zhou, Chengqing Zong
- Abstract summary: We propose a novel Multi-Teacher Knowledge Distillation (MTKD) method to effectively distill knowledge from the pipeline model into the end-to-end TIMT model.
Our proposed MTKD effectively improves the text image translation performance and outperforms existing end-to-end and pipeline models.
- Score: 40.62692548291319
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Text image machine translation (TIMT) has been widely used in various real-world applications; it translates source-language text embedded in images into sentences in a target language. Existing TIMT methods fall into two main categories: recognition-then-translation pipeline models and end-to-end models. However, how to transfer knowledge from the pipeline model into the end-to-end model remains an unsolved problem. In this paper, we propose a novel Multi-Teacher Knowledge Distillation (MTKD) method to effectively distill knowledge from the pipeline model into the end-to-end TIMT model. Specifically, three teachers are utilized to improve the performance of the end-to-end TIMT model. The image encoder in the end-to-end TIMT model is optimized with knowledge-distillation guidance from the recognition teacher encoder, while the sequential encoder and decoder are improved by transferring knowledge from the translation teacher's sequential encoder and decoder. Furthermore, both token-level and sentence-level knowledge distillation are incorporated to further boost translation performance. Extensive experimental results show that our proposed MTKD effectively improves text image translation performance and outperforms existing end-to-end and pipeline models with fewer parameters and less decoding time, illustrating that MTKD takes advantage of both pipeline and end-to-end models.
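To make the three-teacher setup concrete, below is a minimal sketch of how such a multi-teacher distillation objective could be assembled in PyTorch. The module names (image_encoder, seq_encoder, decoder, ocr_encoder, mt_encoder, mt_decoder), the batch fields, the mean-pooling for length mismatch, and the choice of MSE / KL / cross-entropy terms are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of a multi-teacher distillation objective for an end-to-end
# TIMT student. All module names, interfaces, and loss choices here are
# illustrative assumptions, not the paper's exact implementation.
import torch
import torch.nn.functional as F

def mtkd_loss(student, teachers, batch, weights=(1.0, 1.0, 1.0, 1.0)):
    """Combine feature-, token-, and sentence-level distillation terms.

    student  : end-to-end TIMT model (image_encoder -> seq_encoder -> decoder)
    teachers : dict with frozen 'ocr_encoder', 'mt_encoder', 'mt_decoder'
    batch    : dict with 'image', 'src_tokens', 'tgt_tokens',
               'teacher_tgt_tokens' (translations produced by the pipeline)
    """
    # 1) Image-encoder KD: align student image features with the recognition
    #    (OCR) teacher's encoder features on the same image.
    stu_img = student.image_encoder(batch["image"])            # (B, L_img, D)
    with torch.no_grad():
        ocr_feat = teachers["ocr_encoder"](batch["image"])     # (B, L_img, D)
    loss_img = F.mse_loss(stu_img, ocr_feat)

    # 2) Sequential-encoder KD: match the MT teacher's encoder states computed
    #    from the ground-truth source text. Sequence lengths differ between
    #    modalities, so mean-pooled representations are compared here.
    stu_seq = student.seq_encoder(stu_img)                     # (B, L_img, D)
    with torch.no_grad():
        mt_enc = teachers["mt_encoder"](batch["src_tokens"])   # (B, L_src, D)
    loss_enc = F.mse_loss(stu_seq.mean(dim=1), mt_enc.mean(dim=1))

    # 3) Token-level KD: KL divergence between student and MT-teacher output
    #    distributions at each target position (teacher forcing).
    dec_in = batch["tgt_tokens"][:, :-1]
    stu_logits = student.decoder(stu_seq, dec_in)              # (B, T, V)
    with torch.no_grad():
        tea_logits = teachers["mt_decoder"](mt_enc, dec_in)    # (B, T, V)
    loss_tok = F.kl_div(
        F.log_softmax(stu_logits, dim=-1),
        F.softmax(tea_logits, dim=-1),
        reduction="batchmean",
    )

    # 4) Sentence-level KD: cross-entropy against full translations generated
    #    by the teacher pipeline (sequence-level distillation targets).
    pipe_in = batch["teacher_tgt_tokens"][:, :-1]
    pipe_out = batch["teacher_tgt_tokens"][:, 1:]
    sent_logits = student.decoder(stu_seq, pipe_in)            # (B, T, V)
    loss_sent = F.cross_entropy(sent_logits.transpose(1, 2), pipe_out)

    w1, w2, w3, w4 = weights
    return w1 * loss_img + w2 * loss_enc + w3 * loss_tok + w4 * loss_sent
```

In practice the four weights would be tuned, and the total loss would typically be combined with the standard translation cross-entropy on ground-truth targets.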
Related papers
- Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation [81.45400849638347]
In-image machine translation (IIMT) aims to translate an image containing text in the source language into an image containing translations in the target language.
In this paper, we propose an end-to-end IIMT model consisting of four modules.
Our model achieves competitive performance compared to cascaded models with only 70.9% of parameters, and significantly outperforms the pixel-level end-to-end IIMT model.
arXiv Detail & Related papers (2024-07-03T08:15:39Z)
- MT-PATCHER: Selective and Extendable Knowledge Distillation from Large Language Models for Machine Translation [61.65537912700187]
Large Language Models (LLMs) have demonstrated strong ability in the field of machine translation (MT).
We propose a framework called MT-Patcher, which transfers knowledge from LLMs to existing MT models in a selective, comprehensive and proactive manner.
arXiv Detail & Related papers (2024-03-14T16:07:39Z)
- Improving Neural Machine Translation by Multi-Knowledge Integration with Prompting [36.24578487904221]
We focus on how to integrate multi-knowledge, i.e., multiple types of knowledge, into NMT models to enhance performance with prompting.
We propose a unified framework which can effectively integrate multiple types of knowledge, including sentences, terminologies/phrases and translation templates, into NMT models.
arXiv Detail & Related papers (2023-12-08T02:55:00Z)
- Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing [72.56219471145232]
We propose an ST/MT multi-tasking framework with hard parameter sharing (a generic sketch of hard parameter sharing appears after this list).
Our method reduces the speech-text modality gap via a pre-processing stage.
We show that our framework improves attentional encoder-decoder, Connectionist Temporal Classification (CTC), transducer, and joint CTC/attention models by an average of +0.5 BLEU.
arXiv Detail & Related papers (2023-09-27T17:48:14Z)
- E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation [40.62692548291319]
Text image machine translation (TIMT) aims to translate texts embedded in images from one source language to another target language.
Existing methods, both two-stage cascade and one-stage end-to-end architectures, suffer from different issues.
We propose an end-to-end TIMT model that makes full use of the knowledge from existing OCR and MT datasets.
arXiv Detail & Related papers (2023-05-09T04:25:52Z)
- Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation [88.78138830698173]
We focus on sequence-level knowledge distillation (SeqKD) from external text-based NMT models.
We train a bilingual E2E-ST model to predict paraphrased transcriptions as an auxiliary task with a single decoder.
arXiv Detail & Related papers (2021-04-13T19:00:51Z)
- A Multi-Stage Attentive Transfer Learning Framework for Improving COVID-19 Diagnosis [49.3704402041314]
We propose a multi-stage attentive transfer learning framework for improving COVID-19 diagnosis.
Our proposed framework consists of three stages to train accurate diagnosis models through learning knowledge from multiple source tasks and data of different domains.
Importantly, we propose a novel self-supervised learning method to learn multi-scale representations for lung CT images.
arXiv Detail & Related papers (2021-01-14T01:39:19Z)
- Unified Mandarin TTS Front-end Based on Distilled BERT Model [5.103126953298633]
A pre-trained language model (PLM) based model is proposed to tackle the two most important tasks in TTS front-end.
We use a pre-trained Chinese BERT as the text encoder and employ multi-task learning technique to adapt it to the two TTS front-end tasks.
We are able to run the whole TTS front-end module in a light and unified manner, which is more friendly to deployment on mobile devices.
arXiv Detail & Related papers (2020-12-31T02:34:57Z)
- Tight Integrated End-to-End Training for Cascaded Speech Translation [40.76367623739673]
A cascaded speech translation model relies on discrete and non-differentiable transcription.
Direct speech translation is an alternative method to avoid error propagation.
This work explores the feasibility of collapsing the entire cascade components into a single end-to-end trainable model.
arXiv Detail & Related papers (2020-11-24T15:43:49Z)
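The hard-parameter-sharing idea referenced in the Cross-Modal Multi-Tasking entry above can be illustrated with a minimal sketch: one shared Transformer encoder-decoder serves both speech-to-text and text-to-text translation, with only a thin modality-specific front-end differing per task. This is a generic illustration under assumed module names and dimensions, not the cited paper's implementation.

```python
# Illustrative hard-parameter-sharing sketch for ST/MT multi-tasking.
# Module names, dimensions, and the absence of masking are simplifications.
import torch.nn as nn

class SharedSTMTModel(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_feat=80):
        super().__init__()
        # Modality-specific front-ends: speech features vs. source tokens.
        self.speech_frontend = nn.Linear(n_feat, d_model)
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        # Hard parameter sharing: one encoder-decoder serves both tasks.
        self.backbone = nn.Transformer(
            d_model=d_model,
            num_encoder_layers=6,
            num_decoder_layers=6,
            batch_first=True,
        )
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, src, tgt_tokens, modality):
        # Map either modality into the shared representation space
        # (positional encodings and attention masks omitted for brevity).
        if modality == "speech":
            enc_in = self.speech_frontend(src)      # (B, T_speech, d_model)
        else:
            enc_in = self.text_embed(src)           # (B, T_text, d_model)
        dec_in = self.tgt_embed(tgt_tokens)         # (B, T_tgt, d_model)
        hidden = self.backbone(enc_in, dec_in)      # shared parameters
        return self.out_proj(hidden)                # (B, T_tgt, vocab_size)
```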