Improving End-to-End Text Image Translation From the Auxiliary Text
Translation Task
- URL: http://arxiv.org/abs/2210.03887v1
- Date: Sat, 8 Oct 2022 02:35:45 GMT
- Title: Improving End-to-End Text Image Translation From the Auxiliary Text
Translation Task
- Authors: Cong Ma, Yaping Zhang, Mei Tu, Xu Han, Linghui Wu, Yang Zhao, Yu Zhou
- Abstract summary: We propose a novel text-translation-enhanced text image translation model, which trains the end-to-end model with text translation as an auxiliary task.
By sharing model parameters and training both tasks jointly, our model can take full advantage of easily available large-scale text parallel corpora.
- Score: 26.046624228278528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end text image translation (TIT), which aims to translate the
source-language text embedded in images into the target language, has attracted
intensive attention in recent research. However, data sparsity limits the
performance of end-to-end text image translation. Multi-task learning is an
effective way to alleviate this problem by exploiting knowledge from
complementary related tasks. In this paper, we propose a novel
text-translation-enhanced text image translation model, which trains the
end-to-end model with text translation as an auxiliary task. By sharing model
parameters and training both tasks jointly, our model can take full advantage
of easily available large-scale text parallel corpora. Extensive experimental
results show that the proposed method outperforms existing end-to-end methods,
and that joint multi-task learning with both the text translation and text
recognition auxiliary tasks achieves further gains, showing that the
translation and recognition auxiliary tasks are complementary.
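The core idea of the abstract, sharing a semantic encoder and a target-language decoder between the primary TIT task and the auxiliary text translation task, can be illustrated with the minimal PyTorch-style sketch below. This is not the authors' released code: the module sizes, feature dimensions, vocabulary sizes, batch layout, and loader names are assumptions, and attention masks as well as the text recognition auxiliary loss are omitted for brevity.

```python
# Minimal sketch (assumed, not the authors' implementation) of multi-task
# training that shares an encoder/decoder between text image translation (TIT)
# and an auxiliary machine translation (MT) task on text parallel corpora.
import torch
import torch.nn as nn

D_MODEL, VOCAB_SRC, VOCAB_TGT = 256, 8000, 8000  # hypothetical sizes

class SharedTITModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Modality-specific front-ends: image patch features vs. source-text tokens.
        self.img_proj = nn.Linear(512, D_MODEL)            # projects pre-extracted image features
        self.src_embed = nn.Embedding(VOCAB_SRC, D_MODEL)  # embeds source-language tokens
        # Shared semantic encoder and target-language decoder (parameters reused by both tasks).
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True), num_layers=4)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(D_MODEL, nhead=4, batch_first=True), num_layers=4)
        self.tgt_embed = nn.Embedding(VOCAB_TGT, D_MODEL)
        self.generator = nn.Linear(D_MODEL, VOCAB_TGT)

    def forward(self, src, tgt_in, modality):
        # Route through the modality-specific front-end, then the shared stack.
        feats = self.img_proj(src) if modality == "image" else self.src_embed(src)
        memory = self.encoder(feats)
        out = self.decoder(self.tgt_embed(tgt_in), memory)  # causal masks omitted for brevity
        return self.generator(out)

model = SharedTITModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(batch, modality):
    """One multi-task step: TIT batches carry image features, MT batches carry text."""
    logits = model(batch["src"], batch["tgt_in"], modality)
    loss = criterion(logits.reshape(-1, VOCAB_TGT), batch["tgt_out"].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Alternate batches from the small TIT corpus and the large text parallel corpus,
# so the shared parameters benefit from both (loader names are hypothetical):
# for tit_batch, mt_batch in zip(tit_loader, mt_loader):
#     train_step(tit_batch, "image")
#     train_step(mt_batch, "text")
```

A text recognition auxiliary task, which the abstract reports to be complementary to translation, could be attached analogously as a third branch over the image front-end; it is left out here to keep the sketch short.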
Related papers
- AnyTrans: Translate AnyText in the Image with Large Scale Models [88.5887934499388]
This paper introduces AnyTrans, an all-encompassing framework for the task of Translate AnyText in the Image (TATI).
Our framework incorporates contextual cues from both textual and visual elements during translation.
We have meticulously compiled a test dataset called MTIT6, which consists of multilingual text image translation data from six language pairs.
arXiv Detail & Related papers (2024-06-17T11:37:48Z) - FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction [66.98008357232428]
We propose FineMatch, a new aspect-based fine-grained text and image matching benchmark.
FineMatch focuses on text and image mismatch detection and correction.
We show that models trained on FineMatch demonstrate enhanced proficiency in detecting fine-grained text and image mismatches.
arXiv Detail & Related papers (2024-04-23T03:42:14Z) - TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding [91.30065932213758]
Large Multimodal Models (LMMs) have sparked a surge in research aimed at harnessing their remarkable reasoning abilities.
We propose TextCoT, a novel Chain-of-Thought framework for text-rich image understanding.
Our method is free of extra training, offering immediate plug-and-play functionality.
arXiv Detail & Related papers (2024-04-15T13:54:35Z) - Rethinking and Improving Multi-task Learning for End-to-end Speech
Translation [51.713683037303035]
We investigate the consistency between different tasks, considering different training stages and modules.
We find that the textual encoder primarily facilitates cross-modal conversion, but the presence of noise in speech impedes the consistency between text and speech representations.
We propose an improved multi-task learning (IMTL) approach for the ST task, which bridges the modal gap by mitigating the difference in length and representation.
arXiv Detail & Related papers (2023-11-07T08:48:46Z) - An Adversarial Multi-Task Learning Method for Chinese Text Correction
with Semantic Detection [0.0]
An adversarial multi-task learning method is proposed to enhance the modeling and detection of character polysemy in Chinese sentence context.
A Monte Carlo tree search strategy and a policy network are introduced to accomplish efficient Chinese text correction with semantic detection.
arXiv Detail & Related papers (2023-06-28T15:46:00Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - Exploring Better Text Image Translation with Multimodal Codebook [39.12169843196739]
Text image translation (TIT) aims to translate the source texts embedded in the image to target translations.
In this work, we first annotate a Chinese-English TIT dataset named OCRMT30K, providing convenience for subsequent studies.
Then, we propose a TIT model with a multimodal codebook, which is able to associate the image with relevant texts.
We present a multi-stage training framework involving text machine translation, image-text alignment, and TIT tasks, which fully exploits additional bilingual texts.
arXiv Detail & Related papers (2023-05-27T08:41:18Z) - Improving Speech Translation by Understanding and Learning from the
Auxiliary Text Translation Task [26.703809355057224]
We conduct a detailed analysis to understand the impact of the auxiliary task on the primary task within the multitask learning framework.
Our analysis confirms that multitask learning tends to generate similar decoder representations from different modalities.
Inspired by these findings, we propose three methods to improve translation quality.
arXiv Detail & Related papers (2021-07-12T23:53:40Z) - Exploring the Limits of Transfer Learning with a Unified Text-to-Text
Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of introducing transfer learning techniques for NLP by a unified framework that converts all text-based language problems into a text-to-text format.
arXiv Detail & Related papers (2019-10-23T17:37:36Z)