Machine Translation with Large Language Models: Decoder Only vs. Encoder-Decoder
- URL: http://arxiv.org/abs/2409.13747v1
- Date: Thu, 12 Sep 2024 00:21:05 GMT
- Title: Machine Translation with Large Language Models: Decoder Only vs. Encoder-Decoder
- Authors: Abhinav P. M., SujayKumar Reddy M, Oswald Christopher,
- Abstract summary: The project is focused on Indian regional languages, especially Telugu, Tamil, and Malayalam.
The model seeks to enable accurate and contextually appropriate translations across diverse language pairs.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This project, titled "Machine Translation with Large Language Models: Decoder-only vs. Encoder-Decoder," aims to develop a multilingual machine translation (MT) model. Focused on Indian regional languages, especially Telugu, Tamil, and Malayalam, the model seeks to enable accurate and contextually appropriate translations across diverse language pairs. By comparing Decoder-only and Encoder-Decoder architectures, the project aims to optimize translation quality and efficiency, advancing cross-linguistic communication tools.The primary objective is to develop a model capable of delivering high-quality translations that are accurate and contextually appropriate. By leveraging large language models, specifically comparing the effectiveness of Decoder-only and Encoder-Decoder architectures, the project seeks to optimize translation performance and efficiency across multilingual contexts. Through rigorous experimentation and analysis, this project aims to advance the field of machine translation, contributing valuable insights into the effectiveness of different model architectures and paving the way for enhanced cross-linguistic communication tools.
Related papers
- Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation [28.07831604833682]
We investigate the issue of the decoder-only architecture to its lack of language transfer capability.
We propose dividing the decoding process into two stages so that target tokens are explicitly excluded in the first stage.
We impose contrastive learning on translation instructions, resulting in improved performance in zero-shot translation.
arXiv Detail & Related papers (2024-12-03T02:52:14Z) - Relay Decoding: Concatenating Large Language Models for Machine Translation [21.367605327742027]
We propose an innovative approach called RD (Relay Decoding), which entails concatenating two distinct large models that individually support the source and target languages.
By incorporating a simple mapping layer to facilitate the connection between these two models and utilizing a limited amount of parallel data for training, we successfully achieve superior results in the machine translation task.
arXiv Detail & Related papers (2024-05-05T13:42:25Z) - Speculative Contrastive Decoding [55.378200871224074]
Large language models(LLMs) exhibit exceptional performance in language tasks, yet their auto-regressive inference is limited due to high computational requirements and is sub-optimal due to the exposure bias.
Inspired by speculative decoding and contrastive decoding, we introduce Speculative Contrastive Decoding(SCD), a straightforward yet powerful decoding approach.
arXiv Detail & Related papers (2023-11-15T14:15:30Z) - Decoder-Only or Encoder-Decoder? Interpreting Language Model as a
Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, most of the seq2seq task is resolved by an encoder to encode the source sequence and a decoder to generate the target text.
Recently, a bunch of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z) - Is Encoder-Decoder Redundant for Neural Machine Translation? [44.37101354412253]
encoder-decoder architecture is still the de facto neural network architecture for state-of-the-art models.
In this work, we experiment with bilingual translation, translation with additional target monolingual data, and multilingual translation.
This alternative approach performs on par with the baseline encoder-decoder Transformer, suggesting that an encoder-decoder architecture might be redundant for neural machine translation.
arXiv Detail & Related papers (2022-10-21T08:33:55Z) - Examining Scaling and Transfer of Language Model Architectures for
Machine Translation [51.69212730675345]
Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing.
In machine translation, EncDec has long been the favoured approach, but with few studies investigating the performance of LMs.
arXiv Detail & Related papers (2022-02-01T16:20:15Z) - Breaking Down Multilingual Machine Translation [74.24795388967907]
We show that multilingual training is beneficial to encoders in general, while it only benefits decoders for low-resource languages (LRLs)
Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al.
arXiv Detail & Related papers (2021-10-15T14:57:12Z) - Improving Zero-shot Neural Machine Translation on Language-specific
Encoders-Decoders [19.44855809470709]
Recently, universal neural machine translation (NMT) with shared encoder-decoder gained good performance on zero-shot translation.
Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve universal representation across non-shared modules.
We study zero-shot translation using language-specific encoders-decoders.
arXiv Detail & Related papers (2021-02-12T15:36:33Z) - Dual-decoder Transformer for Joint Automatic Speech Recognition and
Multilingual Speech Translation [71.54816893482457]
We introduce dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST)
Our models are based on the original Transformer architecture but consist of two decoders, each responsible for one task (ASR or ST)
arXiv Detail & Related papers (2020-11-02T04:59:50Z) - Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends can make the shared encoder has the potential to produce a language-independent semantic space.
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.