Sequence Shortening for Context-Aware Machine Translation
- URL: http://arxiv.org/abs/2402.01416v1
- Date: Fri, 2 Feb 2024 13:55:37 GMT
- Title: Sequence Shortening for Context-Aware Machine Translation
- Authors: Paweł Mąka, Yusuf Can Semerci, Jan Scholtes, Gerasimos Spanakis
- Abstract summary: We show that a special case of the multi-encoder architecture achieves higher accuracy on contrastive datasets.
We introduce two novel methods, Latent Grouping and Latent Selecting, where the network learns to group tokens or to select the tokens to be cached as context.
- Score: 5.803309695504831
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Context-aware Machine Translation aims to improve translations of sentences
by incorporating surrounding sentences as context. Towards this task, two main
architectures have been applied, namely single-encoder (based on concatenation)
and multi-encoder models. In this study, we show that a special case of
multi-encoder architecture, where the latent representation of the source
sentence is cached and reused as the context in the next step, achieves higher
accuracy on the contrastive datasets (where the models have to rank the correct
translation among the provided sentences) and BLEU and COMET scores comparable
to those of the single- and multi-encoder approaches. Furthermore, we investigate the
application of Sequence Shortening to the cached representations. We test three
pooling-based shortening techniques and introduce two novel methods - Latent
Grouping and Latent Selecting, in which the network learns to group tokens or
to select the tokens to be cached as context. Our experiments show that the two
methods achieve BLEU and COMET scores and contrastive accuracies competitive
with the other tested methods, while potentially allowing for higher
interpretability and reducing the growth of memory requirements with increased
context size.
Related papers
- Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z)
- A Case Study on Context-Aware Neural Machine Translation with Multi-Task Learning [49.62044186504516]
In document-level neural machine translation (DocNMT), multi-encoder approaches are common in encoding context and source sentences.
Recent studies have shown that the context encoder generates noise and makes the model robust to the choice of context.
This paper further investigates this observation by explicitly modelling context encoding through multi-task learning (MTL) to make the model sensitive to the choice of context.
arXiv Detail & Related papers (2024-07-03T12:50:49Z)
- A Case Study on Context Encoding in Multi-Encoder based Document-Level Neural Machine Translation [20.120962279327493]
We evaluate the models on the ContraPro test set to study how different contexts affect pronoun translation accuracy.
Our analysis shows that the context encoder provides sufficient signal for learning discourse-level information.
arXiv Detail & Related papers (2023-08-11T10:35:53Z)
- Syntax-Aware Complex-Valued Neural Machine Translation [14.772317918560548]
We propose a method to incorporate syntax information into a complex-valued Encoder-Decoder architecture.
The proposed model jointly learns word-level and syntax-level attention scores from the source side to the target side using an attention mechanism.
The experimental results demonstrate that the proposed method can bring significant improvements in BLEU scores on two datasets.
arXiv Detail & Related papers (2023-07-17T15:58:05Z)
- Dual-Alignment Pre-training for Cross-lingual Sentence Embedding [79.98111074307657]
We propose a dual-alignment pre-training (DAP) framework for cross-lingual sentence embedding.
We introduce a novel representation translation learning (RTL) task, where the model learns to use one-side contextualized token representation to reconstruct its translation counterpart.
Our approach can significantly improve sentence embedding quality.
arXiv Detail & Related papers (2023-05-16T03:53:30Z)
- HanoiT: Enhancing Context-aware Translation via Selective Context [95.93730812799798]
Context-aware neural machine translation aims to use the document-level context to improve translation quality.
Irrelevant or trivial words may introduce noise and distract the model from learning the relationship between the current sentence and the auxiliary context.
We propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context.
arXiv Detail & Related papers (2023-01-17T12:07:13Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Divide and Rule: Training Context-Aware Multi-Encoder Translation Models with Little Resources [20.057692375546356]
Multi-encoder models aim to improve translation quality by encoding document-level contextual information alongside the current sentence.
We show that training these parameters requires a large amount of data, since the contextual training signal is sparse.
We propose an efficient alternative, based on splitting sentence pairs, that enriches the training signal of a set of parallel sentences.
arXiv Detail & Related papers (2021-03-31T15:15:32Z)
- Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document.
The recent Sequence-to-Sequence (Seq2Seq)-based generative framework is widely used for the KE task and has obtained competitive performance on various benchmarks.
In this paper, we propose to adopt Dynamic Graph Convolutional Networks (DGCN) to solve these problems simultaneously.
arXiv Detail & Related papers (2020-10-24T08:11:23Z)
- Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation [9.416757363901295]
We introduce a novel supervised model for text segmentation with simple but explicit coherence modeling.
Our model -- a neural architecture consisting of two hierarchically connected Transformer networks -- is a multi-task learning model that couples the sentence-level segmentation objective with the coherence objective that differentiates correct sequences of sentences from corrupt ones.
arXiv Detail & Related papers (2020-01-03T17:06:41Z)