P-Transformer: Towards Better Document-to-Document Neural Machine
Translation
- URL: http://arxiv.org/abs/2212.05830v1
- Date: Mon, 12 Dec 2022 11:19:05 GMT
- Title: P-Transformer: Towards Better Document-to-Document Neural Machine
Translation
- Authors: Yachao Li, Junhui Li, Jing Jiang, Shimin Tao, Hao Yang and Min Zhang
- Abstract summary: We propose a position-aware Transformer (P-Transformer) to enhance both the absolute and relative position information.
P-Transformer can be applied to seq2seq-based document-to-sentence (Doc2Sent) and sentence-to-sentence (Sent2Sent) translation.
- Score: 34.19199123088232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Directly training a document-to-document (Doc2Doc) neural machine
translation (NMT) model with Transformer from scratch, especially on small
datasets, usually fails to converge. Our dedicated probing tasks show that 1)
both the absolute and relative position information is gradually weakened, or
even vanishes, once it reaches the upper encoder layers, and 2) the vanishing of
absolute position information in the encoder output causes the training failure
of Doc2Doc NMT. To alleviate this problem, we propose a position-aware Transformer
(P-Transformer) to enhance both the absolute and relative position information
in both self-attention and cross-attention. Specifically, we integrate absolute
positional information, i.e., position embeddings, into the query-key pairs
both in self-attention and cross-attention through a simple yet effective
addition operation. Moreover, we also integrate relative position encoding in
self-attention. The proposed P-Transformer utilizes sinusoidal position
encoding and does not require any task-specific position embedding, segment
embedding, or attention mechanism. Through the above methods, we build a
Doc2Doc NMT model with P-Transformer, which ingests the source document and
completely generates the target document in a sequence-to-sequence (seq2seq)
way. In addition, P-Transformer can be applied to seq2seq-based
document-to-sentence (Doc2Sent) and sentence-to-sentence (Sent2Sent)
translation. Extensive experimental results on Doc2Doc NMT show that
P-Transformer significantly outperforms strong baselines on 9 widely used
document-level datasets across 7 language pairs, covering small, medium, and
large scales, and achieves a new state of the art. Experiments on discourse
phenomena show that our Doc2Doc NMT models improve translation quality in terms
of both BLEU and discourse coherence. We make our code available on GitHub.
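For illustration only, the following minimal PyTorch-style sketch shows what adding absolute position embeddings to the query-key pairs could look like for a single attention head, together with an optional relative-position bias for self-attention. It is not the authors' released implementation; the function names (`sinusoidal_embeddings`, `position_aware_attention`) and the `rel_bias` argument are hypothetical placeholders.

```python
import math
import torch
import torch.nn.functional as F


def sinusoidal_embeddings(length: int, d_model: int) -> torch.Tensor:
    """Standard sinusoidal position encoding (assumes an even d_model)."""
    pos = torch.arange(length, dtype=torch.float32).unsqueeze(1)
    dim = torch.arange(0, d_model, 2, dtype=torch.float32)
    angle = pos / torch.pow(10000.0, dim / d_model)
    pe = torch.zeros(length, d_model)
    pe[:, 0::2] = torch.sin(angle)
    pe[:, 1::2] = torch.cos(angle)
    return pe


def position_aware_attention(q, k, v, q_pos, k_pos, rel_bias=None):
    """Hypothetical single-head sketch of the attention described above:
    absolute (sinusoidal) position embeddings are added to the query and key
    before the scores are computed, and an optional relative-position bias
    is added to the scores (self-attention only).

    q, k, v:      (batch, len_q/len_k, d_model)
    q_pos, k_pos: (len_q, d_model) and (len_k, d_model) sinusoidal embeddings
    rel_bias:     optional (len_q, len_k) relative-position bias
    """
    d_model = q.size(-1)
    scores = torch.matmul(q + q_pos, (k + k_pos).transpose(-2, -1)) / math.sqrt(d_model)
    if rel_bias is not None:
        scores = scores + rel_bias
    return torch.matmul(F.softmax(scores, dim=-1), v)
```

In the full model, this score computation would sit inside every self-attention and cross-attention layer of an otherwise standard seq2seq Transformer; the sketch only illustrates the position-aware scoring for one head.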
Related papers
- On Search Strategies for Document-Level Neural Machine Translation [51.359400776242786]
Document-level neural machine translation (NMT) models produce a more consistent output across a document.
In this work, we aim to answer the question of how to best utilize a context-aware translation model in decoding.
arXiv Detail & Related papers (2023-06-08T11:30:43Z)
- A General-Purpose Multilingual Document Encoder [9.868221447090855]
We pretrain a massively multilingual document encoder as a hierarchical transformer model (HMDE)
We leverage Wikipedia as a readily available source of comparable documents for creating training data.
We evaluate the effectiveness of HMDE on two of the arguably most common and prominent cross-lingual document-level tasks.
arXiv Detail & Related papers (2023-05-11T17:55:45Z)
- Document Flattening: Beyond Concatenating Context for Document-Level Neural Machine Translation [45.56189820979461]
The Document Flattening (DocFlat) technique integrates Flat-Batch Attention (FB) and a Neural Context Gate (NCG) into the Transformer model.
We conduct comprehensive experiments and analyses on three benchmark datasets for English-German translation.
arXiv Detail & Related papers (2023-02-16T04:38:34Z)
- Hierarchical Local-Global Transformer for Temporal Sentence Grounding [58.247592985849124]
This paper studies the multimedia problem of temporal sentence grounding.
It aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query.
arXiv Detail & Related papers (2022-08-31T14:16:56Z)
- Dynamic Position Encoding for Transformers [18.315954297959617]
Recurrent models have been dominating the field of neural machine translation (NMT) for the past few years.
Transformers could fail to properly encode sequential/positional information due to their non-recurrent nature.
We propose a novel architecture with new position embeddings depending on the input text to address this shortcoming.
arXiv Detail & Related papers (2022-04-18T03:08:48Z)
- Rethinking and Improving Relative Position Encoding for Vision Transformer [61.559777439200744]
Relative position encoding (RPE) is important for the Transformer to capture the sequence ordering of input tokens.
We propose new relative position encoding methods dedicated to 2D images, called image RPE (iRPE)
arXiv Detail & Related papers (2021-07-29T17:55:10Z)
- Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation [28.94748226472447]
We study the pros and cons of the standard transformer in document-level translation.
We propose a surprisingly simple long-short term masking self-attention on top of the standard transformer.
We can achieve a strong result in BLEU and capture discourse phenomena.
arXiv Detail & Related papers (2020-09-19T00:29:51Z)
- Document-level Neural Machine Translation with Document Embeddings [82.4684444847092]
This work focuses on exploiting detailed document-level context in terms of multiple forms of document embeddings.
The proposed document-aware NMT is implemented to enhance the Transformer baseline by introducing both global and local document-level clues on the source end.
arXiv Detail & Related papers (2020-09-16T19:43:29Z)
- Modeling Discourse Structure for Document-level Neural Machine Translation [38.085454497395446]
We propose to improve document-level NMT with the aid of discourse structure information.
Specifically, we first parse the input document to obtain its discourse structure.
Then, we introduce a Transformer-based path encoder to embed the discourse structure information of each word.
Finally, we combine the discourse structure information with the word embedding before it is fed into the encoder.
arXiv Detail & Related papers (2020-06-08T16:24:03Z)
- Explicit Reordering for Neural Machine Translation [50.70683739103066]
In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency.
We propose a novel reordering method to explicitly model this reordering information for the Transformer-based NMT.
The empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-04-08T05:28:46Z)