P-Transformer: Towards Better Document-to-Document Neural Machine
Translation
- URL: http://arxiv.org/abs/2212.05830v1
- Date: Mon, 12 Dec 2022 11:19:05 GMT
- Title: P-Transformer: Towards Better Document-to-Document Neural Machine
Translation
- Authors: Yachao Li, Junhui Li, Jing Jiang, Shimin Tao, Hao Yang and Min Zhang
- Abstract summary: We propose a position-aware Transformer (P-Transformer) to enhance both the absolute and relative position information.
P-Transformer can be applied to seq2seq-based document-to-sentence (Doc2Sent) and sentence-to-sentence (Sent2Sent) translation.
- Score: 34.19199123088232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Directly training a document-to-document (Doc2Doc) neural machine
translation (NMT) model with Transformer from scratch, especially on small
datasets, usually fails to converge. Our dedicated probing tasks show that 1)
both the absolute and relative position information is gradually weakened, or
even vanishes, once it reaches the upper encoder layers, and 2) the vanishing of
absolute position information in the encoder output causes the training failure
of Doc2Doc NMT. To alleviate this problem, we propose a position-aware Transformer
(P-Transformer) to enhance both the absolute and relative position information
in both self-attention and cross-attention. Specifically, we integrate absolute
positional information, i.e., position embeddings, into the query-key pairs
both in self-attention and cross-attention through a simple yet effective
addition operation. Moreover, we also integrate relative position encoding in
self-attention. The proposed P-Transformer utilizes sinusoidal position
encoding and does not require any task-specific position embedding, segment
embedding, or attention mechanism. Through the above methods, we build a
Doc2Doc NMT model with P-Transformer, which ingests the source document and
completely generates the target document in a sequence-to-sequence (seq2seq)
way. In addition, P-Transformer can be applied to seq2seq-based
document-to-sentence (Doc2Sent) and sentence-to-sentence (Sent2Sent)
translation. Extensive experimental results on Doc2Doc NMT show that
P-Transformer significantly outperforms strong baselines on 9 widely used
document-level datasets across 7 language pairs, covering small, medium, and
large scales, and achieves a new state of the art. Experiments on discourse
phenomena show that our Doc2Doc NMT models improve translation quality in terms
of both BLEU and discourse coherence. We make our code available on GitHub.
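For illustration only, the following minimal PyTorch-style sketch shows what adding absolute position embeddings to the query-key pairs could look like for a single attention head, together with an optional relative-position bias for self-attention. It is not the authors' released implementation; the function names (`sinusoidal_embeddings`, `position_aware_attention`) and the `rel_bias` argument are hypothetical placeholders.

```python
import math
import torch
import torch.nn.functional as F


def sinusoidal_embeddings(length: int, d_model: int) -> torch.Tensor:
    """Standard sinusoidal position encoding (assumes an even d_model)."""
    pos = torch.arange(length, dtype=torch.float32).unsqueeze(1)
    dim = torch.arange(0, d_model, 2, dtype=torch.float32)
    angle = pos / torch.pow(10000.0, dim / d_model)
    pe = torch.zeros(length, d_model)
    pe[:, 0::2] = torch.sin(angle)
    pe[:, 1::2] = torch.cos(angle)
    return pe


def position_aware_attention(q, k, v, q_pos, k_pos, rel_bias=None):
    """Hypothetical single-head sketch of the attention described above:
    absolute (sinusoidal) position embeddings are added to the query and key
    before the scores are computed, and an optional relative-position bias
    is added to the scores (self-attention only).

    q, k, v:      (batch, len_q/len_k, d_model)
    q_pos, k_pos: (len_q, d_model) and (len_k, d_model) sinusoidal embeddings
    rel_bias:     optional (len_q, len_k) relative-position bias
    """
    d_model = q.size(-1)
    scores = torch.matmul(q + q_pos, (k + k_pos).transpose(-2, -1)) / math.sqrt(d_model)
    if rel_bias is not None:
        scores = scores + rel_bias
    return torch.matmul(F.softmax(scores, dim=-1), v)
```

In the full model, this score computation would sit inside every self-attention and cross-attention layer of an otherwise standard seq2seq Transformer; the sketch only illustrates the position-aware scoring for one head.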
Related papers
- On Search Strategies for Document-Level Neural Machine Translation [51.359400776242786]
Document-level neural machine translation (NMT) models produce a more consistent output across a document.
In this work, we aim to answer the question of how to best utilize a context-aware translation model in decoding.
arXiv Detail & Related papers (2023-06-08T11:30:43Z)
- A General-Purpose Multilingual Document Encoder [9.868221447090855]
We pretrain a massively multilingual document encoder as a hierarchical transformer model (HMDE)
We leverage Wikipedia as a readily available source of comparable documents for creating training data.
We evaluate the effectiveness of HMDE on two of the arguably most common and prominent cross-lingual document-level tasks.
arXiv Detail & Related papers (2023-05-11T17:55:45Z)
- Document Flattening: Beyond Concatenating Context for Document-Level Neural Machine Translation [45.56189820979461]
The Document Flattening (DocFlat) technique integrates Flat-Batch Attention (FB) and a Neural Context Gate (NCG) into the Transformer model.
We conduct comprehensive experiments and analyses on three benchmark datasets for English-German translation.
arXiv Detail & Related papers (2023-02-16T04:38:34Z)
- Hierarchical Local-Global Transformer for Temporal Sentence Grounding [58.247592985849124]
This paper studies the multimedia problem of temporal sentence grounding.
It aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query.
arXiv Detail & Related papers (2022-08-31T14:16:56Z)
- Dynamic Position Encoding for Transformers [18.315954297959617]
Recurrent models have been dominating the field of neural machine translation (NMT) for the past few years.
Transformers could fail to properly encode sequential/positional information due to their non-recurrent nature.
We propose a novel architecture with new position embeddings depending on the input text to address this shortcoming.
arXiv Detail & Related papers (2022-04-18T03:08:48Z)
- Rethinking and Improving Relative Position Encoding for Vision Transformer [61.559777439200744]
Relative position encoding (RPE) is important for the Transformer to capture the sequence ordering of input tokens.
We propose new relative position encoding methods dedicated to 2D images, called image RPE (iRPE)
arXiv Detail & Related papers (2021-07-29T17:55:10Z)
- Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation [28.94748226472447]
We study the pros and cons of the standard transformer in document-level translation.
We propose a surprisingly simple long-short term masking self-attention on top of the standard transformer.
We can achieve a strong result in BLEU and capture discourse phenomena.
arXiv Detail & Related papers (2020-09-19T00:29:51Z)
- Document-level Neural Machine Translation with Document Embeddings [82.4684444847092]
This work focuses on exploiting detailed document-level context in terms of multiple forms of document embeddings.
The proposed document-aware NMT is implemented to enhance the Transformer baseline by introducing both global and local document-level clues on the source end.
arXiv Detail & Related papers (2020-09-16T19:43:29Z)
- Modeling Discourse Structure for Document-level Neural Machine Translation [38.085454497395446]
We propose to improve document-level NMT with the aid of discourse structure information.
Specifically, we first parse the input document to obtain its discourse structure.
Then, we introduce a Transformer-based path encoder to embed the discourse structure information of each word.
Finally, we combine the discourse structure information with the word embedding before it is fed into the encoder.
arXiv Detail & Related papers (2020-06-08T16:24:03Z)
- Explicit Reordering for Neural Machine Translation [50.70683739103066]
In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency.
We propose a novel reordering method to explicitly model this reordering information for the Transformer-based NMT.
The empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-04-08T05:28:46Z)