Abstract: Document-level machine translation conditions on surrounding sentences to
produce coherent translations. Much recent work in this area has introduced
custom model architectures and decoding algorithms.
This paper presents a systematic comparison of selected approaches from the
literature on two benchmarks for which document-level phenomena evaluation
suites exist. We find that a simple method based purely on back-translating
monolingual document-level data performs as well as much more elaborate
alternatives, both in terms of document-level metrics and human