Abstract: The segmentation of emails into functional zones (also dubbed email zoning)
is a relevant preprocessing step for most NLP tasks that deal with emails.
However, despite the multilingual character of emails and their
applications, previous email zoning corpora and systems were developed
essentially for English. In this paper, we analyse the existing
email zoning corpora and propose a new multilingual benchmark composed of 635
emails in Portuguese, Spanish and French. Moreover, we introduce OKAPI, the
first multilingual email segmentation model based on a language-agnostic
sentence encoder. Besides generalizing well to unseen languages, our model is
competitive with current English benchmarks and reaches new state-of-the-art
performance on domain adaptation tasks in English.