Exploring Methods for Cross-lingual Text Style Transfer: The Case of
Text Detoxification
- URL: http://arxiv.org/abs/2311.13937v1
- Date: Thu, 23 Nov 2023 11:40:28 GMT
- Title: Exploring Methods for Cross-lingual Text Style Transfer: The Case of
Text Detoxification
- Authors: Daryna Dementieva, Daniil Moskovskiy, David Dale and Alexander
Panchenko
- Abstract summary: Text detoxification is the task of transferring the style of text from toxic to neutral.
We present a large-scale study of strategies for cross-lingual text detoxification.
- Score: 77.45995868988301
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Text detoxification is the task of transferring the style of text from toxic
to neutral. While there are approaches yielding promising results in the monolingual
setup, e.g., (Dale et al., 2021; Hallinan et al., 2022), cross-lingual transfer
for this task remains a challenging open problem (Moskovskiy et al., 2022). In
this work, we present a large-scale study of strategies for cross-lingual text
detoxification -- given a parallel detoxification corpus for one language, the
goal is to transfer detoxification ability to another language for which we do
not have such a corpus. Moreover, we are the first to explore a new task where
text translation and detoxification are performed simultaneously, providing
several strong baselines for this task. Finally, we introduce new automatic
detoxification evaluation metrics with higher correlations with human judgments
than previous benchmarks. We also assess the most promising approaches with
manual annotation, determining the best strategy for transferring the knowledge
of text detoxification between languages.
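To make the cross-lingual setup concrete, here is a minimal sketch of the simplest transfer strategy in this family: translate the input into the language that has a parallel detoxification corpus (English below), detoxify it there, and translate the result back. The Hugging Face checkpoints (Helsinki-NLP/opus-mt-ru-en, Helsinki-NLP/opus-mt-en-ru, s-nlp/bart-base-detox) and the Russian-to-English direction are illustrative assumptions, not necessarily the exact models or languages evaluated in the paper.

```python
# Hedged sketch: "translate - detoxify - backtranslate" baseline for cross-lingual
# text detoxification. Checkpoints are public Hugging Face models chosen for
# illustration; the paper's exact models and languages may differ.
from transformers import pipeline

ru_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-ru-en")
en_to_ru = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ru")
# English detoxification model trained on a parallel corpus (ParaDetox-style).
detox_en = pipeline("text2text-generation", model="s-nlp/bart-base-detox")

def detoxify_cross_lingual(text_ru: str) -> str:
    """Detoxify a Russian sentence using only an English detoxification model."""
    english = ru_to_en(text_ru)[0]["translation_text"]                      # source -> English
    neutral_en = detox_en(english, max_new_tokens=64)[0]["generated_text"]  # detoxify in English
    return en_to_ru(neutral_en)[0]["translation_text"]                      # English -> source

print(detoxify_cross_lingual("пример токсичного предложения"))
```

The appeal of this baseline is that it needs no detoxification data in the target language; its weakness is that translation errors can compound across the three stages.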
Related papers
- SmurfCat at PAN 2024 TextDetox: Alignment of Multilingual Transformers for Text Detoxification [41.94295877935867]
This paper presents the SmurfCat team's solution for the Multilingual Text Detoxification task at the PAN-2024 competition.
Using data augmentation through machine translation and a special filtering procedure, we collected an additional multilingual parallel dataset for text detoxification.
We fine-tuned several multilingual sequence-to-sequence models, such as mT0 and Aya, on the text detoxification task (a minimal fine-tuning sketch follows this entry).
arXiv Detail & Related papers (2024-07-07T17:19:34Z)
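As a rough illustration of the fine-tuning step described in the entry above, the sketch below trains a multilingual sequence-to-sequence model on toxic/neutral sentence pairs with Hugging Face Transformers. The checkpoint (bigscience/mt0-base), column names, and hyperparameters are assumptions for illustration, not the SmurfCat team's actual configuration.

```python
# Hedged sketch: fine-tuning a multilingual seq2seq model on parallel
# detoxification pairs (toxic -> neutral). Model name, column names and
# hyperparameters are illustrative, not the competition configuration.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "bigscience/mt0-base"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Tiny in-memory stand-in for a parallel detoxification corpus.
pairs = Dataset.from_dict({
    "toxic":   ["this is a toxic sentence ..."],
    "neutral": ["this is a neutral paraphrase ..."],
})

def preprocess(batch):
    # Encode toxic inputs; use the neutral paraphrases as decoder targets.
    enc = tokenizer(batch["toxic"], truncation=True, max_length=128)
    enc["labels"] = tokenizer(text_target=batch["neutral"],
                              truncation=True, max_length=128)["input_ids"]
    return enc

tokenized = pairs.map(preprocess, batched=True, remove_columns=pairs.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="detox-mt0", num_train_epochs=3,
                                  per_device_train_batch_size=8, learning_rate=3e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```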
- MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages [71.50809576484288]
Text detoxification is a task where a text is paraphrased from a toxic surface form, e.g., one featuring rude words, into the neutral register.
Recent approaches for parallel text detoxification corpora collection -- ParaDetox and APPDIA -- were explored only in a monolingual setup.
In this work, we aim to extend the ParaDetox pipeline to multiple languages, presenting MultiParaDetox to automate parallel detoxification corpus collection for potentially any language.
arXiv Detail & Related papers (2024-04-02T15:32:32Z)
- Text Detoxification as Style Transfer in English and Hindi [1.183205689022649]
This paper focuses on text detoxification, i.e., automatically converting toxic text into non-toxic text.
We present three approaches: knowledge transfer from a similar task, a multi-task learning approach, and a delete-and-reconstruct approach.
Our results demonstrate that our approach effectively balances text detoxification while preserving the actual content and maintaining fluency.
arXiv Detail & Related papers (2024-02-12T16:30:41Z)
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-06-08T07:33:22Z)
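The translate-and-test idea from the entry above can be sketched in a few lines: machine-translate the non-English input into English, then apply an English-only classifier to the translation. The German-to-English direction and the checkpoints below are illustrative placeholders, not the models used in T3L.

```python
# Hedged sketch of a "translate-and-test" pipeline: translate to English first,
# then run an English-only classifier on the translation. Model names are
# illustrative public checkpoints, not those used in the T3L paper.
from transformers import pipeline

translate = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")
classify = pipeline("text-classification",
                    model="distilbert-base-uncased-finetuned-sst-2-english")

def translate_and_test(text_de: str) -> dict:
    english = translate(text_de)[0]["translation_text"]  # translation stage
    return classify(english)[0]                          # classification stage

print(translate_and_test("Dieser Film war wirklich großartig."))
```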
- Exploring Cross-lingual Textual Style Transfer with Large Multilingual Language Models [78.12943085697283]
Detoxification is the task of generating text in a polite style while preserving the meaning and fluency of the original toxic text.
This work investigates multilingual and cross-lingual detoxification and the behavior of large multilingual models in this setting.
arXiv Detail & Related papers (2022-06-05T20:02:30Z)
- Methods for Detoxification of Texts for the Russian Language [55.337471467610094]
We introduce the first study of automatic detoxification of Russian texts to combat offensive language.
We test two types of models: an unsupervised approach that performs local corrections and a supervised approach based on the pretrained GPT-2 language model.
The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.
arXiv Detail & Related papers (2021-05-19T10:37:44Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)