Text Detoxification as Style Transfer in English and Hindi
- URL: http://arxiv.org/abs/2402.07767v2
- Date: Sun, 9 Jun 2024 18:48:06 GMT
- Title: Text Detoxification as Style Transfer in English and Hindi
- Authors: Sourabrata Mukherjee, Akanksha Bansal, Atul Kr. Ojha, John P. McCrae, Ondřej Dušek
- Abstract summary: This paper focuses on text detoxification, i.e., automatically converting toxic text into non-toxic text.
We present three approaches: knowledge transfer from a similar task, multi-task learning, and delete-and-reconstruct.
Our results demonstrate that our approaches effectively balance detoxification with preserving the actual content and maintaining fluency.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on text detoxification, i.e., automatically converting toxic text into non-toxic text. This task contributes to safer and more respectful online communication and can be considered a Text Style Transfer (TST) task, where the text style changes while its content is preserved. We present three approaches: knowledge transfer from a similar task, a multi-task learning approach that combines sequence-to-sequence modeling with various toxicity classification tasks, and a delete-and-reconstruct approach. To support our research, we utilize a dataset provided by Dementieva et al. (2021), which contains multiple versions of detoxified texts corresponding to toxic texts. In our experiments, we selected the best variants through expert human annotators, creating a dataset where each toxic sentence is paired with a single, appropriate detoxified version. Additionally, we introduced a small Hindi parallel dataset, aligned with a part of the English dataset and suitable for evaluation purposes. Our results demonstrate that our approaches effectively balance detoxification with preserving the actual content and maintaining fluency.
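As a rough illustration of the third approach, a delete-and-reconstruct step can be sketched with an off-the-shelf masked language model; the toxic-word lexicon, the choice of model, and the single-mask restriction below are assumptions for this sketch, not the paper's actual pipeline.

```python
# Minimal sketch of delete-and-reconstruct detoxification: "delete" a
# lexicon-matched toxic token by masking it, then "reconstruct" by taking
# the masked language model's top-ranked in-fill. The lexicon and model
# are illustrative assumptions.
from transformers import pipeline

TOXIC_LEXICON = {"stupid", "idiot"}  # hypothetical placeholder lexicon

fill = pipeline("fill-mask", model="bert-base-uncased")

def delete_and_reconstruct(sentence: str) -> str:
    tokens = sentence.split()
    for i, token in enumerate(tokens):
        if token.lower().strip(".,!?") in TOXIC_LEXICON:
            tokens[i] = fill.tokenizer.mask_token  # delete step
            break  # keep the sketch to a single substitution
    else:
        return sentence  # no lexicon hit; nothing to detoxify
    return fill(" ".join(tokens))[0]["sequence"]  # reconstruct step

print(delete_and_reconstruct("You are a stupid person."))
```

In the paper, deletion is paired with a learned reconstruction model; a fill-mask pipeline merely makes the two-step structure concrete.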
Related papers
- Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification (arXiv 2023-11-23)
Text detoxification is the task of transferring the style of text from toxic to neutral.
We present a large-scale study of strategies for cross-lingual text detoxification.
- DiffuDetox: A Mixed Diffusion Model for Text Detoxification (arXiv 2023-06-14)
Text detoxification is a conditional text generation task aiming to remove offensive content from toxic text.
We propose DiffuDetox, a mixed conditional and unconditional diffusion model for text detoxification.
- Stylized Data-to-Text Generation: A Case Study in the E-Commerce Domain (arXiv 2023-05-05)
We propose a new task, namely stylized data-to-text generation, whose aim is to generate coherent text according to a specific style.
This task is non-trivial, due to three challenges: the logic of the generated text, unstructured style reference, and biased training samples.
We propose a novel stylized data-to-text generation model, named StyleD2T, comprising three components: logic planning-enhanced data embedding, mask-based style embedding, and unbiased stylized text generation.
- Russian Texts Detoxification with Levenshtein Editing (arXiv 2022-04-28)
We build a two-step tagging-based detoxification model using a parallel corpus of Russian texts.
We achieve the best style transfer accuracy among all models in the RUSSE Detox shared task, surpassing larger sequence-to-sequence models (a toy tag-and-edit sketch follows this list).
- A Benchmark Corpus for the Detection of Automatically Generated Text in Academic Publications (arXiv 2022-02-04)
This paper presents two datasets comprising artificially generated research content.
In the first case, the content is completely generated by the GPT-2 model after a short prompt extracted from original papers (a minimal generation sketch follows this list).
The partial or hybrid dataset is created by replacing several sentences of abstracts with sentences that are generated by the Arxiv-NLP model.
We evaluate the quality of the datasets by comparing the generated texts to aligned original texts using fluency metrics such as BLEU and ROUGE.
- Improving Disentangled Text Representation Learning with Information-Theoretic Guidance (arXiv 2020-06-01)
The discrete nature of natural language makes disentangling textual representations more challenging.
Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text.
Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation.
- Towards Faithful Neural Table-to-Text Generation with Content-Matching Constraints (arXiv 2020-05-03)
We propose a novel Transformer-based generation framework to achieve the goal.
Core techniques in our method to enforce faithfulness include a new table-text optimal-transport matching loss.
To evaluate faithfulness, we propose a new automatic metric specialized to the table-to-text generation problem.
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation (arXiv 2020-02-24)
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset, in the same writing style as the reference.
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (arXiv 2019-10-23)
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of transfer learning techniques for NLP with a unified framework that converts all text-based language problems into a text-to-text format (a brief text-to-text sketch follows this list).
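The two-step tagging-based recipe from the Levenshtein-editing entry above can be made concrete with a toy rule-based tagger; the KEEP/DELETE tag set and the lexicon are assumptions for this sketch, whereas the actual model learns Levenshtein edit operations from a parallel corpus.

```python
# Toy tag-then-edit detoxifier: step 1 assigns an edit tag to every token,
# step 2 applies the tags. A real system predicts richer edit operations
# (e.g., REPLACE) with a trained tagger; the lexicon here is a stand-in.
from typing import List, Tuple

TOXIC_LEXICON = {"terrible", "moron"}  # hypothetical placeholder lexicon

def tag_tokens(tokens: List[str]) -> List[Tuple[str, str]]:
    """Step 1: label each token KEEP or DELETE."""
    return [(t, "DELETE" if t.lower() in TOXIC_LEXICON else "KEEP")
            for t in tokens]

def apply_edits(tagged: List[Tuple[str, str]]) -> str:
    """Step 2: apply the predicted edit operations."""
    return " ".join(t for t, op in tagged if op == "KEEP")

print(apply_edits(tag_tokens("what a terrible idea".split())))
```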
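Similarly, the fully generated portion of the benchmark-corpus entry can be approximated with a stock GPT-2 pipeline; the prompt and decoding settings below are assumptions for illustration, not the dataset's documented configuration.

```python
# Sketch of building fully generated research text: feed a short prompt
# taken from a real paper to GPT-2 and keep the continuation. Prompt and
# decoding settings are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "In this paper, we study text detoxification as"
result = generator(prompt, max_new_tokens=60, do_sample=True, top_p=0.9)
print(result[0]["generated_text"])
```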
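Finally, the text-to-text framing from the last entry is easy to demonstrate: every task becomes a string-to-string problem selected by a task prefix. The prefixes below are the ones documented for the released T5 checkpoints; the small checkpoint is chosen only to keep the sketch light.

```python
# The unified text-to-text format: one model, one interface, with the task
# selected by a prefix on the input string.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

for text in [
    "translate English to German: The house is wonderful.",
    "summarize: studies have shown that owning a dog is good for you.",
]:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```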
This list is automatically generated from the titles and abstracts of the papers on this site.