Russian Texts Detoxification with Levenshtein Editing
- URL: http://arxiv.org/abs/2204.13638v1
- Date: Thu, 28 Apr 2022 16:58:17 GMT
- Title: Russian Texts Detoxification with Levenshtein Editing
- Authors: Ilya Gusev
- Abstract summary: We build a two-step tagging-based detoxification model using a parallel corpus of Russian texts.
We achieve the best style transfer accuracy among all models in the RUSSE Detox shared task, surpassing larger sequence-to-sequence models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text detoxification is a style transfer task of creating neutral versions of
toxic texts. In this paper, we use the concept of text editing to build a
two-step tagging-based detoxification model using a parallel corpus of Russian
texts. With this model, we achieved the best style transfer accuracy among all
models in the RUSSE Detox shared task, surpassing larger sequence-to-sequence
models.
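As a rough illustration of the tagging-based editing idea, the sketch below derives token-level edit tags from a parallel toxic/neutral pair, using Python's difflib as a stand-in for a Levenshtein aligner. The tag names and the edit_tags helper are illustrative assumptions; the paper's actual tag set, alignment procedure, and two-step model are not reproduced here.

```python
# Illustrative sketch (not the paper's implementation): derive per-token
# edit tags for a toxic sentence from its aligned neutral rewrite.
from difflib import SequenceMatcher

def edit_tags(toxic_tokens, neutral_tokens):
    """Return one tag per toxic token: KEEP, DELETE, or REPLACE_<text>."""
    tags = ["KEEP"] * len(toxic_tokens)
    matcher = SequenceMatcher(a=toxic_tokens, b=neutral_tokens)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "delete":
            for i in range(i1, i2):
                tags[i] = "DELETE"
        elif op == "replace":
            # Attach the neutral-side text to the first toxic token and
            # mark the rest of the span for deletion.
            tags[i1] = "REPLACE_" + " ".join(neutral_tokens[j1:j2])
            for i in range(i1 + 1, i2):
                tags[i] = "DELETE"
        # Pure insertions would need an APPEND-style tag on the preceding
        # token; omitted here for brevity.
    return tags

toxic = "ты полный идиот если так думаешь".split()
neutral = "ты не прав если так думаешь".split()
print(list(zip(toxic, edit_tags(toxic, neutral))))
# [('ты', 'KEEP'), ('полный', 'REPLACE_не прав'), ('идиот', 'DELETE'), ...]
```

In a two-step setup of this kind, a tagger trained on such pairs predicts the edits for a new toxic input, and a second step generates or selects the replacement text for the tagged positions.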
Related papers
- MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages [71.50809576484288]
Text detoxification is a task where a text is paraphrased from a toxic surface form, e.g. one featuring rude words, into the neutral register.
Recent approaches to parallel text detoxification corpus collection -- ParaDetox and APPDIA -- were explored only in a monolingual setup.
In this work, we aim to extend the ParaDetox pipeline to multiple languages, presenting MultiParaDetox to automate parallel detoxification corpus collection for potentially any language.
arXiv Detail & Related papers (2024-04-02T15:32:32Z)
- Text Detoxification as Style Transfer in English and Hindi [1.183205689022649]
This paper focuses on text detoxification, i.e., automatically converting toxic text into non-toxic text.
We present three approaches: knowledge transfer from a similar task, a multi-task learning approach, and a delete-and-reconstruct approach.
Our results demonstrate that these approaches effectively balance detoxification with preserving the actual content and maintaining fluency.
arXiv Detail & Related papers (2024-02-12T16:30:41Z)
- Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification [77.45995868988301]
Text detoxification is the task of transferring the style of text from toxic to neutral.
We present a large-scale study of strategies for cross-lingual text detoxification.
arXiv Detail & Related papers (2023-11-23T11:40:28Z)
- DiffuDetox: A Mixed Diffusion Model for Text Detoxification [12.014080113339178]
Text detoxification is a conditional text generation task aiming to remove offensive content from toxic text.
We propose DiffuDetox, a mixed conditional and unconditional diffusion model for text detoxification.
arXiv Detail & Related papers (2023-06-14T13:41:23Z)
- Text Generation with Text-Editing Models [78.03750739936956]
This tutorial provides a comprehensive overview of text-editing models and current state-of-the-art approaches.
We discuss challenges related to productionization and how these models can be used to mitigate hallucination and bias.
arXiv Detail & Related papers (2022-06-14T17:58:17Z)
- Text Detoxification using Large Pre-trained Neural Models [57.72086777177844]
We present two novel unsupervised methods for eliminating toxicity in text.
The first method combines guidance of the generation process with small style-conditional language models.
The second method uses BERT to replace toxic words with non-offensive synonyms; a minimal illustrative sketch of this masked-word replacement idea appears after this list.
arXiv Detail & Related papers (2021-09-18T11:55:32Z)
- Fine-tuning GPT-3 for Russian Text Summarization [77.34726150561087]
This paper showcases ruGPT3's ability to summarize texts by fine-tuning it on a corpus of Russian news with corresponding human-generated summaries.
We evaluate the resulting texts with a set of metrics, showing that our solution can surpass the state-of-the-art model's performance without additional changes in architecture or loss function.
arXiv Detail & Related papers (2021-08-07T19:01:40Z)
- Methods for Detoxification of Texts for the Russian Language [55.337471467610094]
We introduce the first study of automatic detoxification of Russian texts to combat offensive language.
We test two types of models: an unsupervised approach that performs local corrections and a supervised approach based on the pretrained GPT-2 language model.
The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.
arXiv Detail & Related papers (2021-05-19T10:37:44Z)
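As a loose sketch of the masked-word replacement idea mentioned above for "Text Detoxification using Large Pre-trained Neural Models": the snippet below masks words found in a hypothetical hand-written toxic-word list and lets a BERT fill-mask pipeline propose substitutes. The model choice, word list, and detoxify helper are assumptions for illustration only; the paper's actual synonym selection and toxicity filtering are not reproduced.

```python
# Minimal sketch (assumptions noted above): mask words from a toy toxic-word
# list and take BERT's top fill-mask suggestion as the replacement.
from transformers import pipeline

TOXIC_WORDS = {"idiot", "stupid"}  # hypothetical hand-written list

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def detoxify(sentence: str) -> str:
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok.lower().strip(".,!?") in TOXIC_WORDS:
            masked = tokens.copy()
            masked[i] = fill_mask.tokenizer.mask_token  # "[MASK]" for BERT
            candidates = fill_mask(" ".join(masked))
            tokens[i] = candidates[0]["token_str"]  # top suggestion
    return " ".join(tokens)

print(detoxify("you are a stupid person"))
```

A real system would additionally score candidate replacements for toxicity and semantic similarity before accepting them.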
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.