Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites
- URL: http://arxiv.org/abs/2505.15297v1
- Date: Wed, 21 May 2025 09:27:18 GMT
- Title: Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites
- Authors: Xintong Wang, Yixiao Liu, Jingheng Pan, Liang Ding, Longyue Wang, Chris Biemann,
- Abstract summary: ToxiRewriteCN is the first Chinese dataset explicitly designed to preserve sentiment polarity.<n>It comprises 1,556 carefully annotated triplets, each containing a toxic sentence, a sentiment-aligned non-toxic rewrite, and labeled toxic spans.<n>It covers five real-world scenarios: standard expressions, emoji-induced and homophonic toxicity, as well as single-turn and multi-turn dialogues.
- Score: 39.3555146467512
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detoxifying offensive language while preserving the speaker's original intent is a challenging yet critical goal for improving the quality of online interactions. Although large language models (LLMs) show promise in rewriting toxic content, they often default to overly polite rewrites, distorting the emotional tone and communicative intent. This problem is especially acute in Chinese, where toxicity often arises implicitly through emojis, homophones, or discourse context. We present ToxiRewriteCN, the first Chinese detoxification dataset explicitly designed to preserve sentiment polarity. The dataset comprises 1,556 carefully annotated triplets, each containing a toxic sentence, a sentiment-aligned non-toxic rewrite, and labeled toxic spans. It covers five real-world scenarios: standard expressions, emoji-induced and homophonic toxicity, as well as single-turn and multi-turn dialogues. We evaluate 17 LLMs, including commercial and open-source models with variant architectures, across four dimensions: detoxification accuracy, fluency, content preservation, and sentiment polarity. Results show that while commercial and MoE models perform best overall, all models struggle to balance safety with emotional fidelity in more subtle or context-heavy settings such as emoji, homophone, and dialogue-based inputs. We release ToxiRewriteCN to support future research on controllable, sentiment-aware detoxification for Chinese.
Related papers
- Breaking the Cloak! Unveiling Chinese Cloaked Toxicity with Homophone Graph and Toxic Lexicon [10.538492229433409]
Social media platforms have experienced a significant rise in toxic content, including abusive language and discriminatory remarks.<n>Existing methods are mostly designed for English texts, while Chinese cloaked toxicity unveiling has not been solved yet.<n>We propose C$2$TU, a novel training-free and prompt-free method for Chinese cloaked toxic content unveiling.
arXiv Detail & Related papers (2025-05-28T09:58:15Z) - ToxicTone: A Mandarin Audio Dataset Annotated for Toxicity and Toxic Utterance Tonality [35.517662288248225]
ToxicTone is the largest public dataset of its kind.<n>Our data is sourced from diverse real-world audio and organized into 13 topical categories.<n>We propose a multimodal detection framework that integrates acoustic, linguistic, and emotional features.
arXiv Detail & Related papers (2025-05-21T17:25:27Z) - Pragmatic Inference Chain (PIC) Improving LLMs' Reasoning of Authentic Implicit Toxic Language [10.295731340480417]
Pragmatic Inference Chain (PIC) is a new prompting method drawn on interdisciplinary findings from cognitive science and linguistics.<n>It significantly improves the success rate of GPT-4o, Llama-3.1-70B-Instruct, and DeepSeek-v2.5 in identifying implicit toxic language.
arXiv Detail & Related papers (2025-03-03T13:51:05Z) - Multilingual and Explainable Text Detoxification with Parallel Corpora [58.83211571400692]
We extend parallel text detoxification corpus to new languages.<n>We conduct the first of its kind an automated, explainable analysis of the descriptive features of both toxic and non-toxic sentences.<n>We then experiment with a novel text detoxification method inspired by the Chain-of-Thoughts reasoning approach.
arXiv Detail & Related papers (2024-12-16T12:08:59Z) - Text Detoxification as Style Transfer in English and Hindi [1.183205689022649]
This paper focuses on text detoxification, i.e., automatically converting toxic text into non-toxic text.
We present three approaches: knowledge transfer from a similar task, multi-task learning approach, and delete and reconstruct approach.
Our results demonstrate that our approach effectively balances text detoxication while preserving the actual content and maintaining fluency.
arXiv Detail & Related papers (2024-02-12T16:30:41Z) - Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue [63.32199372362483]
We propose a novel sEntiment-enhanceD Graph-based multimodal sarcasm Explanation framework, named EDGE.<n>In particular, we first propose a lexicon-guided utterance sentiment inference module, where a utterance sentiment refinement strategy is devised.<n>We then develop a module named Joint Cross Attention-based Sentiment Inference (JCA-SI) by extending the multimodal sentiment analysis model JCA to derive the joint sentiment label for each video-audio clip.
arXiv Detail & Related papers (2024-02-06T03:14:46Z) - Seamless: Multilingual Expressive and Streaming Speech Translation [71.12826355107889]
We introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion.
First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model- SeamlessM4T v2.
We bring major components from SeamlessExpressive and SeamlessStreaming together to form Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real-time.
arXiv Detail & Related papers (2023-12-08T17:18:42Z) - Textless Speech Emotion Conversion using Decomposed and Discrete
Representations [49.55101900501656]
We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.
First, we modify the speech content by translating the content units to a target emotion, and then predict the prosodic features based on these units.
Finally, the speech waveform is generated by feeding the predicted representations into a neural vocoder.
arXiv Detail & Related papers (2021-11-14T18:16:42Z) - Speech Toxicity Analysis: A New Spoken Language Processing Task [32.297717021285344]
Toxic speech, also known as hate speech, is regarded as one of the crucial issues plaguing online social media today.
We propose a new Spoken Language Processing task of detecting toxicity from spoken speech.
We introduce DeToxy, the first publicly available toxicity annotated dataset for English speech, sourced from various openly available speech databases.
arXiv Detail & Related papers (2021-10-14T17:51:04Z) - Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English)
arXiv Detail & Related papers (2021-01-29T22:03:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.