Facilitating Fine-grained Detection of Chinese Toxic Language:
Hierarchical Taxonomy, Resources, and Benchmarks
- URL: http://arxiv.org/abs/2305.04446v1
- Date: Mon, 8 May 2023 03:50:38 GMT
- Title: Facilitating Fine-grained Detection of Chinese Toxic Language:
Hierarchical Taxonomy, Resources, and Benchmarks
- Authors: Junyu Lu, Bo Xu, Xiaokun Zhang, Changrong Min, Liang Yang, Hongfei Lin
- Abstract summary: Existing datasets lack fine-grained annotation of toxic types and expressions.
It is crucial to introduce lexical knowledge to detect the toxicity of posts.
In this paper, we facilitate the fine-grained detection of Chinese toxic language.
- Score: 18.44630180661091
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The widespread dissemination of toxic online posts is increasingly
damaging to society. However, research on detecting toxic language in Chinese
has lagged significantly. Existing datasets lack fine-grained annotation of
toxic types and expressions and ignore samples with indirect toxicity. In
addition, introducing lexical knowledge to detect the toxicity of posts is
crucial, yet doing so remains a challenge for researchers. In this paper, we
facilitate the fine-grained detection of Chinese toxic language. First, we
build Monitor Toxic Frame, a hierarchical taxonomy for analyzing toxic types
and expressions. Then, we present ToxiCN, a fine-grained dataset that includes
both direct and indirect toxic samples. We also build an insult lexicon
containing implicit profanity and propose Toxic Knowledge Enhancement (TKE) as
a benchmark, which incorporates lexical features to detect toxic language. In
the experiments, we demonstrate the effectiveness of TKE and then give a
systematic quantitative and qualitative analysis of the findings.
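The abstract's central technical idea, Toxic Knowledge Enhancement (TKE), injects lexical knowledge from an insult lexicon into the detector. A minimal, hypothetical sketch of that general idea is given below; the toy lexicon entries, the flag-projection fusion, and the small GRU encoder are assumptions for illustration, not the authors' TKE implementation.

```python
# Hypothetical sketch: fuse an insult-lexicon feature with a text encoder for
# toxicity classification. Not the paper's TKE code; the lexicon, the fusion
# scheme, and all hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

insult_lexicon = {"废物", "白痴", "滚"}  # toy stand-in for the paper's insult lexicon

def lexicon_flags(tokens):
    """1.0 for tokens that hit the lexicon, else 0.0 (shape: [seq_len, 1])."""
    return torch.tensor([[1.0] if t in insult_lexicon else [0.0] for t in tokens])

class LexiconEnhancedClassifier(nn.Module):
    def __init__(self, vocab_size=21128, dim=128, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # stand-in for a pretrained encoder
        self.lex_proj = nn.Linear(1, dim)            # project the lexicon flag into embedding space
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, num_labels)

    def forward(self, token_ids, lex_flags):
        x = self.embed(token_ids) + self.lex_proj(lex_flags)  # inject the lexical feature
        _, h = self.encoder(x)
        return self.head(h[-1])                               # logits: non-toxic vs. toxic

tokens = ["你", "真", "是", "个", "白痴"]
ids = torch.randint(0, 21128, (1, len(tokens)))  # toy ids; real code would use a Chinese tokenizer
logits = LexiconEnhancedClassifier()(ids, lexicon_flags(tokens).unsqueeze(0))
```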
Related papers
- FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts [13.470734853274587]
Large language models (LLMs) are increasingly popular but are also prone to generating biased, toxic, or harmful language.
We create and release FrenchToxicityPrompts, a dataset of 50K naturally occurring French prompts.
We evaluate 14 different models from four prevalent open-source families of LLMs against our dataset to assess their potential toxicity.
arXiv Detail & Related papers (2024-06-25T14:02:11Z)
- PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models [27.996123856250065]
Existing toxicity benchmarks are overwhelmingly focused on English.
We introduce PolygloToxicityPrompts (PTP), the first large-scale multilingual toxicity evaluation benchmark of 425K naturally occurring prompts spanning 17 languages.
arXiv Detail & Related papers (2024-05-15T14:22:33Z)
- Can LLMs Recognize Toxicity? A Structured Investigation Framework and Toxicity Metric [16.423707276483178]
We introduce a robust metric grounded on Large Language Models (LLMs) to flexibly measure toxicity according to the given definition.
Our results demonstrate outstanding performance in measuring toxicity within verified factors, improving on conventional metrics by 12 points in the F1 score.
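A definition-grounded, LLM-based toxicity metric of the kind this entry describes can be wired up roughly as follows; the prompt wording, scoring scale, and `generate` callable are assumptions, not the paper's protocol.

```python
# Hypothetical sketch of a definition-grounded toxicity metric: prompt an LLM
# with an explicit toxicity definition and parse a numeric score from its reply.
PROMPT = (
    "Toxicity definition: {definition}\n"
    "Text: {text}\n"
    "On a scale from 0 (not toxic) to 1 (highly toxic), reply with a single number."
)

def toxicity_score(text, definition, generate):
    """`generate` is any callable mapping a prompt string to the LLM's reply."""
    reply = generate(PROMPT.format(definition=definition, text=text))
    try:
        return max(0.0, min(1.0, float(reply.strip().split()[0])))
    except (ValueError, IndexError):
        return None  # unparseable reply; real code would re-prompt or retry

# Usage with a stub in place of a real LLM call:
score = toxicity_score(
    "You people are worthless.",
    "Language that attacks or demeans a person or group.",
    generate=lambda prompt: "0.9",
)
```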
arXiv Detail & Related papers (2024-02-10T07:55:27Z)
- Unveiling the Implicit Toxicity in Large Language Models [77.90933074675543]
The open-endedness of large language models (LLMs) combined with their impressive capabilities may lead to new safety issues when they are exploited for malicious use.
We show that LLMs can generate diverse implicit toxic outputs that are exceptionally difficult to detect via zero-shot prompting alone.
We propose a reinforcement learning (RL) based attacking method to further induce the implicit toxicity in LLMs.
arXiv Detail & Related papers (2023-11-29T06:42:36Z)
- Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts [57.38912708076231]
We introduce MaRCo, a detoxification algorithm that combines controllable generation and text rewriting methods.
MaRCo uses likelihoods under a non-toxic LM and a toxic LM to find candidate words to mask and potentially replace.
We evaluate our method on several subtle toxicity and microaggressions datasets, and show that it not only outperforms baselines on automatic metrics, but MaRCo's rewrites are preferred 2.1 times more in human evaluation.
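The MaRCo masking step described above can be pictured, very roughly, as a per-token likelihood contrast between the toxic and non-toxic LMs; the toy probabilities and threshold in this sketch are assumptions, not MaRCo's actual procedure or code.

```python
# Simplified sketch of likelihood-contrast mask selection: mask tokens that the
# toxic LM assigns much higher probability than the non-toxic LM does.
# The probability tables and threshold are toy assumptions.
import math

def select_masks(tokens, p_toxic, p_nontoxic, threshold=1.0):
    """Replace with [MASK] any token whose log p_toxic - log p_nontoxic exceeds threshold."""
    return ["[MASK]" if math.log(p_toxic[t]) - math.log(p_nontoxic[t]) > threshold else t
            for t in tokens]

tokens = ["you", "are", "an", "idiot"]
p_toxic = {"you": 0.2, "are": 0.2, "an": 0.1, "idiot": 0.3}     # toy per-token probabilities
p_nontoxic = {"you": 0.2, "are": 0.2, "an": 0.1, "idiot": 0.01}
print(select_masks(tokens, p_toxic, p_nontoxic))  # ['you', 'are', 'an', '[MASK]']
```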
arXiv Detail & Related papers (2022-12-20T18:50:00Z)
- Toxicity in Multilingual Machine Translation at Scale [3.4620477930009472]
We evaluate and analyze added toxicity when translating a large evaluation dataset (HOLISTICBIAS, over 472k sentences, covering 13 demographic axes) from English into 164 languages.
An automatic toxicity evaluation shows that added toxicity across languages varies from 0% to 5%.
The output languages with the most added toxicity tend to be low-resource ones, and the demographic axes with the most added toxicity include sexual orientation, gender and sex, and ability.
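At its core, the "added toxicity" measurement this entry reports can be framed as flagging pairs where the translation is toxic but the source is not; a minimal sketch with an assumed `is_toxic` detector (any wordlist, classifier, or API would do) follows.

```python
# Minimal sketch of added-toxicity measurement in machine translation: count
# sentence pairs whose translation is toxic while the source is not.
# `is_toxic` is an assumed stand-in for any toxicity detector.

def added_toxicity_rate(pairs, is_toxic):
    """pairs: list of (source, translation). Returns the fraction with added toxicity."""
    flagged = sum(1 for src, tgt in pairs if is_toxic(tgt) and not is_toxic(src))
    return flagged / max(len(pairs), 1)

# Toy usage with a tiny wordlist detector:
toxic_words = {"idiot", "trash"}
detector = lambda text: any(w in text.lower() for w in toxic_words)
rate = added_toxicity_rate(
    [("She is a great engineer.", "She is a trash engineer."),
     ("Good morning.", "Good morning.")],
    detector,
)
print(f"{rate:.0%} added toxicity")  # 50% added toxicity
```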
arXiv Detail & Related papers (2022-10-06T17:26:27Z)
- Toxicity Detection can be Sensitive to the Conversational Context [64.28043776806213]
We construct and publicly release a dataset of 10,000 posts with two kinds of toxicity labels.
We introduce a new task, context sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context is also considered.
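One way to operationalize the context sensitivity task described above is as the gap between a post's toxicity score with and without its parent comment; the scorer below is an assumed stand-in for any toxicity classifier, not the paper's setup.

```python
# Minimal sketch of context sensitivity estimation: how much does perceived
# toxicity change once the conversational context is included?
# `score_toxicity` is an assumed stand-in returning a value in [0, 1].

def context_sensitivity(post, context, score_toxicity):
    """Absolute change in toxicity score when the parent context is prepended."""
    return abs(score_toxicity(f"{context}\n{post}") - score_toxicity(post))

# Toy usage with a hand-rolled scorer:
scorer = lambda text: 0.8 if "liar" in text and "joking" not in text else 0.1
delta = context_sensitivity("You liar!", "We were just joking around.", scorer)
print(round(delta, 2))  # 0.7 -> highly context-sensitive
```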
arXiv Detail & Related papers (2021-11-19T13:57:26Z)
- Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields a lower false positive rate for both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
- Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical markers (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z)
- RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arXiv Detail & Related papers (2020-09-24T03:17:19Z)
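The evaluation loop behind prompt-based degeneration benchmarks like this one can be sketched as sampling several continuations per prompt and recording the worst-case toxicity; the stub LM, stub classifier, and sample count below are assumptions, not a reproduction of the paper's setup.

```python
# Rough sketch of prompt-based toxic-degeneration evaluation: sample several
# continuations per prompt and average the per-prompt maximum toxicity.
# `generate` and `score_toxicity` stand in for a real LM and toxicity classifier.
import random

def expected_max_toxicity(prompts, generate, score_toxicity, samples=5):
    """Average over prompts of the maximum toxicity among sampled continuations."""
    per_prompt_max = [
        max(score_toxicity(generate(p)) for _ in range(samples)) for p in prompts
    ]
    return sum(per_prompt_max) / len(per_prompt_max)

# Toy usage with stubs in place of a real LM and classifier:
emt = expected_max_toxicity(
    ["The protesters started to", "Honestly, those people are"],
    generate=lambda prompt: prompt + " ...",        # stub LM continuation
    score_toxicity=lambda text: random.random(),    # stub toxicity classifier
)
```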
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.