Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings
- URL: http://arxiv.org/abs/2505.24341v1
- Date: Fri, 30 May 2025 08:32:45 GMT
- Title: Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings
- Authors: Shujian Yang, Shiyao Cui, Chuanrui Hu, Haicheng Wang, Tianwei Zhang, Minlie Huang, Jialiang Lu, Han Qiu
- Abstract summary: We highlight the multimodal nature of the Chinese language as a key challenge for deploying language models in toxic Chinese detection. First, we propose a taxonomy of 3 perturbation strategies and 8 specific approaches in toxic Chinese content. Then, we curate a dataset based on this taxonomy and benchmark 9 SOTA LLMs (from both the US and China) to assess whether they can detect perturbed toxic Chinese text.
- Score: 48.841514684592426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting toxic content using language models is important but challenging. While large language models (LLMs) have demonstrated strong performance in understanding Chinese, recent studies show that simple character substitutions in toxic Chinese text can easily confuse state-of-the-art (SOTA) LLMs. In this paper, we highlight the multimodal nature of the Chinese language as a key challenge for deploying LLMs in toxic Chinese detection. First, we propose a taxonomy of 3 perturbation strategies and 8 specific approaches in toxic Chinese content. Then, we curate a dataset based on this taxonomy and benchmark 9 SOTA LLMs (from both the US and China) to assess whether they can detect perturbed toxic Chinese text. Additionally, we explore cost-effective enhancement solutions such as in-context learning (ICL) and supervised fine-tuning (SFT). Our results reveal two important findings: (1) LLMs are less capable of detecting perturbed, multimodal toxic Chinese content; (2) ICL or SFT with a small number of perturbed examples may cause the LLMs to "overcorrect": misidentify many normal Chinese texts as toxic.
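To make the taxonomy concrete, here is a minimal sketch of one perturbation approach of the kind the paper catalogs: homophone character substitution. The substitution table, function name, and example text below are illustrative assumptions, not the paper's actual resources or code.

```python
# Illustrative sketch (not the paper's code): homophone substitution, a
# character-level perturbation that preserves how toxic text sounds to a
# human reader while changing its surface form.
import random

# Hypothetical toy mapping; a real perturbation would use a large homophone table.
HOMOPHONES = {
    "傻": ["煞", "沙"],  # all read roughly "sha"
    "蛋": ["旦", "淡"],  # all read roughly "dan"
}

def perturb_homophones(text: str, rate: float = 1.0, seed: int = 0) -> str:
    """Swap each mapped character for a homophone with probability `rate`."""
    rng = random.Random(seed)
    return "".join(
        rng.choice(HOMOPHONES[ch]) if ch in HOMOPHONES and rng.random() < rate else ch
        for ch in text
    )

print(perturb_homophones("你真是个傻蛋"))
# e.g. "你真是个煞旦": a human still hears the insult, but a text-only
# detector matching surface characters may no longer flag it.
```

The abstract's second finding suggests a caveat for anyone patching a detector this way: feeding a small number of such perturbed examples into ICL prompts or SFT data may push the model to overcorrect and flag normal Chinese text as toxic.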
Related papers
- ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark [50.89916747049978]
Existing resources for harmful content detection are predominantly focused on English, with Chinese datasets remaining scarce and often limited in scope. We present a comprehensive, professionally annotated benchmark for Chinese content harm detection, which covers six representative categories and is constructed entirely from real-world data. We propose a knowledge-augmented baseline that integrates both human-annotated knowledge rules and implicit knowledge from large language models, enabling smaller models to achieve performance comparable to state-of-the-art LLMs.
arXiv Detail & Related papers (2025-06-12T17:57:05Z)
- Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese [52.98034458924209]
This study investigates whether Large Language Models exhibit differential performance when prompted in two variants of written Chinese. We design two benchmark tasks that reflect real-world scenarios: regional term choice and regional name choice. Our analyses indicate that biases in LLM responses are dependent on both the task and prompting language.
arXiv Detail & Related papers (2025-05-28T17:56:49Z)
- Dialectal Toxicity Detection: Evaluating LLM-as-a-Judge Consistency Across Language Varieties [23.777874316083984]
There has been little systematic study on how dialectal differences affect toxicity detection by modern LLMs.
We create a multi-dialect dataset through synthetic transformations and human-assisted translations, covering 10 language clusters and 60 varieties.
We then evaluate three LLMs on multilingual and dialectal toxicity assessment, as well as their consistency with human judgments.
arXiv Detail & Related papers (2024-11-17T03:53:24Z)
- ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models [13.911977148887873]
We present a Chinese safety benchmark (ChineseSafe) to facilitate research on the content safety of large language models. Our ChineseSafe contains 205,034 examples across 4 classes and 10 sub-classes of safety issues. For Chinese contexts, we add several special types of illegal content: political sensitivity, pornography, and variant/homophonic words.
arXiv Detail & Related papers (2024-10-24T07:25:29Z)
- PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models [27.996123856250065]
Existing toxicity benchmarks are overwhelmingly focused on English.
We introduce PolygloToxicityPrompts (PTP), the first large-scale multilingual toxicity evaluation benchmark of 425K naturally occurring prompts spanning 17 languages.
arXiv Detail & Related papers (2024-05-15T14:22:33Z)
- Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model [36.01840141194335]
We introduce CT-LLM, a 2B-parameter large language model (LLM).
Uniquely initiated from scratch, CT-LLM diverges from the conventional methodology by prioritizing Chinese textual data.
CT-LLM excels in Chinese language tasks and showcases its adeptness in English through SFT.
arXiv Detail & Related papers (2024-04-05T15:20:02Z)
- Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora.
We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs.
Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
arXiv Detail & Related papers (2024-02-26T09:36:05Z)
- CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models [53.9835961434552]
We introduce the Chinese Instruction-Following Benchmark (CIF-Bench) to evaluate the generalizability of large language models (LLMs) to the Chinese language.
CIF-Bench comprises 150 tasks and 15,000 input-output pairs, developed by native speakers to test complex reasoning and Chinese cultural nuances.
To mitigate data contamination, we release only half of the dataset publicly, with the remainder kept private, and introduce diversified instructions to minimize score variance.
arXiv Detail & Related papers (2024-02-20T16:02:12Z)
- On the (In)Effectiveness of Large Language Models for Chinese Text Correction [44.32102000125604]
Large Language Models (LLMs) have amazed the entire Artificial Intelligence community.
This study focuses on Chinese Text Correction, a fundamental and challenging Chinese NLP task.
We empirically find that current LLMs exhibit both impressive performance and unsatisfactory behavior on Chinese Text Correction.
arXiv Detail & Related papers (2023-07-18T06:48:52Z)
- COLD: A Benchmark for Chinese Offensive Language Detection [54.60909500459201]
We present COLDataset, a Chinese offensive language dataset with 37k annotated sentences.
We also propose COLDetector to study the output offensiveness of popular Chinese language models (see the sketch after this entry).
Our resources and analyses are intended to help detoxify Chinese online communities and evaluate the safety performance of generative language models.
arXiv Detail & Related papers (2022-01-16T11:47:23Z)
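For readers unfamiliar with this style of study, below is a generic sketch of an output-offensiveness evaluation loop of the kind COLDetector performs: sample continuations from a generator and label them with a separate offensive-language classifier. This is a sketch under assumptions, not COLDetector's implementation; the classifier name and its label set are placeholders, and the prompts are toy examples.

```python
# Generic offensiveness-evaluation loop (illustrative, not COLDetector itself).
from transformers import pipeline

# The generator is a real public Chinese GPT-2; the classifier name is a
# placeholder -- substitute an actual offensive-language classifier.
generator = pipeline("text-generation", model="uer/gpt2-chinese-cluecorpussmall")
classifier = pipeline("text-classification", model="some-org/chinese-offensive-clf")

prompts = ["你这个人", "我觉得他们"]  # toy prompts; real studies use curated sets

offensive = 0
for p in prompts:
    text = generator(p, max_new_tokens=30, do_sample=True)[0]["generated_text"]
    label = classifier(text)[0]["label"]
    if label.lower() in {"offensive", "toxic", "label_1"}:  # label set varies by classifier
        offensive += 1

print(f"offense rate: {offensive / len(prompts):.2%}")
```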