Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
- URL: http://arxiv.org/abs/2406.15481v3
- Date: Wed, 11 Jun 2025 05:32:09 GMT
- Title: Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
- Authors: Haneul Yoo, Yongjin Yang, Hwaran Lee,
- Abstract summary: Code-switching in red-teaming queries can effectively elicit undesirable behaviors of large language models (LLMs)<n>We introduce a simple yet effective framework, CSRT, to synthesize codeswitching red-teaming queries.<n>We demonstrate that the CSRT significantly outperforms existing multilingual red-teaming techniques.
- Score: 10.154013836043816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) have advanced rapidly, concerns regarding their safety have become prominent. In this paper, we discover that code-switching in red-teaming queries can effectively elicit undesirable behaviors of LLMs, which are common practices in natural language. We introduce a simple yet effective framework, CSRT, to synthesize codeswitching red-teaming queries and investigate the safety and multilingual understanding of LLMs comprehensively. Through extensive experiments with ten state-of-the-art LLMs and code-switching queries combining up to 10 languages, we demonstrate that the CSRT significantly outperforms existing multilingual red-teaming techniques, achieving 46.7% more attacks than standard attacks in English and being effective in conventional safety domains. We also examine the multilingual ability of those LLMs to generate and understand codeswitching texts. Additionally, we validate the extensibility of the CSRT by generating codeswitching attack prompts with monolingual data. We finally conduct detailed ablation studies exploring code-switching and propound unintended correlation between resource availability of languages and safety alignment in existing multilingual LLMs.
Related papers
- Can Large Language Models Understand, Reason About, and Generate Code-Switched Text? [26.210664542372168]
Code-switching is a pervasive phenomenon in multilingual communication, yet the robustness of large language models (LLMs) in mixed-language settings remains insufficiently understood.<n>We introduce CodeMixQA, a novel benchmark with high-quality human annotations, comprising 16 diverse parallel code-switched language-pair variants.<n>We analyze the reasoning behavior of LLMs on code-switched question-answering tasks, shedding light on how models process and reason over mixed-language inputs.
arXiv Detail & Related papers (2026-01-12T02:52:38Z) - Beyond Language Boundaries: Uncovering Programming Language Families for Code Language Models [8.711642038538876]
The rapid proliferation of programming languages presents both opportunities and challenges for developing multilingual code LLMs.<n>We propose an embedding-based framework to uncover the latent families of PLs.<n>This work offers a universal perspective on programming languages and advances more effective strategies for multilingual code LLM training.
arXiv Detail & Related papers (2025-12-22T16:04:56Z) - Code-Switching In-Context Learning for Cross-Lingual Transfer of Large Language Models [64.54005959758733]
We introduce code-switching in-context learning (CSICL) as a principled and robust approach for overcoming the translation barrier during inference.<n>We conduct extensive experiments across 4 LLMs, 6 datasets, and 10 languages, spanning both knowledge-intensive and reasoning-oriented domains.<n>Our results demonstrate CSICL consistently outperforms X-ICL baselines, achieving gains of 3.1%p and 1.9%p in both target and unseen languages.
arXiv Detail & Related papers (2025-10-07T08:35:42Z) - Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages [57.059267233093465]
Large Language Models (LLMs) have transformed natural language processing, but their safety mechanisms remain under-explored in low-resource, multilingual settings.<n>We introduce textsfSGToxicGuard, a novel dataset and evaluation framework for benchmarking LLM safety in Singapore's diverse linguistic context.<n>We conduct extensive experiments with state-of-the-art multilingual LLMs, and the results uncover critical gaps in their safety guardrails.
arXiv Detail & Related papers (2025-09-18T08:14:34Z) - Breaking Language Barriers: Equitable Performance in Multilingual Language Models [17.343456129678067]
LLMs perform worse in Common Sense Reasoning (CSR) tasks when prompted in low-resource languages (LRLs) like Hindi or Swahili compared to high-resource languages (HRLs) like English.<n>Our approach involves fine-tuning an LLM on synthetic code-switched text generated using controlled language-mixing methods.<n>We present a new dataset of synthetic code-switched text derived from the CommonSenseQA dataset, featuring three distinct language ratio configurations.
arXiv Detail & Related papers (2025-08-18T06:50:24Z) - MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning [56.79292318645454]
Large Language Models (LLMs) are susceptible to adversarial attacks such as jailbreaking.
This vulnerability is exacerbated in multilingual setting, where multilingual safety-aligned data are often limited.
We propose an approach to build a multilingual guardrail with reasoning.
arXiv Detail & Related papers (2025-04-21T17:15:06Z) - Multi-lingual Multi-turn Automated Red Teaming for LLMs [4.707861373629172]
Multi-lingual Multi-turn Automated Red Teaming (textbfMM-ART) is a method to fully automate conversational, multi-lingual red-teaming operations.
We show the studied LLMs are on average 71% more vulnerable after a 5-turn conversation in English than after the initial turn.
For conversations in non-English languages, models display up to 195% more safety vulnerabilities than the standard single-turn English approach.
arXiv Detail & Related papers (2025-04-04T05:06:12Z) - Code-mixed LLM: Improve Large Language Models' Capability to Handle Code-Mixing through Reinforcement Learning from AI Feedback [11.223762031003671]
Code-mixing introduces unique challenges in daily life, such as syntactic mismatches and semantic blending.
Large language models (LLMs) have revolutionized the field of natural language processing (NLP) by offering unprecedented capabilities in understanding human languages.
We propose to improve the multilingual LLMs' ability to understand code-mixing through reinforcement learning from human feedback (RLHF) and code-mixed machine translation tasks.
arXiv Detail & Related papers (2024-11-13T22:56:00Z) - Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization ( CLS) aims to generate a summary for the source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even with few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z) - Can Code-Switched Texts Activate a Knowledge Switch in LLMs? A Case Study on English-Korean Code-Switching [14.841981996951395]
Code-switching (CS) can convey subtle cultural and linguistic nuances that can be otherwise lost in translation.
Recent state-of-the-art multilingual large language models (LLMs) demonstrate excellent multilingual abilities in various aspects including understanding CS.
arXiv Detail & Related papers (2024-10-24T05:14:03Z) - Bridging the Language Gap: Enhancing Multilingual Prompt-Based Code Generation in LLMs via Zero-Shot Cross-Lingual Transfer [5.355430735475281]
This paper investigates the complexities of multilingual prompt-based code generation.
Our evaluations reveal significant disparities in code quality for non-English prompts.
We propose a zero-shot cross-lingual approach using a neural projection technique.
arXiv Detail & Related papers (2024-08-19T05:11:46Z) - Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners [67.85635044939836]
Large Language Models (LLMs) have shown impressive language capabilities.
In this work, we investigate the spontaneous multilingual alignment improvement of LLMs.
We find that LLMs instruction-tuned on the question translation data (i.e. without annotated answers) are able to encourage the alignment between English and a wide range of languages.
arXiv Detail & Related papers (2024-05-22T16:46:19Z) - IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators [49.903001442804594]
This work investigates the prospect of leveraging compiler intermediate representations (IR) to improve the multilingual capabilities of Code-LMs.
We first compile SLTrans, a parallel dataset consisting of nearly 4M self-contained source code files.
Next, we carry out continued causal language modelling training on SLTrans, forcing the Code-LMs to learn the IR language.
Our resulting models, dubbed IRCoder, display sizeable and consistent gains across a wide variety of code generation tasks and metrics.
arXiv Detail & Related papers (2024-03-06T17:52:08Z) - UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised
Fine-tuning Dataset [69.33424532827608]
Open-source large language models (LLMs) have gained significant strength across diverse fields.
In this work, we construct an open-source multilingual supervised fine-tuning dataset.
The resulting UltraLink dataset comprises approximately 1 million samples across five languages.
arXiv Detail & Related papers (2024-02-07T05:05:53Z) - The Language Barrier: Dissecting Safety Challenges of LLMs in
Multilingual Contexts [46.089025223336854]
This paper examines the variations in safety challenges faced by large language models across different languages.
We compare how state-of-the-art LLMs respond to the same set of malicious prompts written in higher- vs. lower-resource languages.
arXiv Detail & Related papers (2024-01-23T23:12:09Z) - If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code
Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code)
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.