CSRT: Evaluation and Analysis of LLMs using Code-Switching Red-Teaming Dataset
- URL: http://arxiv.org/abs/2406.15481v1
- Date: Mon, 17 Jun 2024 06:08:18 GMT
- Title: CSRT: Evaluation and Analysis of LLMs using Code-Switching Red-Teaming Dataset
- Authors: Haneul Yoo, Yongjin Yang, Hwaran Lee
- Abstract summary: Code-switching red-teaming (CSRT) is a simple yet effective red-teaming technique that simultaneously tests the multilingual understanding and safety of large language models (LLMs).
We release the CSRT dataset, which comprises 315 code-switching queries combining up to 10 languages and eliciting a wide range of undesirable behaviors.
We demonstrate that CSRT significantly outperforms existing multilingual red-teaming techniques, achieving 46.7% more attacks than existing methods do in English.
- Score: 10.154013836043816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies in large language models (LLMs) shed light on their multilingual ability and safety, beyond conventional tasks in language modeling. Still, current benchmarks fail to evaluate these abilities comprehensively and are excessively dependent on manual annotations. In this paper, we introduce code-switching red-teaming (CSRT), a simple yet effective red-teaming technique that simultaneously tests the multilingual understanding and safety of LLMs. We release the CSRT dataset, which comprises 315 code-switching queries combining up to 10 languages and eliciting a wide range of undesirable behaviors. Through extensive experiments with ten state-of-the-art LLMs, we demonstrate that CSRT significantly outperforms existing multilingual red-teaming techniques, achieving 46.7% more attacks than existing methods do in English. We analyze the harmful responses to the CSRT dataset across various aspects under ablation studies with 16K samples, including but not limited to scaling laws, unsafe behavior categories, and input conditions for optimal data generation. Additionally, we validate the extensibility of CSRT by generating code-switching attack prompts from monolingual data.
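Since only the abstract is available here, the sketch below is an illustrative reconstruction of the core idea, not the paper's released pipeline: interleave segments of a harmful English prompt across several languages to form a code-switching query, then score an attack set by the fraction of responses judged unsafe. The `translate` stub, the `is_unsafe` judge, the language set, and all function names are hypothetical placeholders.

```python
# A minimal, hypothetical sketch of CSRT-style code-switching red-teaming.
# Assumptions: `translate` stands in for a real MT system and `is_unsafe`
# for a safety judge; neither reproduces the paper's actual pipeline.
import random

LANGUAGES = ["ko", "es", "zh", "ar", "fr", "de", "hi", "ja", "it"]  # illustrative set

def translate(segment: str, lang: str) -> str:
    """Placeholder translation: a real pipeline would call an MT system here."""
    return f"<{lang}>{segment}</{lang}>"

def make_code_switching_query(prompt: str, n_langs: int = 3, seed: int = 0) -> str:
    """Interleave word-level segments of an English prompt across languages."""
    rng = random.Random(seed)
    langs = rng.sample(LANGUAGES, n_langs)
    words = prompt.split()  # crude word-level segmentation, purely for illustration
    mixed = [
        translate(w, langs[i % n_langs]) if i % 2 else w  # alternate EN / other
        for i, w in enumerate(words)
    ]
    return " ".join(mixed)

def attack_success_rate(responses: list[str], is_unsafe) -> float:
    """Fraction of responses a judge labels unsafe (higher = stronger attack)."""
    return sum(map(is_unsafe, responses)) / len(responses)

if __name__ == "__main__":
    print(make_code_switching_query("describe how to carry out [redacted unsafe behavior]"))
```

Word-level alternation is the simplest possible switching scheme; the released dataset combines up to 10 languages per query, which the `n_langs` parameter loosely mirrors.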
Related papers
- MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning [56.79292318645454]
Large Language Models (LLMs) are susceptible to adversarial attacks such as jailbreaking.
This vulnerability is exacerbated in multilingual settings, where multilingual safety-aligned data are often limited.
We propose an approach to build a multilingual guardrail with reasoning.
arXiv Detail & Related papers (2025-04-21T17:15:06Z)
- Multi-lingual Multi-turn Automated Red Teaming for LLMs [4.707861373629172]
Multi-lingual Multi-turn Automated Red Teaming (MM-ART) is a method to fully automate conversational, multi-lingual red-teaming operations.
We show the studied LLMs are on average 71% more vulnerable after a 5-turn conversation in English than after the initial turn.
For conversations in non-English languages, models display up to 195% more safety vulnerabilities than the standard single-turn English approach.
arXiv Detail & Related papers (2025-04-04T05:06:12Z)
- Code-mixed LLM: Improve Large Language Models' Capability to Handle Code-Mixing through Reinforcement Learning from AI Feedback [11.223762031003671]
Code-mixing introduces unique challenges in daily life, such as syntactic mismatches and semantic blending.
Large language models (LLMs) have revolutionized the field of natural language processing (NLP) by offering unprecedented capabilities in understanding human languages.
We propose to improve the multilingual LLMs' ability to understand code-mixing through reinforcement learning from human feedback (RLHF) and code-mixed machine translation tasks.
arXiv Detail & Related papers (2024-11-13T22:56:00Z)
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary of a source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even with few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
- Can Code-Switched Texts Activate a Knowledge Switch in LLMs? A Case Study on English-Korean Code-Switching [14.841981996951395]
Code-switching (CS) can convey subtle cultural and linguistic nuances that can be otherwise lost in translation.
Recent state-of-the-art multilingual large language models (LLMs) demonstrate excellent multilingual abilities in various aspects including understanding CS.
arXiv Detail & Related papers (2024-10-24T05:14:03Z)
- Bridging the Language Gap: Enhancing Multilingual Prompt-Based Code Generation in LLMs via Zero-Shot Cross-Lingual Transfer [5.355430735475281]
This paper investigates the complexities of multilingual prompt-based code generation.
Our evaluations reveal significant disparities in code quality for non-English prompts.
We propose a zero-shot cross-lingual approach using a neural projection technique.
arXiv Detail & Related papers (2024-08-19T05:11:46Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners [67.85635044939836]
Large Language Models (LLMs) have shown impressive language capabilities.
In this work, we investigate the spontaneous multilingual alignment improvement of LLMs.
We find that instruction-tuning LLMs on question translation data (i.e., without annotated answers) encourages alignment between English and a wide range of languages.
arXiv Detail & Related papers (2024-05-22T16:46:19Z)
- IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators [49.903001442804594]
This work investigates the prospect of leveraging compiler intermediate representations (IR) to improve the multilingual capabilities of Code-LMs.
We first compile SLTrans, a parallel dataset consisting of nearly 4M self-contained source code files.
Next, we carry out continued causal language modelling training on SLTrans, forcing the Code-LMs to learn the IR language.
Our resulting models, dubbed IRCoder, display sizeable and consistent gains across a wide variety of code generation tasks and metrics.
arXiv Detail & Related papers (2024-03-06T17:52:08Z)
- UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset [69.33424532827608]
Open-source large language models (LLMs) have gained significant strength across diverse fields.
In this work, we construct an open-source multilingual supervised fine-tuning dataset.
The resulting UltraLink dataset comprises approximately 1 million samples across five languages.
arXiv Detail & Related papers (2024-02-07T05:05:53Z)
- The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts [46.089025223336854]
This paper examines the variations in safety challenges faced by large language models across different languages.
We compare how state-of-the-art LLMs respond to the same set of malicious prompts written in higher- vs. lower-resource languages.
arXiv Detail & Related papers (2024-01-23T23:12:09Z)
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)