Towards Safe Multilingual Frontier AI
- URL: http://arxiv.org/abs/2409.13708v2
- Date: Tue, 29 Oct 2024 11:14:46 GMT
- Title: Towards Safe Multilingual Frontier AI
- Authors: Artūrs Kanepajs, Vladimir Ivanov, Richard Moulange,
- Abstract summary: multilingual jailbreaks undermine the safe and inclusive deployment of AI systems.
We propose policy actions that align with the EU legal landscape and institutional framework to address multilingual jailbreaks.
These include mandatory assessments of multilingual capabilities and vulnerabilities, public opinion research, and state support for multilingual AI development.
- Score: 0.18957478338649109
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linguistically inclusive LLMs -- which maintain good performance regardless of the language with which they are prompted -- are necessary for the diffusion of AI benefits around the world. Multilingual jailbreaks that rely on language translation to evade safety measures undermine the safe and inclusive deployment of AI systems. We provide policy recommendations to enhance the multilingual capabilities of AI while mitigating the risks of multilingual jailbreaks. We examine how a language's level of resourcing relates to how vulnerable LLMs are to multilingual jailbreaks in that language. We do this by testing five advanced AI models across 24 official languages of the EU. Building on prior research, we propose policy actions that align with the EU legal landscape and institutional framework to address multilingual jailbreaks, while promoting linguistic inclusivity. These include mandatory assessments of multilingual capabilities and vulnerabilities, public opinion research, and state support for multilingual AI development. The measures aim to improve AI safety and functionality through EU policy initiatives, guiding the implementation of the EU AI Act and informing regulatory efforts of the European AI Office.
Related papers
- MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning [56.79292318645454]
Large Language Models (LLMs) are susceptible to adversarial attacks such as jailbreaking.
This vulnerability is exacerbated in multilingual setting, where multilingual safety-aligned data are often limited.
We propose an approach to build a multilingual guardrail with reasoning.
arXiv Detail & Related papers (2025-04-21T17:15:06Z) - X-Guard: Multilingual Guard Agent for Content Moderation [8.233872344445675]
X-Guard is a transparent multilingual safety agent designed to provide content moderation across diverse linguistic contexts.
Our approach includes curating and enhancing multiple open-source safety datasets with explicit evaluation rationales.
Our empirical evaluations demonstrate X-Guard's effectiveness in detecting unsafe content across multiple languages.
arXiv Detail & Related papers (2025-04-11T01:58:06Z) - Benchmarking LLM Guardrails in Handling Multilingual Toxicity [57.296161186129545]
We introduce a comprehensive multilingual test suite, spanning seven datasets and over ten languages, to benchmark the performance of state-of-the-art guardrails.
We investigate the resilience of guardrails against recent jailbreaking techniques, and assess the impact of in-context safety policies and language resource availability on guardrails' performance.
Our findings show that existing guardrails are still ineffective at handling multilingual toxicity and lack robustness against jailbreaking prompts.
arXiv Detail & Related papers (2024-10-29T15:51:24Z) - Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs)
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z) - LLM for Everyone: Representing the Underrepresented in Large Language Models [21.07409393578553]
This thesis aims to bridge the gap in NLP research and development by focusing on underrepresented languages.
A comprehensive evaluation of large language models (LLMs) is conducted to assess their capabilities in these languages.
The proposed solutions cover cross-lingual continual instruction tuning, retrieval-based cross-lingual in-context learning, and in-context query alignment.
arXiv Detail & Related papers (2024-09-20T20:53:22Z) - Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture [6.17896401271963]
We introduce Multilingual Blending, a mixed-language query-response scheme designed to evaluate the safety alignment of various large language models.
We investigate language patterns such as language availability, morphology, and language family that could impact the effectiveness of Multilingual Blending.
arXiv Detail & Related papers (2024-07-10T03:26:15Z) - Safe Multi-agent Reinforcement Learning with Natural Language Constraints [49.01100552946231]
The role of natural language constraints in Safe Multi-agent Reinforcement Learning (MARL) is crucial, yet often overlooked.
We propose a novel approach named Safe Multi-agent Reinforcement Learning with Natural Language constraints (SMALL)
Our method leverages fine-tuned language models to interpret and process free-form textual constraints, converting them into semantic embeddings.
These embeddings are then integrated into the multi-agent policy learning process, enabling agents to learn policies that minimize constraint violations while optimizing rewards.
arXiv Detail & Related papers (2024-05-30T12:57:35Z) - A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [51.8203871494146]
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing.
Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient.
This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z) - CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of these models against code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z) - Multilingual Jailbreak Challenges in Large Language Models [96.74878032417054]
In this study, we reveal the presence of multilingual jailbreak challenges within large language models (LLMs)
We consider two potential risky scenarios: unintentional and intentional.
We propose a novel textscSelf-Defense framework that automatically generates multilingual training data for safety fine-tuning.
arXiv Detail & Related papers (2023-10-10T09:44:06Z) - Low-Resource Languages Jailbreak GPT-4 [19.97929171158234]
Our work exposes the inherent cross-lingual vulnerability of AI safety training and red-teaming of large language models (LLMs)
On the AdvBenchmark, GPT-4 engages with the unsafe translated inputs and provides actionable items that can get the users towards their harmful goals 79% of the time.
Other high-/mid-resource languages have significantly lower attack success rate, which suggests that the cross-lingual vulnerability mainly applies to low-resource languages.
arXiv Detail & Related papers (2023-10-03T21:30:56Z) - All Languages Matter: On the Multilingual Safety of Large Language Models [96.47607891042523]
We build the first multilingual safety benchmark for large language models (LLMs)
XSafety covers 14 kinds of commonly used safety issues across 10 languages that span several language families.
We propose several simple and effective prompting methods to improve the multilingual safety of ChatGPT.
arXiv Detail & Related papers (2023-10-02T05:23:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.