Fugu-MT Paper Translation (Abstract): ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models

Paper overview: ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models

arxiv url: http://arxiv.org/abs/2605.00689v1
Date: Fri, 01 May 2026 14:24:14 GMT
Status: Translation complete
Last updated in system: 2026-05-04 17:43:28.983096
Title: ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models
Authors: Yunhan Zhao, Zhaorun Chen, Xingjun Ma, Yu-Gang Jiang, Bo Li
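Abstract summary: ML-Bench is a policy-grounded multilingual safety benchmark covering 14 languages. ML-Guard, built on ML-Bench, is a guardrail model that supports multilingual safety judgment and policy-conditioned compliance assessment.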
Reference score (independently computed attention score): 69.0361356103553
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As Large Language Models (LLMs) are increasingly deployed in cross-linguistic contexts, ensuring safety in diverse regulatory and cultural environments has become a critical challenge. However, existing multilingual benchmarks largely rely on general risk taxonomies and machine translation, which confines guardrail models to these predefined categories and hinders their ability to align with region-specific regulations and cultural nuances. To bridge these gaps, we introduce ML-Bench, a policy-grounded multilingual safety benchmark covering 14 languages. ML-Bench is constructed directly from regional regulations, where risk categories and fine-grained rules derived from jurisdiction-specific legal texts are directly used to guide the generation of multilingual safety data, enabling culturally and legally aligned evaluation across languages. Building on ML-Bench, we develop ML-Guard, a Diffusion Large Language Model (dLLM)-based guardrail model that supports multilingual safety judgment and policy-conditioned compliance assessment. ML-Guard has two variants, one 1.5B lightweight model for fast 'safe/unsafe' checking and a more capable 7B model for customized compliance checking with detailed explanations. We conduct extensive experiments against 11 strong guardrail baselines across 6 existing multilingual safety benchmarks and our ML-Bench, and show that ML-Guard consistently outperforms prior methods. We hope that ML-Bench and ML-Guard can help advance the development of regulation-aware and culturally aligned multilingual guardrail systems.

Related papers