Fugu-MT Paper Translation (Abstract): IndicGuard: A Multilingual Safety Guard Model and Dataset for Indic Languages

Paper abstract: IndicGuard: A Multilingual Safety Guard Model and Dataset for Indic Languages

arxiv url: http://arxiv.org/abs/2606.22841v1
Date: Mon, 22 Jun 2026 04:36:57 GMT
Status: Translation complete
Last updated in system: 2026-06-25 04:09:53.428175
Title: IndicGuard: A Multilingual Safety Guard Model and Dataset for Indic Languages
Title (reference translation): IndicGuard: A Multilingual Safety Guard Model and Dataset for Indic Languages
Authors: Parth Bramhecha, Smit Deshmukh, Sairaj Bodhale, Adwait Borate, Raviraj Joshi
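Abstract summary: IndicGuard is a multilingual safety guard model and dataset for Indic languages. We construct a high-volume, culturally nuanced safety dataset encompassing ten major Indic languages. We fine-tune a 4B-parameter instruction-tuned model based on Gemma-3-4B-IT to serve as a multilingual safety guardrail.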
Reference score (independently computed attention metric): 2.584263027095689
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As Large Language Models (LLMs) achieve widespread integration across diverse linguistic landscapes, ensuring their safety and alignment with regional normative values remains a critical challenge. Current safety mechanisms are predominantly optimized for English-centric frameworks, often failing to capture the unique socio-cultural sensitivities and localized categories of harm inherent to the Indic region. To address this gap, we introduce IndicGuard, a multilingual safety guard model and dataset for Indic languages. We construct a high-volume, culturally nuanced safety dataset encompassing ten major Indic languages, systematically curated to capture regional harms, sensitive socio-political contexts, and adversarial jailbreaks. Leveraging this corpus, we fine-tune a 4B-parameter instruction-tuned model based on Gemma-3-4B-IT to serve as a multilingual safety guardrail for real-time content moderation and policy compliance checking. Our empirical evaluations demonstrate that IndicGuard significantly enhances LLM robustness against localized vulnerabilities, achieving high moderation consistency across different conversational turns. Crucially, IndicGuard consistently outperforms the existing baseline model, CultureGuard, across evaluated languages. Finally, we demonstrate that our model effectively generalizes to low-resource Indic languages excluded from training, substantiating the structural robustness and cross-lingual transfer capabilities of the framework.
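Abstract (reference translation): As large language models (LLMs) become widely integrated across diverse linguistic environments, ensuring their safety and alignment with regional normative values remains a critical challenge. Current safety mechanisms are optimized mainly for English-centric frameworks and often fail to capture the distinctive socio-cultural sensitivities and localized categories of harm specific to the Indic region. To address this gap, we introduce IndicGuard, a multilingual safety guard model and dataset for Indic languages. We construct a high-volume, culturally nuanced safety dataset covering ten major Indic languages, systematically curated to capture regional harms, sensitive socio-political contexts, and adversarial jailbreaks. Using this corpus, we fine-tune a 4B-parameter instruction-tuned model based on Gemma-3-4B-IT to serve as a multilingual safety guardrail for real-time content moderation and policy compliance checking. Our experiments show that IndicGuard significantly improves LLM robustness against localized vulnerabilities and achieves high moderation consistency across conversational turns. Importantly, IndicGuard consistently outperforms the existing baseline model, CultureGuard, across the evaluated languages. Finally, we show that the model generalizes effectively to low-resource Indic languages excluded from training, demonstrating the structural robustness and cross-lingual transfer capabilities of the framework.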
