Fugu-MT Paper Translation (Summary): Taiwan Safety Benchmark and Breeze Guard: Toward Trustworthy AI for Taiwanese Mandarin

Paper Summary: Taiwan Safety Benchmark and Breeze Guard: Toward Trustworthy AI for Taiwanese Mandarin

arxiv url: http://arxiv.org/abs/2603.07286v1
Date: Sat, 07 Mar 2026 17:13:07 GMT
Status: Translation complete
System update date: 2026-03-10 15:13:14.205193
Title: Taiwan Safety Benchmark and Breeze Guard: Toward Trustworthy AI for Taiwanese Mandarin
Authors: Po-Chun Hsu, Meng-Hsi Chen, Tsu Ling Chao, Chia Tien Han, Da-shan Shiu,
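Abstract Summary: We introduce TS-Bench (Taiwan Safety Benchmark), a standardized evaluation suite for assessing safety performance in Taiwanese Mandarin. TS-Bench contains 400 human-curated prompts spanning critical domains including financial fraud, medical misinformation, social discrimination, and political manipulation. We also present Breeze Guard, an 8B safety model derived from Breeze 2.
Reference score (in-house computed attention score): 8.569205385775936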
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Global safety models exhibit strong performance across widely used benchmarks, yet their training data rarely captures the cultural and linguistic nuances of Taiwanese Mandarin. This limitation results in systematic blind spots when interpreting region-specific risks such as localized financial scams, culturally embedded hate speech, and misinformation patterns. To address these gaps, we introduce TS-Bench (Taiwan Safety Benchmark), a standardized evaluation suite for assessing safety performance in Taiwanese Mandarin. TS-Bench contains 400 human-curated prompts spanning critical domains including financial fraud, medical misinformation, social discrimination, and political manipulation. In parallel, we present Breeze Guard, an 8B safety model derived from Breeze 2, our previously released general-purpose Taiwanese Mandarin LLM with strong cultural grounding from its original pre-training corpus. Breeze Guard is obtained through supervised fine-tuning on a large-scale, human-verified synthesized dataset targeting Taiwan-specific harms. Our central hypothesis is that effective safety detection requires the cultural grounding already present in the base model; safety fine-tuning alone is insufficient to introduce new sociolinguistic knowledge from scratch. Empirically, Breeze Guard significantly outperforms the leading 8B general-purpose safety model, Granite Guardian 3.3, on TS-Bench (+0.17 overall F1), with particularly large gains in high-context categories such as scam (+0.66 F1) and financial malpractice (+0.43 F1). While the model shows slightly lower performance on English-centric benchmarks (ToxicChat, AegisSafetyTest), this tradeoff is expected for a regionally specialized safety model optimized for Taiwanese Mandarin. Together, Breeze Guard and TS-Bench establish a new foundation for trustworthy AI deployment in Taiwan.
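The gains quoted above (e.g. +0.17 overall F1, +0.66 F1 on scam) are differences between two classifiers' F1 scores, computed per category and overall. The sketch below illustrates that comparison under stated assumptions: labels are binary (unsafe vs. safe), F1 is taken on the "unsafe" class, and "overall" pools every prompt (micro-averaged F1). The paper may define these differently. The function names and the records are hypothetical toy data, not TS-Bench results.

```python
from collections import defaultdict


def f1(pairs):
    """F1 for the positive ('unsafe') class over (gold, pred) boolean pairs."""
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    if tp == 0:  # also guards against zero denominators below
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_category_f1(records):
    """records: iterable of (category, gold_unsafe, pred_unsafe) tuples."""
    records = list(records)  # iterated twice below
    by_cat = defaultdict(list)
    for cat, gold, pred in records:
        by_cat[cat].append((gold, pred))
    scores = {cat: f1(pairs) for cat, pairs in by_cat.items()}
    # Assumption: "overall" pools all prompts (micro-averaged F1).
    scores["overall"] = f1([(g, p) for _, g, p in records])
    return scores


def f1_gain(scores_a, scores_b):
    """Per-category F1 difference, model A minus model B."""
    return {cat: scores_a[cat] - scores_b[cat] for cat in scores_a}


# Hypothetical toy predictions from two classifiers on the same five prompts.
records_a = [("scam", True, True), ("scam", True, True), ("scam", False, False),
             ("medical", True, True), ("medical", False, True)]
records_b = [("scam", True, False), ("scam", True, True), ("scam", False, False),
             ("medical", True, True), ("medical", False, True)]

scores_a = per_category_f1(records_a)
scores_b = per_category_f1(records_b)
gain = f1_gain(scores_a, scores_b)
for cat, delta in sorted(gain.items()):
    print(f"{cat}: {delta:+.2f} F1")
```

Micro-averaging weights each prompt equally, so large categories dominate the overall score; a macro average (mean of per-category F1) would instead weight each category equally, and the two can differ noticeably on an imbalanced benchmark.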


Related Papers