Fugu-MT Paper Translation (Abstract): The Art of Asking: Multilingual Prompt Optimization for Synthetic Data

Paper overview: The Art of Asking: Multilingual Prompt Optimization for Synthetic Data

arxiv url: http://arxiv.org/abs/2510.19806v1
Date: Wed, 22 Oct 2025 17:41:20 GMT
Status: Translation complete
System update date: 2025-10-25 03:08:16.24007
Title: The Art of Asking: Multilingual Prompt Optimization for Synthetic Data
Authors: David Mora, Viraat Aryabumi, Wei-Yin Ko, Sara Hooker, Julia Kreutzer, Marzieh Fadaee
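Abstract summary: The authors argue that the overlooked prompt space, the very inputs that define training distributions, offers a more powerful lever for improving multilingual performance. They introduce a lightweight framework for prompt-space optimization, in which translated prompts are systematically transformed for Naturalness, Cultural Adaptation, and Difficulty Enhancement.
Reference score (independently computed attention score): 25.82527211292218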
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Synthetic data has become a cornerstone for scaling large language models, yet its multilingual use remains bottlenecked by translation-based prompts. This strategy inherits English-centric framing and style and neglects cultural dimensions, ultimately constraining model generalization. We argue that the overlooked prompt space, the very inputs that define training distributions, offers a more powerful lever for improving multilingual performance. We introduce a lightweight framework for prompt-space optimization, where translated prompts are systematically transformed for Naturalness, Cultural Adaptation, and Difficulty Enhancement. Using an off-the-shelf multilingual LLM, we apply these transformations to prompts for 12 languages spanning 7 families. Under identical data conditions, our approaches achieve substantial and consistent downstream improvements over the translation-only baseline: +4.7% on Global-MMLU accuracy, +2.4% on Flores XCometXL and +35.3% wins in preferences on mArenaHard. We establish prompt-space optimization as a simple yet powerful paradigm for building multilingual LLMs that are more robust, culturally grounded, and globally capable.
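Abstract (reference translation): Synthetic data has become a foundation for scaling large language models, but its multilingual use is bottlenecked by translation-based prompts. This strategy inherits English-centric framing and style and neglects cultural dimensions, which ultimately constrains model generalization. The authors argue that the overlooked prompt space, the inputs that define the training distribution, is a more powerful lever for improving multilingual performance. They propose a lightweight framework for prompt-space optimization that systematically transforms translated prompts for naturalness, cultural adaptation, and increased difficulty. Using an off-the-shelf multilingual LLM, they apply these transformations to prompts in 12 languages spanning 7 families. Under identical data conditions, the approach delivers substantial and consistent downstream improvements over the translation-only baseline: +4.7% accuracy on Global-MMLU, +2.4% on Flores XCometXL, and +35.3% preference wins on mArenaHard. The work establishes prompt-space optimization as a simple yet powerful paradigm for building multilingual LLMs that are more robust, culturally grounded, and globally capable.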
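
The transformation step described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the instruction templates, the `transform_prompt` helper, the `generate` callable that stands in for an off-the-shelf multilingual LLM, and the choice to chain the three transformations in sequence. The abstract does not specify the actual prompt wording, the model, or whether the transformations are chained or applied independently.

```python
from typing import Callable, Dict, Sequence

# Hypothetical instruction templates; the paper's actual wording is not
# given in the abstract.
TRANSFORMATIONS: Dict[str, str] = {
    "naturalness": (
        "Rewrite the following {language} prompt so that it reads as if "
        "written by a native speaker:\n{prompt}"
    ),
    "cultural_adaptation": (
        "Adapt the following {language} prompt so that its examples and "
        "references fit the local cultural context:\n{prompt}"
    ),
    "difficulty_enhancement": (
        "Rewrite the following {language} prompt so that it is more "
        "challenging to answer:\n{prompt}"
    ),
}

DEFAULT_ORDER = ("naturalness", "cultural_adaptation", "difficulty_enhancement")


def build_instruction(kind: str, prompt: str, language: str) -> str:
    """Fill the template for one transformation kind."""
    return TRANSFORMATIONS[kind].format(language=language, prompt=prompt)


def transform_prompt(
    prompt: str,
    language: str,
    generate: Callable[[str], str],
    kinds: Sequence[str] = DEFAULT_ORDER,
) -> str:
    """Apply each transformation in turn, feeding the output of one step
    into the next. `generate` is any function mapping an instruction
    string to the model's text response (assumed interface)."""
    current = prompt
    for kind in kinds:
        current = generate(build_instruction(kind, current, language)).strip()
    return current


if __name__ == "__main__":
    # Stand-in for a real LLM call: returns the last line of the
    # instruction (the prompt itself) with a marker appended.
    def fake_llm(instruction: str) -> str:
        return instruction.rsplit("\n", 1)[-1] + " [rewritten]"

    print(transform_prompt("¿Cuál es la capital de Francia?", "Spanish", fake_llm))
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a real pipeline would replace `fake_llm` with a call to whichever multilingual model is available.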
