Fugu-MT Paper Translation (Summary): Optimsyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation

Paper summary: Optimsyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation

arxiv url: http://arxiv.org/abs/2604.00536v1
Date: Wed, 01 Apr 2026 06:28:44 GMT
Status: Translation complete
Last updated in system: 2026-04-02 16:44:31.870553
Title: Optimsyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation
Authors: Zhiting Fan, Ruizhe Chen, Tianxiang Hu, Ru Peng, Zenan Huang, Haokai Xu, Yixin Chen, Jian Wu, Junbo Zhao, Zuozhu Liu
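Abstract summary: Large language models (LLMs) achieve strong downstream performance thanks to abundant supervised fine-tuning (SFT) data. High-quality SFT data in knowledge-intensive domains such as the humanities, social sciences, medicine, law, and finance, however, is scarce. Synthetic data filtered with handcrafted rubrics is a common remedy, but rubric design is expert-dependent, transfers poorly across domains, and is often optimized through a brittle loop of writing rubrics, synthesizing data, training, inspecting results, and manually guessing revisions.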
Reference score (site-computed attention score): 41.42036786553015
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) achieve strong downstream performance largely due to abundant supervised fine-tuning (SFT) data. However, high-quality SFT data in knowledge-intensive domains such as humanities, social sciences, medicine, law, and finance is scarce because expert curation is expensive, privacy constraints are strict, and label consistency is hard to ensure. Recent work uses synthetic data, typically by prompting a generator over domain documents and filtering outputs with handcrafted rubrics. Yet rubric design is expert-dependent, transfers poorly across domains, and is often optimized through a brittle heuristic loop of writing rubrics, synthesizing data, training, inspecting results, and manually guessing revisions. This process lacks reliable quantitative feedback about how a rubric affects downstream performance. We propose evaluating synthetic data by its training utility on the target model and using this signal to guide data generation. Inspired by influence estimation, we adopt an optimizer-aware estimator that uses gradient information to quantify each synthetic sample's contribution to a target model's objective on specific tasks. Our analysis shows that even when synthetic and real samples are close in embedding space, their influence on learning can differ substantially. Based on this insight, we propose an optimization-based framework that adapts rubrics using target-model feedback. We provide lightweight guiding text and use a rubric-specialized model to generate task-conditioned rubrics. Influence score is used as the reward to optimize the rubric generator with reinforcement learning. Experiments across domains, target models, and data generators show consistent improvements and strong generalization without task-specific tuning.
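The abstract scores each synthetic sample with an optimizer-aware, gradient-based influence estimate and uses that score as the reinforcement-learning reward for the rubric generator, but it does not spell out the estimator. The following is a minimal sketch under stated assumptions, not the authors' method: a hypothetical toy linear model with squared loss, a first-order estimate in the style of TracIn/LESS-like estimators (predicted target-loss reduction = learning rate times the inner product of the target-task gradient with the Adam update direction induced by the sample), and a hypothetical `rubric_reward` that averages the influence of the samples a rubric produced. All function names and parameters are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[List[float], float]  # hypothetical (features x, label y) pair


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sample_grad(w: Sequence[float], sample: Sample) -> Vector:
    """Gradient of 0.5 * (w.x - y)^2 w.r.t. w for a toy linear model."""
    x, y = sample
    residual = dot(w, x) - y
    return [residual * xi for xi in x]


def mean_grad(w: Sequence[float], data: Sequence[Sample]) -> Vector:
    """Average gradient of the target-task loss over `data`."""
    grads = [sample_grad(w, s) for s in data]
    return [sum(g[i] for g in grads) / len(grads) for i in range(len(w))]


def adam_direction(g, m, v, step, beta1=0.9, beta2=0.999, eps=1e-8) -> Vector:
    """Direction Adam would move along (before scaling by the learning rate)
    if this step's gradient were g, given moment estimates m and v."""
    m_hat = [(beta1 * mi + (1 - beta1) * gi) / (1 - beta1 ** step)
             for mi, gi in zip(m, g)]
    v_hat = [(beta2 * vi + (1 - beta2) * gi * gi) / (1 - beta2 ** step)
             for vi, gi in zip(v, g)]
    return [mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]


def influence_score(w, sample, target_data, m, v, step, lr=1e-3) -> float:
    """First-order estimate of how much one Adam step on `sample` lowers the
    target-task loss: L(w - lr*d) ~ L(w) - lr * <grad L, d>.
    Positive means the sample is predicted to help the target task."""
    direction = adam_direction(sample_grad(w, sample), m, v, step)
    return lr * dot(mean_grad(w, target_data), direction)


def rubric_reward(w, samples, target_data, m, v, step, lr=1e-3) -> float:
    """Hypothetical RL reward for one rubric: mean influence of its samples."""
    scores = [influence_score(w, s, target_data, m, v, step, lr) for s in samples]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    w = [0.0, 0.0]
    target = [([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0)]  # toy target task
    m, v = [0.0, 0.0], [0.0, 0.0]                     # fresh Adam state
    helpful = ([1.0, 1.0], 2.0)   # pulls w toward the target solution
    harmful = ([1.0, 1.0], -2.0)  # pushes w away from it
    print(influence_score(w, helpful, target, m, v, step=1))
    print(influence_score(w, harmful, target, m, v, step=1))
    print(rubric_reward(w, [helpful, harmful], target, m, v, step=1))
```

Replacing `adam_direction` with the raw sample gradient recovers the plain SGD first-order estimate. The optimizer-aware version rescales each coordinate by Adam's second-moment estimate, so the step a sample actually induces can differ from what its raw gradient alone suggests.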

