Fugu-MT Paper Translation (Abstract): CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning via Dual Self-Training and Cycle Consistency

Paper overview: CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning via Dual Self-Training and Cycle Consistency

arxiv url: http://arxiv.org/abs/2508.16100v1
Date: Fri, 22 Aug 2025 05:30:59 GMT
Status: Translation complete
Last updated in system: 2025-08-25 16:42:36.256761
Title: CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning via Dual Self-Training and Cycle Consistency
Title (reference translation): CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning with Dual Self-Training and Cycle Consistency
Authors: Zhanming Shen, Hao Chen, Yulei Tang, Shaolin Zhu, Wentao Ye, Xiaomeng Hu, Haobo Wang, Gang Chen, Junbo Zhao
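Abstract summary: Cycle-Instruct is a novel framework that achieves fully seed-free instruction tuning. Inspired by cycle consistency, Cycle-Instruct employs a dual self-training loop in which two models, an answer generator and a question generator, are bootstrapped solely from raw, unlabeled text. Experiments show that Cycle-Instruct not only outperforms seed-driven back-translation baselines but also achieves performance comparable to strongly supervised methods.
Reference score (independently calculated attention score): 31.636970128351454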
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Instruction tuning is vital for aligning large language models (LLMs) with human intent, but current methods typically rely on costly human-annotated seed data or powerful external teacher models. While instruction back-translation techniques reduce this dependency, they remain fundamentally tethered to an initial seed set, which limits full automation, introduces biases, and can lead to inefficient use of unlabeled corpora. In this paper, we propose Cycle-Instruct, a novel framework that achieves fully seed-free instruction tuning. Inspired by cycle consistency, Cycle-Instruct employs a dual self-training loop where two models, an answer generator and a question generator, are bootstrapped solely from raw, unlabeled text. These models mutually supervise each other by reconstructing original text segments from their counterpart's generated pseudo-labels, effectively learning from the intrinsic structure of the data without any human-provided seeds. We demonstrate Cycle-Instruct's efficacy across four diverse data tracks, including general instruction-following, domain-specific tasks, dialogue logs, and plain text. Our extensive experiments show that Cycle-Instruct not only outperforms seed-driven back-translation baselines but also achieves performance comparable to strongly supervised methods.
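Abstract (reference translation): Instruction tuning is essential for aligning large language models (LLMs) with human intent, but current methods typically depend on costly human-annotated seed data or powerful external teacher models. Instruction back-translation techniques reduce this dependency, yet they remain fundamentally tied to an initial seed set, which limits full automation, introduces bias, and can lead to inefficient use of unlabeled corpora. This paper proposes Cycle-Instruct, a novel framework that achieves fully seed-free instruction tuning. Inspired by cycle consistency, Cycle-Instruct employs a dual self-training loop in which two models, an answer generator and a question generator, are bootstrapped solely from raw, unlabeled text. The models supervise each other by reconstructing original text segments from the pseudo-labels generated by their counterpart, effectively learning from the intrinsic structure of the data without any human-provided seeds. The effectiveness of Cycle-Instruct is demonstrated on four data tracks, including general instruction-following, domain-specific tasks, dialogue logs, and plain text. Extensive experiments show that Cycle-Instruct not only outperforms seed-driven back-translation baselines but also achieves performance comparable to strongly supervised methods.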

Related papers