Fugu-MT Paper Translation (Abstract): BC Protocol: Structured Dual-Expert Dialogue for Eliciting High-Quality Chain-of-Thought Post-Training Data

Paper overview: BC Protocol: Structured Dual-Expert Dialogue for Eliciting High-Quality Chain-of-Thought Post-Training Data

arxiv url: http://arxiv.org/abs/2605.25549v1
Date: Mon, 25 May 2026 08:06:10 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:19.457041
Title: BC Protocol: Structured Dual-Expert Dialogue for Eliciting High-Quality Chain-of-Thought Post-Training Data
Title (reference translation): BC Protocol: Structured Dual-Expert Dialogue for Eliciting High-Quality Chain-of-Thought Post-Training Data
Authors: Bo Zou, Chao Xu
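Abstract summary: High-quality expert chain-of-thought (CoT) data is one of the core bottlenecks in large language model (LLM) post-training. This paper proposes the BC Protocol, a structured dual-expert elicitation method for producing LLM post-training data.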
Reference score (independently computed attention score): 10.071691304378065
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: High-quality expert chain-of-thought (CoT) data is one of the core bottlenecks in large language model (LLM) post-training. Existing data production methods each have structural limitations: crowdsourced annotation lacks deep reasoning paths; expert solo writing is constrained by the "expert blind spot" -- experts structurally skip reasoning steps they consider obvious; RLHF only produces preference signals rather than reasoning chains. This paper proposes the BC Protocol -- a structured dual-expert elicitation method for LLM post-training data production. The method carefully pairs a domain expert (crystallized intelligence) with a knowledge engineer (fluid intelligence), systematically externalizing the expert's implicit judgments as natural language reasoning chains. We introduce the Participant Aptitude Model, which defines six participant characteristic dimensions that affect elicitation quality. "Calibrated Ignorance" is an original concept proposed in this paper. We further propose "Selection-over-Prescription" as a methodological principle: for implicit knowledge elicitation tasks, investing quality-control resources in personnel selection yields a higher return than investing the same resources in process design. In a controlled experiment in the narrative fiction domain, we directly compared CoT produced by BC Protocol dual dialogue (Group A, $n=20$) against CoT written independently by the same domain expert (Group B, $n=20$). Three cross-vendor judge models -- GPT-4o, Claude Opus 4.5, and Gemini 2.5 Pro -- conducted blind evaluation across five dimensions (600 ratings total). Results show that the BC Protocol achieves an overwhelming advantage in "naturalness of reasoning process" (Group A mean 4.80 vs. Group B mean 1.30, $p = 2.4 \times 10^{-8}$, Cliff's $\delta = 1.0$).
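Abstract (reference translation): High-quality expert chain-of-thought (CoT) data is one of the core bottlenecks in post-training large language models (LLMs). Existing data production methods each have structural limitations: crowdsourced annotation lacks deep reasoning paths; solo writing by experts is constrained by the "expert blind spot", in which experts skip reasoning steps they consider obvious; and RLHF produces only preference signals rather than reasoning chains. This paper proposes the BC Protocol, a structured dual-expert elicitation method for producing LLM post-training data. The method carefully pairs a domain expert (crystallized intelligence) with a knowledge engineer (fluid intelligence) and systematically externalizes the expert's implicit judgments as natural-language reasoning chains. The paper introduces the Participant Aptitude Model, which defines six participant characteristic dimensions that affect elicitation quality. "Calibrated Ignorance" is an original concept proposed in this paper. The paper further proposes "Selection-over-Prescription" as a methodological principle: for implicit knowledge elicitation tasks, investing quality-control resources in personnel selection yields a higher return than investing the same resources in process design. In a controlled experiment in the narrative fiction domain, CoT produced by BC Protocol dual dialogue (Group A, $n=20$) was compared directly against CoT written independently by the same domain expert (Group B, $n=20$). Three cross-vendor judge models, GPT-4o, Claude Opus 4.5, and Gemini 2.5 Pro, conducted blind evaluation across five dimensions (600 ratings in total). The BC Protocol achieves an overwhelming advantage in "naturalness of reasoning process" (Group A mean 4.80 vs. Group B mean 1.30, $p = 2.4 \times 10^{-8}$, Cliff's $\delta = 1.0$).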
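
The two quantitative claims in the abstract can be checked with a short sketch. The rating total follows from (20 + 20) samples, 3 judges, and 5 dimensions. Cliff's delta is the standard effect size (count of pairs where A > B, minus count of pairs where A < B, divided by the number of pairs), so a value of 1.0 means every Group A rating exceeds every Group B rating. The function name `cliffs_delta` and the score lists below are hypothetical illustrations, not the paper's code or data.

```python
from itertools import product


def cliffs_delta(xs, ys):
    """Cliff's delta: (#(x > y) - #(x < y)) / (len(xs) * len(ys))."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Rating count implied by the abstract:
# (20 Group A + 20 Group B samples) x 3 judge models x 5 dimensions.
total_ratings = (20 + 20) * 3 * 5

# Hypothetical 1-5 scores, NOT the paper's data: fully separated groups.
group_a = [5, 5, 4, 5]
group_b = [1, 2, 1, 1]
delta_separated = cliffs_delta(group_a, group_b)

# Identical distributions give no effect.
delta_overlap = cliffs_delta([3, 4, 5], [3, 4, 5])

print(total_ratings, delta_separated, delta_overlap)
```

With fully separated samples the function returns 1.0, the maximum possible value, which is the situation the abstract reports for the "naturalness of reasoning process" dimension.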
