Fugu-MT paper translation (summary): Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization

Paper summary: Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization

arxiv url: http://arxiv.org/abs/2509.02093v1
Date: Tue, 02 Sep 2025 08:45:29 GMT
Status: Translation complete
Last updated in system: 2025-09-04 15:17:03.965723
Title: Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization
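Title (reference translation): Improvement Through Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization
Authors: Juhyeon Lee, Wonduk Seo, Hyunjin An, Seunghyun Lee, Yi Bu
Abstract summary: The proposed Contrastive Reasoning Prompt Optimization (CRPO) is a novel framework that formulates prompt optimization as a retrieval-augmented reasoning process. The approach retrieves the top-k reference prompts from the HelpSteer2 dataset. By explicitly contrasting high- and low-quality exemplars, CRPO deduces why certain prompts succeed while others fail, achieving more robust and interpretable optimization.
Reference score (attention score computed independently by this site): 6.3914079241545885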
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Automatic prompt optimization has recently emerged as a strategy for improving the quality of prompts used in Large Language Models (LLMs), with the goal of generating more accurate and useful responses. However, most prior work focuses on direct prompt refinement or model fine-tuning, overlooking the potential of leveraging LLMs' inherent reasoning capability to learn from contrasting examples. In this paper, we present Contrastive Reasoning Prompt Optimization (CRPO), a novel framework that formulates prompt optimization as a retrieval-augmented reasoning process. Our approach retrieves top-k reference prompts from the HelpSteer2 dataset, an open-source collection annotated for helpfulness, correctness, coherence, complexity, and verbosity, and constructs two complementary optimization paradigms: (1) tiered contrastive reasoning, where the LLM compares high-, medium-, and low-quality prompts to refine its own generation through reflective reasoning, and (2) multi-metric contrastive reasoning, where the LLM analyzes the best prompts along each evaluation dimension and integrates their strengths into an optimized prompt. By explicitly contrasting high- and low-quality exemplars, CRPO enables the model to deduce why certain prompts succeed while others fail, thereby achieving more robust and interpretable optimization. Experimental results on the HelpSteer2 benchmark demonstrate that CRPO significantly outperforms baselines. Our findings highlight the promise of contrastive, retrieval-augmented reasoning for advancing automatic prompt optimization.
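Abstract (reference translation): Automatic prompt optimization has recently emerged as a strategy for improving the quality of prompts used in Large Language Models (LLMs), with the aim of generating more accurate and useful responses. However, most prior work focuses on direct prompt refinement or model fine-tuning, overlooking the potential of leveraging LLMs' inherent reasoning capability to learn from contrasting examples. This paper presents Contrastive Reasoning Prompt Optimization (CRPO). The approach constructs two complementary optimization paradigms: (1) the LLM compares high-, medium-, and low-quality prompts through reflective reasoning, and (2) the LLM analyzes the best prompts along each evaluation dimension and integrates their strengths into an optimized prompt. By explicitly contrasting high- and low-quality exemplars, CRPO deduces why certain prompts succeed while others fail, achieving more robust and interpretable optimization. Experimental results on the HelpSteer2 benchmark show that CRPO significantly outperforms baselines. The findings highlight the promise of contrastive, retrieval-augmented reasoning for advancing automatic prompt optimization.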
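
Illustrative sketch (not from the paper): the abstract describes retrieving top-k reference prompts and then either grouping them into quality tiers or picking the best prompt per evaluation dimension. The Python below is one possible reading of those steps, not the authors' implementation. Everything in it is an assumption made for illustration: the token-overlap (Jaccard) retriever, the equal-thirds tiering by mean score, treating the highest score as "best" on every metric, the toy reference pool and its scores, and all function names. Only the five metric names come from the abstract. No LLM is called; the sketch stops at assembling the contrastive prompt text.

```python
from dataclasses import dataclass

# The five HelpSteer2 annotation dimensions named in the abstract.
METRICS = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")


@dataclass
class Reference:
    prompt: str
    scores: dict  # metric name -> annotated score (toy values in this sketch)

    @property
    def mean_score(self) -> float:
        return sum(self.scores[m] for m in METRICS) / len(METRICS)


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for whatever retriever is really used."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0


def retrieve_top_k(query: str, pool: list, k: int) -> list:
    """Return the k references most similar to the query."""
    return sorted(pool, key=lambda r: jaccard(query, r.prompt), reverse=True)[:k]


def tier(refs: list) -> dict:
    """Assumed tiering rule: rank by mean score, top third high, bottom third low."""
    ranked = sorted(refs, key=lambda r: r.mean_score, reverse=True)
    cut = max(1, len(ranked) // 3)
    return {
        "high": ranked[:cut],
        "medium": ranked[cut:len(ranked) - cut],
        "low": ranked[len(ranked) - cut:],
    }


def best_per_metric(refs: list) -> dict:
    """Assumed multi-metric rule: the highest-scoring reference on each dimension."""
    return {m: max(refs, key=lambda r: r.scores[m]) for m in METRICS}


def build_tiered_prompt(query: str, tiers: dict) -> str:
    """Assemble the text that would be sent to an LLM for contrastive reasoning."""
    lines = [f"Task prompt to improve: {query}", ""]
    for name in ("high", "medium", "low"):
        lines.append(f"{name.capitalize()}-quality examples:")
        lines.extend(f"- {r.prompt}" for r in tiers[name])
    lines.append("")
    lines.append(
        "Explain why the high-quality examples work better than the "
        "low-quality ones, then rewrite the task prompt accordingly."
    )
    return "\n".join(lines)


def _scores(*values) -> dict:
    return dict(zip(METRICS, values))


# Toy pool with made-up scores, for demonstration only.
POOL = [
    Reference("Summarize this article in three bullet points for a general reader",
              _scores(4, 4, 4, 2, 2)),
    Reference("Summarize this article", _scores(3, 3, 3, 1, 1)),
    Reference("article summary pls", _scores(1, 1, 2, 0, 0)),
    Reference("Translate this sentence into French", _scores(4, 4, 4, 1, 1)),
]

if __name__ == "__main__":
    query = "Summarize this news article"
    refs = retrieve_top_k(query, POOL, k=3)
    print(build_tiered_prompt(query, tier(refs)))
```

In this toy run, the unrelated translation prompt is filtered out by retrieval, and the remaining three references land one per tier. A multi-metric variant would pass the output of `best_per_metric(refs)` into a similar prompt template instead of the tiers.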


Related papers