Fugu-MT Paper Translation (Summary): Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning

Paper overview: Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning

arxiv url: http://arxiv.org/abs/2511.01016v1
Date: Sun, 02 Nov 2025 17:11:03 GMT
Status: Translation complete
Last updated in system: 2025-11-05 16:37:27.03036
Title: Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning
Title (reference translation): Prompt-R1: A Collaborative Automatic Prompting Framework via End-to-End Reinforcement Learning
Authors: Wenjin Liu, Haoran Luo, Xueyuan Lin, Haoming Liu, Tiesunlong Shen, Jiapu Wang, Rui Mao, Erik Cambria,
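Abstract summary: This paper proposes Prompt-R1, an end-to-end reinforcement learning framework that uses a small-scale language model to collaborate with large-scale language models. A dual-constrained reward is designed to optimize correctness, generation quality, and reasoning accuracy. Prompt-R1 significantly outperforms baseline models across tasks.
Reference score (in-house attention metric): 34.70213312250216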
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, advanced large language models (LLMs) have emerged at an increasingly rapid pace. However, when faced with complex problems, most users are often unable to provide accurate and effective prompts to interact with LLMs, thus limiting the performance of LLMs. To address this challenge, we propose Prompt-R1, an end-to-end reinforcement learning framework that uses a small-scale LLM to collaborate with large-scale LLMs, replacing user interaction to solve problems better. This collaboration is cast as a multi-turn prompt interaction, where the small-scale LLM thinks and generates prompts, and the large-scale LLM performs complex reasoning. A dual-constrained reward is designed to optimize for correctness, generation quality, and reasoning accuracy. Prompt-R1 provides a plug-and-play framework that supports both inference and training with various large-scale LLMs. Experiments on multiple public datasets show that Prompt-R1 significantly outperforms baseline models across tasks. Our code is publicly available at https://github.com/QwenQKing/Prompt-R1.
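Abstract (reference translation): In recent years, advanced large language models (LLMs) have spread rapidly. However, when faced with complex problems, most users cannot provide accurate and effective prompts for interacting with LLMs, which limits LLM performance. To address this challenge, we propose Prompt-R1, an end-to-end reinforcement learning framework that uses a small-scale LLM to collaborate with large-scale LLMs. This collaboration is cast as a multi-turn prompt interaction in which the small-scale LLM thinks and generates prompts, and the large-scale LLM performs complex reasoning. A dual-constrained reward is designed to optimize correctness, generation quality, and reasoning accuracy. Prompt-R1 provides a plug-and-play framework that supports inference and training with various large-scale LLMs. Experiments on multiple public datasets show that Prompt-R1 significantly outperforms baseline models across tasks. Our code is publicly available at https://github.com/QwenQKing/Prompt-R1.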
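The abstract describes the collaboration only at a high level: a small-scale LLM writes prompts over several turns, and a large-scale LLM answers them. The following is a minimal sketch of that control flow under stated assumptions, not the authors' implementation (that lives in the linked repository). Both models are replaced by hypothetical plain Python functions (`toy_prompter`, `toy_reasoner`), and the `("prompt", text)` / `("answer", text)` action protocol, the `collaborate` function, the `Episode` record, and the `max_turns` budget are illustrative names that do not come from the paper. The reinforcement learning step (training the small model with the dual-constrained reward) is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interfaces: each "model" is just a function over strings.
# The prompter sees the question and the turn history and returns an action.
Prompter = Callable[[str, List[Tuple[str, str]]], Tuple[str, str]]
# The reasoner (standing in for the large LLM) maps a prompt to a response.
Reasoner = Callable[[str], str]


@dataclass
class Episode:
    question: str
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (prompt, response) pairs
    answer: str = ""


def collaborate(question: str, prompter: Prompter, reasoner: Reasoner,
                max_turns: int = 3) -> Episode:
    """Run a multi-turn prompt interaction between a small and a large model.

    Assumed protocol: the prompter returns ("prompt", text) to query the
    reasoner again, or ("answer", text) to stop with a final answer.
    """
    episode = Episode(question)
    for _ in range(max_turns):
        action, text = prompter(question, episode.turns)
        if action == "answer":
            episode.answer = text
            return episode
        # The large model only sees the prompt written by the small model.
        response = reasoner(text)
        episode.turns.append((text, response))
    # Turn budget exhausted: fall back to the last reasoner response, if any.
    episode.answer = episode.turns[-1][1] if episode.turns else ""
    return episode


# Toy stand-ins (hypothetical) so the sketch runs without any real model.
def toy_prompter(question: str, turns: List[Tuple[str, str]]) -> Tuple[str, str]:
    if not turns:
        return "prompt", f"Break the problem into steps: {question}"
    if len(turns) == 1:
        return "prompt", "Now compute the final number only."
    return "answer", turns[-1][1]


def toy_reasoner(prompt: str) -> str:
    if prompt.startswith("Break"):
        return "Step 1: 17 * 3 = 51. Step 2: 51 + 4 = 55."
    return "55"


ep = collaborate("What is 17 * 3 + 4?", toy_prompter, toy_reasoner)
print(ep.answer)      # 55
print(len(ep.turns))  # 2
```

In this sketch only `prompter` would be a candidate for training, while `reasoner` is treated as a fixed black box. That split matches the abstract's plug-and-play description, but how Prompt-R1 actually structures turns and scores episodes is an assumption here and is specified in the paper itself.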


Related papers