Fugu-MT Paper Translation (Abstract): Prompt Complexity Dilutes Structured Reasoning: A Follow-Up Study on the Car Wash Problem

Paper overview: Prompt Complexity Dilutes Structured Reasoning: A Follow-Up Study on the Car Wash Problem

arxiv url: http://arxiv.org/abs/2603.13351v1
Date: Sat, 07 Mar 2026 18:27:24 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.110631
Title: Prompt Complexity Dilutes Structured Reasoning: A Follow-Up Study on the Car Wash Problem
Title (reference translation): Prompt Complexity Dilutes Structured Reasoning: A Follow-Up Study on the Car Wash Problem
Authors: Heejin Jo,
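Abstract summary: STAR (Situation, Task, Action, Result) raised car wash problem accuracy from 0% to 85% on Claude Sonnet 4.5, and to 100% with additional prompt layers. Does STAR maintain its effectiveness in a production system prompt? We tested STAR inside InterviewMate's 60+ line production prompt.
Reference score (site-computed attention score): 0.0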
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In a previous study [Jo, 2026], STAR reasoning (Situation, Task, Action, Result) raised car wash problem accuracy from 0% to 85% on Claude Sonnet 4.5, and to 100% with additional prompt layers. This follow-up asks: does STAR maintain its effectiveness in a production system prompt? We tested STAR inside InterviewMate's 60+ line production prompt, which had evolved through iterative additions of style guidelines, format instructions, and profile features. Three conditions, 20 trials each, on Claude Sonnet 4.6: (A) production prompt with Anthropic profile, (B) production prompt with default profile, (C) original STAR-only prompt. C scored 100% (verified at n=100). A and B scored 0% and 30%. Prompt complexity dilutes structured reasoning. STAR achieves 100% in isolation but degrades to 0-30% when surrounded by competing instructions. The mechanism: directives like "Lead with specifics" force conclusion-first output, reversing the reason-then-conclude order that makes STAR effective. In one case, the model output "Short answer: Walk." then executed STAR reasoning that correctly identified the constraint -- proving the model could reason correctly but had already committed to the wrong answer. Cross-model comparison shows STAR-only improved from 85% (Sonnet 4.5) to 100% (Sonnet 4.6) without prompt changes, suggesting model upgrades amplify structured reasoning in isolation. These results imply structured reasoning frameworks should not be assumed to transfer from isolated testing to complex prompt environments. The order in which a model reasons and concludes is a first-class design variable.
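Abstract (reference translation): In the previous study [Jo, 2026], STAR reasoning (Situation, Task, Action, Result) raised accuracy on the car wash problem from 0% to 85% on Claude Sonnet 4.5, and to 100% once additional prompt layers were added. This follow-up asks whether STAR remains effective inside a production system prompt. We tested STAR inside InterviewMate's 60+ line production prompt, which had grown through iterative additions of style guidelines, format instructions, and profile features. We ran three conditions of 20 trials each on Claude Sonnet 4.6: (A) the production prompt with the Anthropic profile, (B) the production prompt with the default profile, and (C) the original STAR-only prompt. C scored 100% (verified at n=100). A scored 0% and B scored 30%. Prompt complexity dilutes structured reasoning: STAR reaches 100% in isolation but drops to 0-30% when surrounded by competing instructions. The mechanism is that directives such as "Lead with specifics" force conclusion-first output, reversing the reason-then-conclude order that makes STAR effective. In one case, the model output "Short answer: Walk." and then carried out STAR reasoning that correctly identified the constraint, showing that the model could reason correctly but had already committed to the wrong answer. A cross-model comparison shows that STAR-only improved from 85% (Sonnet 4.5) to 100% (Sonnet 4.6) without any prompt changes, suggesting that model upgrades amplify structured reasoning in isolation. These results imply that structured reasoning frameworks should not be assumed to transfer from isolated testing to complex prompt environments. The order in which a model reasons and concludes is a first-class design variable.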
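
Note on sample size: the abstract reports accuracies from 20 trials per condition (100 for the verification of C), so it can help to look at the uncertainty around each percentage. The sketch below is not from the paper. It computes a standard Wilson score interval in Python, and the success counts (0/20, 6/20, 100/100) are inferred from the reported percentages rather than taken from raw data, so treat them as assumptions.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% coverage.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


# Counts inferred from the percentages in the abstract (assumption, not raw data).
conditions = {
    "A: production prompt, Anthropic profile": (0, 20),
    "B: production prompt, default profile": (6, 20),
    "C: STAR-only prompt (n=100 verification)": (100, 100),
}

if __name__ == "__main__":
    for name, (k, n) in conditions.items():
        lo, hi = wilson_interval(k, n)
        print(f"{name}: {k}/{n} correct, approx. 95% interval [{lo:.2f}, {hi:.2f}]")
```

Under these assumed counts, the intervals for A and B sit far below the interval for C, which is consistent with the abstract's claim that the drop is not a small-sample artifact. The interval for B is wide, though, so the exact 30% figure should be read with that in mind.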


Related papers