Fugu-MT 論文翻訳(概要): Forward-Backward Reasoning in Large Language Models for Verification

論文の概要: Forward-Backward Reasoning in Large Language Models for Verification

arxiv url: http://arxiv.org/abs/2308.07758v2
Date: Thu, 17 Aug 2023 05:55:44 GMT
ステータス: 翻訳完了
システム内更新日: 2023-08-21 19:15:11.403743
Title: Forward-Backward Reasoning in Large Language Models for Verification
Title（参考訳）: 検証のための大規模言語モデルの前方推論
Authors: Weisen Jiang and Han Shi and Longhui Yu and Zhengying Liu and Yu Zhang and Zhenguo Li and James T. Kwok
Abstract要約: Self-Consistency citepwang2023selfConsistencyは、様々な推論チェーンをサンプリングすることを提案する。本稿では,候補解の検証に後方推論を用いる新しい手法を提案する。
参考スコア（独自算出の注目度）: 69.25666654865826
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Chain-of-Though (CoT) prompting has shown promising performance in various reasoning tasks. Recently, Self-Consistency \citep{wang2023selfconsistency} proposes to sample a diverse set of reasoning chains which may lead to different answers while the answer that receives the most votes is selected. In this paper, we propose a novel method to use backward reasoning in verifying candidate answers. We mask a token in the question by ${\bf x}$ and ask the LLM to predict the masked token when a candidate answer is provided by \textit{a simple template}, i.e., ``\textit{\textbf{If we know the answer of the above question is \{a candidate answer\}, what is the value of unknown variable ${\bf x}$?}}'' Intuitively, the LLM is expected to predict the masked token successfully if the provided candidate answer is correct. We further propose FOBAR to combine forward and backward reasoning for estimating the probability of candidate answers. We conduct extensive experiments on six data sets and three LLMs. Experimental results demonstrate that FOBAR achieves state-of-the-art performance on various reasoning benchmarks.
Abstract（参考訳）: Chain-of-Though (CoT)プロンプトは様々な推論タスクで有望なパフォーマンスを示している。近年、自己整合性(Self-Consistency) \citep{wang2023selfConsistency} は、最も多くの票を得た回答が選択される間に、異なる回答につながる可能性のある様々な推論チェーンをサンプリングすることを提案する。本稿では,候補回答の検証に後ろ向き推論を用いた新しい手法を提案する。質問中のトークンを${\bf x}$でマスクし、候補の回答が \textit{a simple template}、すなわち ``\textit{\textbf{if we know the answer of the question is \{a candidate answer\}, and the llm to predict the masked token when a candidate answer is provide by \textit{a simple template},すなわち ``\textit{\textbf{if we know the answer of the question is \{a candidate answer\}, what the value of unknown variable ${\bf x}$? 直感的には、LLMは与えられた候補回答が正しい場合、マスクされたトークンをうまく予測する。さらに, 候補回答の確率を推定するために, 前方と後方の推論を組み合わせるフォバーを提案する。 6つのデータセットと3つのLSMについて広範な実験を行う。実験結果から,FOBARは様々な推論ベンチマークで最先端の性能を達成することが示された。

関連論文リスト

Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt [74.35891434097053]
RLLM(Reasoning Large Language Models)は、複雑なタスクにおいて素晴らしいパフォーマンスを示す。彼らはしばしば過度に考え、正しい答えに達した後も不必要な推論ステップを実行します。本稿では,自己疑念の観点から,過剰思考を定量的に分析する。本稿では,入力問題に対するモデルの過度信頼度を低減するための,シンプルで効果的なプロンプト手法を提案する。
論文参考訳（メタデータ） (2025-05-29T14:30:02Z)
CounterBench: A Benchmark for Counterfactuals Reasoning in Large Language Models [5.409370027524351]
本研究では, 大規模言語モデル(LLM)の性能評価を行った。我々は,新しいベンチマークデータセットであるCounterBenchを紹介した。
論文参考訳（メタデータ） (2025-02-16T06:19:37Z)
Reverse Thinking Makes LLMs Stronger Reasoners [90.42357659849215]
RevThinkは、データ拡張と学習目的からなるフレームワークである。 12のデータセットに対する実験では、学生モデルのゼロショットのパフォーマンスよりも平均13.53%改善されている。 RevThinkはまた、アウト・オブ・ディストリビューション・ホールドアウトデータセットへの強力な一般化を示している。
論文参考訳（メタデータ） (2024-11-29T17:27:05Z)
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models [84.15513004135576]
最近の研究は、複数の推論チェーンをサンプリングし、応答周波数に基づいてアンサンブルすることで、Large Language Models(LLMs)の推論性能を向上させる。このアプローチは、正しい答えが少数派である場合に失敗する。階層的推論集約フレームワークAoRを導入し、推論連鎖の評価に基づいて回答を選択する。
論文参考訳（メタデータ） (2024-05-21T17:12:19Z)
Premise Order Matters in Reasoning with Large Language Models [57.18850969634412]
大規模言語モデル (LLM) は,前提の順序に驚くほど脆弱であることを示す。前提順序が中間的推論ステップで要求されるコンテキストと一致した場合, LLM が最高の性能を達成することを観察する。
論文参考訳（メタデータ） (2024-02-14T04:50:18Z)
On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks [17.329365493094542]
ゲーム・オブ・24(Game of 24)とグラフカラー化(Graph Coloring)とSTRIPSプランニング(STRIPS Planning)の3分野において,GPT-4の性能に関する実証的研究を行った。我々は,自己批判による顕著なパフォーマンス崩壊と,音外検証による顕著なパフォーマンス向上を観察した。
論文参考訳（メタデータ） (2024-02-12T23:11:01Z)
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning [23.34325378824462]
大規模言語モデル(LLM)は、その振る舞いの正しさと安全性を検証するのが困難である。一つのアプローチは、LLMが質問に答えるときにステップバイステップの推論を生成することによって、彼らの推論を外部化するように促すことである。このアプローチは、モデルの実的推論を忠実に反映する記述された推論に依存しており、必ずしもそうではない。分解に基づく手法は、時にはCoTの手法に近づき、質問応答タスクにおいて高い性能を達成する。
論文参考訳（メタデータ） (2023-07-17T00:54:10Z)
Large Language Models are Better Reasoners with Self-Verification [48.534270563880845]
大規模言語モデル(LLM)は、いくつかの自然言語処理タスクにおいて強力な推論能力を示している。思考の連鎖(CoT)を促進させるLLMは、個別のミスに非常に敏感な、多段階のプロンプトと多段階の予測を必要とする。また,LLMにも同様な自己検証能力があることを示す。
論文参考訳（メタデータ） (2022-12-19T15:51:52Z)
Complexity-Based Prompting for Multi-Step Reasoning [72.0057198610614]
大規模言語モデルに対して,多段階推論を行うための課題について検討する。中心的な疑問は、どの推論例が最も効果的なプロンプトを作るかである。多段階推論のためのシンプルで効果的な例選択方式である複雑性ベースのプロンプトを提案する。
論文参考訳（メタデータ） (2022-10-03T05:33:27Z)
Faithful Reasoning Using Large Language Models [12.132449274592668]
因果構造が問題の根底にある論理構造を反映するプロセスを通じて、LMを忠実な多段階推論を行う方法を示す。我々の手法は、各ステップが2つの微調整されたLMへの呼び出しから得られる推論ステップをチェーンすることで機能する。我々は,多段階論理推論と科学的質問応答におけるモデルの有効性を実証し,最終的な解答精度のベースラインよりも優れていることを示す。
論文参考訳（メタデータ） (2022-08-30T13:44:41Z)
Generative Context Pair Selection for Multi-hop Question Answering [60.74354009152721]
マルチホップ質問応答のための生成コンテキスト選択モデルを提案する。提案した生成経路選択モデルは,対向保留集合上でのより良い性能(ベースラインより4.9%高い)を有する。
論文参考訳（メタデータ） (2021-04-18T07:00:48Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。