Fugu-MT paper translation (abstract): Measuring and Narrowing the Compositionality Gap in Language Models

arxiv url: http://arxiv.org/abs/2210.03350v3
Date: Tue, 17 Oct 2023 18:57:17 GMT
Status: translation complete
Last updated in system: 2023-10-19 21:13:08.443549
Title: Measuring and Narrowing the Compositionality Gap in Language Models
Authors: Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis
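Abstract summary: We measure how often models can correctly answer all sub-problems but fail to generate the overall solution. We present a new method, self-ask, that further improves on chain of thought.
Reference score (independently computed attention score): 116.5228850227024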
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems. We measure how often models can correctly answer all sub-problems but not generate the overall solution, a ratio we call the compositionality gap. We evaluate this ratio by asking multi-hop questions with answers that require composing multiple facts unlikely to have been observed together during pretraining. In the GPT-3 family of models, as model size increases we show that the single-hop question answering performance improves faster than the multi-hop performance does, therefore the compositionality gap does not decrease. This surprising result suggests that while more powerful models memorize and recall more factual knowledge, they show no corresponding improvement in their ability to perform this kind of compositional reasoning. We then demonstrate how elicitive prompting (such as chain of thought) narrows the compositionality gap by reasoning explicitly. We present a new method, self-ask, that further improves on chain of thought. In our method, the model explicitly asks itself (and answers) follow-up questions before answering the initial question. We finally show that self-ask's structured prompting lets us easily plug in a search engine to answer the follow-up questions, which additionally improves accuracy.
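The ratio described in the abstract can be illustrated with a small sketch. It assumes the compositionality gap is computed over the multi-hop questions whose sub-questions were all answered correctly, as the share of those questions where the overall answer is still wrong. The abstract does not spell out the denominator, so that choice is an assumption here. The `QuestionResult` record, its field names, and the sample data are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class QuestionResult:
    """Hypothetical per-question evaluation record (names are illustrative)."""
    sub_answers_correct: Tuple[bool, ...]  # one flag per single-hop sub-question
    final_answer_correct: bool             # correctness of the multi-hop answer


def compositionality_gap(results: Iterable[QuestionResult]) -> Optional[float]:
    """Share of questions with all sub-answers correct but a wrong final answer.

    Assumption: the denominator is the set of questions for which every
    sub-question was answered correctly. Returns None if that set is empty.
    """
    eligible = [r for r in results if all(r.sub_answers_correct)]
    if not eligible:
        return None
    failed = sum(1 for r in eligible if not r.final_answer_correct)
    return failed / len(eligible)


# Made-up sample data, for illustration only.
sample = [
    QuestionResult((True, True), True),    # composed correctly
    QuestionResult((True, True), False),   # knew both facts, failed to compose
    QuestionResult((True, False), False),  # excluded: one sub-answer was wrong
    QuestionResult((True, True), False),   # knew both facts, failed to compose
]
gap = compositionality_gap(sample)
print(f"compositionality gap: {gap:.3f}")
```

Under this reading, a model that becomes better at single-hop questions enlarges the eligible set, so the gap only shrinks if multi-hop accuracy on that set improves as well.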

Related papers

Chain-of-Probe: Examing the Necessity and Accuracy of CoT Step-by-Step [81.50681925980135]
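We propose a method for probing changes of mind during model reasoning. By analyzing patterns of mind change, we examine the correctness of the model's reasoning. Our examination reveals that many responses, although correct in their final answer, contain errors in the reasoning process.
Reference translation of paper (metadata) (2024-06-23T15:50:22Z)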
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning [23.34325378824462]
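It is difficult to verify the correctness and safety of the behavior of large language models (LLMs). One approach is to prompt LLMs to externalize their reasoning by generating step-by-step reasoning as they answer a question. This approach relies on the stated reasoning faithfully reflecting the model's actual reasoning, which is not always the case. Decomposition-based methods achieve strong performance on question-answering tasks, sometimes approaching that of CoT.
Reference translation of paper (metadata) (2023-07-17T00:54:10Z)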
RECKONING: Reasoning through Dynamic Knowledge Encoding [51.076603338764706]
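Language models can answer questions by reasoning over knowledge provided as part of the context. In these situations, however, the model fails to distinguish the knowledge that is needed to answer the question. We propose teaching the model to reason more robustly by folding the provided contextual knowledge into the model's parameters.
Reference translation of paper (metadata) (2023-05-10T17:54:51Z)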
Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering [85.79940770146557]
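We decompose multi-hop questions into multiple single-hop questions. For these pairs of seemingly identical question chains, we find marked inconsistencies in the QA models' answers. When trained only on single-hop questions, models generalize poorly to multi-hop questions.
Reference translation of paper (metadata) (2022-10-09T11:48:07Z)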
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering [124.16250115608604]
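We present Science Question Answering (SQA), a new benchmark consisting of about 21k multimodal multiple-choice questions on diverse science topics, with answers annotated with corresponding lectures and explanations. On SQA, we observe improvements of 1.20% for few-shot GPT-3 and 3.99% for fine-tuned UnifiedQA. Our analysis shows that language models, like humans, benefit from explanations: they learn from less data and reach the same performance with just 40% of the data.
Reference translation of paper (metadata) (2022-09-20T07:04:24Z)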
Robustifying Multi-hop QA through Pseudo-Evidentiality Training [28.584236042324896]
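We study the bias problem of multi-hop question answering models that answer correctly without correct reasoning. We propose a new approach that learns evidentiality by deciding whether the predicted answer is supported by correct evidence.
Reference translation of paper (metadata) (2021-07-07T14:15:14Z)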
Generative Context Pair Selection for Multi-hop Question Answering [60.74354009152721]
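We propose a generative context selection model for multi-hop question answering. The proposed generative passage selection model performs better on an adversarial held-out set (4.9% higher than the baseline).
Reference translation of paper (metadata) (2021-04-18T07:00:48Z)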

The related-papers list is generated automatically from the titles and abstracts of papers on this site.