Fugu-MT paper translation (abstract): Large Language Models are Zero-Shot Reasoners

Paper overview: Large Language Models are Zero-Shot Reasoners

arxiv url: http://arxiv.org/abs/2205.11916v1
Date: Tue, 24 May 2022 09:22:26 GMT
Status: Translation complete
Last updated in system: 2022-05-25 12:37:46.661510
Title: Large Language Models are Zero-Shot Reasoners
Title (reference translation): Large language models are zero-shot reasoners
Authors: Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa
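Abstract summary: Chain of thought (CoT) prompting is a technique for eliciting complex multi-step reasoning through step-by-step answer examples. The paper shows that LLMs are decent zero-shot reasoners when "Let's think step by step" is simply added before each answer. Experimental results show that Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performance.
Reference score (attention score computed independently by this site): 28.6899375595088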
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars. Notably, chain of thought (CoT) prompting, a recent technique for eliciting complex multi-step reasoning through step-by-step answer examples, achieved the state-of-the-art performances in arithmetics and symbolic reasoning, difficult system-2 tasks that do not follow the standard scaling laws for LLMs. While these successes are often attributed to LLMs' ability for few-shot learning, we show that LLMs are decent zero-shot reasoners by simply adding ``Let's think step by step'' before each answer. Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performances on diverse benchmark reasoning tasks including arithmetics (MultiArith, GSM8K, AQUA-RAT, SVAMP), symbolic reasoning (Last Letter, Coin Flip), and other logical reasoning tasks (Date Understanding, Tracking Shuffled Objects), without any hand-crafted few-shot examples, e.g. increasing the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with an off-the-shelf 175B parameter model. The versatility of this single prompt across very diverse reasoning tasks hints at untapped and understudied fundamental zero-shot capabilities of LLMs, suggesting high-level, multi-task broad cognitive capabilities may be extracted through simple prompting. We hope our work not only serves as the minimal strongest zero-shot baseline for the challenging reasoning benchmarks, but also highlights the importance of carefully exploring and analyzing the enormous zero-shot knowledge hidden inside LLMs before crafting finetuning datasets or few-shot exemplars.
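Abstract (reference translation): Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP). Notably, chain of thought (CoT) prompting, a technique for eliciting complex multi-step reasoning step by step, achieved state-of-the-art performance in arithmetic and symbolic reasoning, difficult system-2 tasks that do not follow the standard scaling laws for LLMs. While these successes are often attributed to LLMs' ability for few-shot learning, we show that LLMs are decent zero-shot reasoners by simply adding "Let's think step by step" before each answer. Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performances on diverse benchmark reasoning tasks including arithmetics (MultiArith, GSM8K, AQUA-RAT, SVAMP), symbolic reasoning (Last Letter, Coin Flip), and other logical reasoning tasks (Date Understanding, Tracking Shuffled Objects), without any hand-crafted few-shot examples, e.g. increasing the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with an off-the-shelf 175B parameter model. The versatility of this single prompt across very diverse reasoning tasks hints at fundamental zero-shot capabilities of LLMs, suggesting that high-level, multi-task broad cognitive capabilities can be extracted through simple prompting. We hope our work not only serves as a minimal zero-shot baseline for challenging reasoning benchmarks, but also highlights the importance of carefully exploring and analyzing the enormous zero-shot knowledge hidden inside LLMs before crafting finetuning datasets or few-shot exemplars.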
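
The prompting idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the "Q:/A:" layout, the function names, and the sample question are assumptions made here for illustration, and no model is called. Only the trigger phrase and the accuracy figures come from the abstract.

```python
# Trigger phrase quoted in the abstract.
TRIGGER = "Let's think step by step."


def zero_shot_prompt(question: str) -> str:
    """Plain zero-shot prompt (the "Q:/A:" layout is an assumption)."""
    return f"Q: {question}\nA:"


def zero_shot_cot_prompt(question: str) -> str:
    """Same template, with the trigger phrase placed before the answer."""
    return f"Q: {question}\nA: {TRIGGER}"


def absolute_gain(before_pct: float, after_pct: float) -> float:
    """Accuracy gain in percentage points, rounded to one decimal place."""
    return round(after_pct - before_pct, 1)


if __name__ == "__main__":
    # Hypothetical question, not taken from any benchmark.
    print(zero_shot_cot_prompt("A shop sells 3 boxes of 4 pens each. How many pens is that?"))
    # Accuracies reported in the abstract (zero-shot -> Zero-shot-CoT).
    print("MultiArith gain:", absolute_gain(17.7, 78.7))
    print("GSM8K gain:", absolute_gain(10.4, 40.7))
```

The two prompt builders differ only in the appended trigger phrase, which mirrors the abstract's claim that a single fixed template is reused across all tasks.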

Related papers

MIR-Bench: Benchmarking LLM's Long-Context Intelligence via Many-Shot In-Context Inductive Reasoning [21.056519816264505]
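We propose MIR-Bench, the first many-shot in-context inductive reasoning benchmark. We study many new problems concerning inductive reasoning and many-shot ICL, and examine robustness to erroneous shots.
Paper reference translation (metadata) (2025-02-14T06:05:12Z)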
LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems [28.72485319617863]
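LLMs struggle with some basic tasks that humans find easy, such as counting the number of r's in the word "strawberry". We measure how well advanced mathematical and coding reasoning capabilities transfer from specialized LLMs to simple counting tasks. Compared with strategies such as finetuning and in-context learning, we find that engaging reasoning is the most robust and efficient way to help LLMs better perceive tasks.
Paper reference translation (metadata) (2024-10-18T04:17:16Z)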
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
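Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. We propose SELF-GUIDE, a multi-stage mechanism that synthesizes task-specific input-output pairs from the student LLM. On benchmark metrics, we report absolute improvements of approximately 15% on classification tasks and 18% on generation tasks.
Paper reference translation (metadata) (2024-07-16T04:41:58Z)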
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
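Large language models (LLMs) have demonstrated impressive capabilities on many natural language tasks. LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning. This paper introduces Q*, a framework for guiding the LLM decoding process with deliberative planning.
Paper reference translation (metadata) (2024-06-20T13:08:09Z)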
Zero-Shot Question Answering over Financial Documents using Large Language Models [0.18749305679160366]
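We introduce a large language model (LLM) based approach for answering complex questions that require multi-hop numerical reasoning over financial reports. Novel zero-shot prompts guide the LLM to encode the required reasoning into a Python program or a domain-specific language.
Paper reference translation (metadata) (2023-11-19T16:23:34Z)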
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
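Aligned large language models (LLMs) demonstrate exceptional capabilities in solving tasks, following instructions, and ensuring safety. Existing continual learning benchmarks lack sufficient challenge for leading LLMs. We introduce TRACE, a new benchmark for evaluating continual learning in LLMs.
Paper reference translation (metadata) (2023-10-10T16:38:49Z)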
Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models [122.19845578690466]
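Step-Back Prompting enables LLMs to perform abstraction, deriving high-level concepts and first principles from instances that contain specific details. By using these concepts and principles to guide reasoning, LLMs significantly improve their ability to follow a correct reasoning path toward the solution.
Paper reference translation (metadata) (2023-10-09T19:48:55Z)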
Better Zero-Shot Reasoning with Self-Adaptive Prompting [39.54061907239995]
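Modern large language models (LLMs) have demonstrated impressive capabilities on sophisticated tasks, often through step-by-step reasoning similar to that of humans. This paper proposes Consistency-based Self-adaptive Prompting (COSP), a novel prompt design method for LLMs. COSP improves performance by up to 15% over the zero-shot baseline and exceeds few-shot baselines on a range of reasoning tasks.
Paper reference translation (metadata) (2023-05-23T14:27:16Z)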
SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
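We propose a new satisfiability-aided language modeling (SatLM) approach to improve the reasoning capabilities of large language models (LLMs). We use an LLM to generate a declarative task specification rather than an imperative program, and leverage an off-the-shelf automated theorem prover to derive the final answer. We evaluate SatLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
Paper reference translation (metadata) (2023-05-16T17:55:51Z)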
PAL: Program-aided Language Models [112.94785609781503]
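We propose Program-Aided Language models (PaL) for understanding natural language problems. PaL offloads the solution steps to a program runtime such as a Python interpreter. We set new state-of-the-art results across 12 benchmarks.
Paper reference translation (metadata) (2022-11-18T18:56:13Z)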

The related papers list is generated automatically from the titles and abstracts of papers on this site.

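This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from the use of this site (including all information and translations).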