Fugu-MT Paper Translation (Abstract): TADIS: Steering Models for Deep-Thinking about Demonstration Examples

Paper summary: TADIS: Steering Models for Deep-Thinking about Demonstration Examples

arxiv url: http://arxiv.org/abs/2310.00901v1
Date: Mon, 2 Oct 2023 04:42:53 GMT
Status: Translation complete
Last updated in system: 2023-10-04 23:22:27.513387
Title: TADIS: Steering Models for Deep-Thinking about Demonstration Examples
Title (reference translation): TADIS: Steering Models for Deep-Thinking about Demonstration Examples
Authors: Tianci Xue, Ziqi Wang, Yixia Li, Yun Chen, Guanhua Chen
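Abstract summary: Large language models (LLMs) achieve much higher performance than before. Recent work reports that delusive task examples can achieve almost the same performance as correct task examples. The authors propose a new method called TADIS that steers LLMs toward "Deep-Thinking" about demonstration examples instead of merely seeing them.
Reference score (independently computed attention score): 7.240651102553018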
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Instruction tuning has been shown to significantly improve zero-shot generalization to unseen tasks. By incorporating additional context (e.g., task definition, examples) during fine-tuning, Large Language Models (LLMs) achieve much higher performance than before. However, recent work reported that delusive task examples can achieve almost the same performance as correct task examples, indicating that the input-label correspondence is less important than previously thought. Intrigued by this counter-intuitive observation, we suspect that models have the same illusion of competence as humans. Therefore, we propose a novel method called TADIS that steers LLMs toward "Deep-Thinking" about demonstration examples instead of merely seeing them. To alleviate the model's illusion of competence, we first ask the model to verify the correctness of the shown examples. Then, we use the verification results as conditions to elicit a better answer from the model. Our experimental results show that TADIS consistently outperforms competitive baselines on in-domain and out-domain tasks (improving average ROUGE-L by 2.79 and 4.03 on out-domain and in-domain datasets, respectively). Even when the examples are generated (so not all of the thinking labels are accurate), TADIS notably enhances performance in zero-shot and few-shot settings. This also suggests that our approach can be adopted on a large scale to improve the instruction-following capabilities of models without any manual labor. Moreover, we construct three types of thinking labels with different model sizes and find that small models learn from the format of TADIS, whereas larger models can be steered toward "Deep-Thinking".
Abstract (reference translation): Instruction tuning has been shown to significantly improve zero-shot generalization to unseen tasks. By incorporating additional context (task definitions, examples, and so on) during fine-tuning, Large Language Models (LLMs) achieve much higher performance than before. However, recent work reports that delusive task examples can achieve almost the same performance as correct task examples. Intrigued by this counter-intuitive observation, we suspect that models have the same illusion of competence as humans. Therefore, we propose a novel method called TADIS that steers LLMs toward "Deep-Thinking" about demonstration examples instead of merely seeing them. To alleviate the model's illusion of competence, we first ask the model to verify the correctness of the shown examples. Then, we use the verification results as conditions to elicit a better answer from the model. Our experimental results show that TADIS consistently outperforms competitive baselines on in-domain and out-domain tasks (improving average ROUGE-L by 2.79 and 4.03 on out-domain and in-domain datasets, respectively). Even when the examples are generated (so not all of the thinking labels are accurate), TADIS notably enhances performance in zero-shot and few-shot settings. This also suggests that our approach can be adopted on a large scale to improve the instruction-following capabilities of models without any manual labor. Moreover, we construct three types of thinking labels with different model sizes and find that small models learn from the format of TADIS, whereas larger models can be steered toward "Deep-Thinking".
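
The two-step idea in the abstract (verify the shown examples first, then answer conditioned on the verification results) can be illustrated with a small prompt-construction sketch. This is not the authors' implementation: the prompt wording, the `Demo` class, and the `build_prompt` and `build_target` helpers are hypothetical names and formats chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    """One demonstration example; `is_correct` is its (possibly noisy) thinking label."""
    source: str
    target: str
    is_correct: bool


def build_prompt(definition: str, demos: List[Demo], query: str) -> str:
    """Model input: task definition, numbered examples, the query, and a request to verify first.

    The exact wording is an assumption, not the paper's template.
    """
    lines = [f"Definition: {definition}"]
    for i, d in enumerate(demos, start=1):
        lines.append(f"Example {i} - Input: {d.source} Output: {d.target}")
    lines.append(f"Now complete this input: {query}")
    lines.append("First judge whether each example is correct, then answer.")
    return "\n".join(lines)


def build_target(demos: List[Demo], answer: str) -> str:
    """Training target: verification results first, then the answer conditioned on them."""
    verdicts = [
        f"Example {i} is {'correct' if d.is_correct else 'wrong'}."
        for i, d in enumerate(demos, start=1)
    ]
    return " ".join(verdicts) + f" Answer: {answer}"


demos = [
    Demo("2 + 2", "4", True),
    Demo("3 + 5", "9", False),  # deliberately wrong ("delusive") example
]
prompt = build_prompt("Add the two numbers.", demos, "1 + 6")
target = build_target(demos, "7")
print(prompt)
print(target)
```

Placing the verdicts before the answer in the target string means an autoregressive model generates its judgment of the examples before it generates the answer, which is one plausible way to realize "using the verification results as conditions".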

Related papers

Context Tuning for In-Context Optimization [11.728105991946773]
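Context Tuning is a simple and effective method that enhances the few-shot adaptation of language models (LLMs) without fine-tuning model parameters. In contrast to prompt-based adaptation, Context Tuning initializes the trainable prompt or prefix with task-specific demonstration examples. Extensive evaluations on benchmarks such as CrossFit, UnifiedQA, MMLU, BIG-Bench Hard, and ARC show that Context Tuning outperforms traditional prompt-based adaptation methods.
Paper reference translation (metadata) (2025-07-06T03:23:53Z)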
Instruction Tuning with Retrieval-based Examples Ranking for Aspect-based Sentiment Analysis [7.458853474864602]
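Aspect-based sentiment analysis (ABSA) identifies sentiment information related to specific aspects, giving businesses and organizations deeper market insights. Recent studies have proposed instruction tuning with fixed examples, reformulating ABSA as a generation task. This work proposes an instruction learning method for ABSA tasks that uses retrieval-based example ranking.
Paper reference translation (metadata) (2024-05-28T10:39:10Z)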
Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
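Large language models (LLMs) have shown impressive capabilities in real-world applications. The quality of the exemplars in the prompt greatly affects performance. Existing methods fail to adequately account for the effect of exemplar ordering on performance.
Paper reference translation (metadata) (2024-05-25T08:23:05Z)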
DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning [75.68193159293425]
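In-context learning (ICL) allows transformer-based language models to learn a specific task from a few "task demonstrations" without updating their parameters. We propose DETAIL, an influence-function-based attribution technique that addresses the specific characteristics of ICL. We experimentally demonstrate the wide applicability of DETAIL for improving model performance by showing that attribution scores obtained on white-box models transfer to black-box models.
Paper reference translation (metadata) (2024-05-22T15:52:52Z)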
One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
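Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets. We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods that use the entire dataset.
Paper reference translation (metadata) (2023-12-16T03:33:12Z)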
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning [63.63840740526497]
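This work investigates how pre-trained models are adjusted, focusing on intrinsic changes. It then studies the impact of instruction tuning by comparing explanations derived from the pre-trained and instruction-tuned models. The results reveal three significant impacts of instruction tuning.
Paper reference translation (metadata) (2023-09-30T21:16:05Z)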
Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
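Large language models (LLMs) can perform conditional sequence generation tasks such as translation and summarization. We propose to enhance the instruction-following capability of LLMs by shifting the position of the task instruction to after the input sentence.
Paper reference translation (metadata) (2023-08-23T12:36:57Z)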
Evaluating the Zero-shot Robustness of Instruction-tuned Language Models [23.488398944358643]
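We find that using novel (unobserved) but appropriate instruction phrasings consistently degrades model performance. We propose a simple method to mitigate this issue by introducing soft prompt embedding parameters. We show that the method consistently improves the robustness of instruction-tuned models.
Paper reference translation (metadata) (2023-06-20T03:48:51Z)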
In-Context Probing: Toward Building Robust Classifiers via Probing Large Language Models [5.5089506884366735]
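This paper proposes an alternative approach called In-Context Probing (ICP). As in in-context learning, the input representation is contextualized with an instruction, but instead of decoding the output prediction, the contextualized representation is probed to predict the label. We show that ICP outperforms fine-tuning and is particularly helpful for building classifiers on top of smaller models.
Paper reference translation (metadata) (2023-05-23T15:43:04Z)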
Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
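In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats. We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or attributes. We then propose a novel greedy-search-based strategy to identify a near-optimal prompt for improving in-context learning performance.
Paper reference translation (metadata) (2023-03-23T12:28:25Z)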
How Does In-Context Learning Help Prompt Tuning? [55.78535874154915]
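Fine-tuning large language models is becoming increasingly impractical due to their rapidly growing scale. This motivates the use of parameter-efficient adaptation methods such as prompt tuning (PT), which adds a small number of tunable embeddings to a frozen model. Recently, Singhal et al. (2022) proposed instruction prompt tuning (IPT), which combines PT and ICL.
Paper reference translation (metadata) (2023-02-22T17:45:12Z)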
Learning Action Conditions from Instructional Manuals for Instruction Understanding [48.52663250368341]
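This paper proposes a task called action condition inference and collects a high-quality annotated dataset of the preconditions and postconditions of actions in instruction manuals. We propose a weakly supervised approach that automatically constructs large-scale training instances from online instruction manuals, curate a human-annotated and validated dataset, and examine how well current NLP models can infer action-condition dependencies in instructional text.
Paper reference translation (metadata) (2022-05-25T00:19:59Z)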

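The related papers list is generated automatically from the titles and abstracts of papers on this site.

This is the information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and assumes no responsibility for any results arising from its use.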