Fugu-MT Paper Translation (Summary): Enhancing Large Language Models Against Inductive Instructions with Dual-critique Prompting

Paper summary: Enhancing Large Language Models Against Inductive Instructions with Dual-critique Prompting

arxiv url: http://arxiv.org/abs/2305.13733v2
Date: Thu, 7 Mar 2024 03:11:47 GMT
Status: Translation complete
Last updated in system: 2024-03-08 18:14:52.592378
Title: Enhancing Large Language Models Against Inductive Instructions with Dual-critique Prompting
Authors: Rui Wang, Hongru Wang, Fei Mi, Yi Chen, Boyang Xue, Kam-Fai Wong, Ruifeng Xu
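Abstract summary: This paper reveals how large language models (LLMs) behave toward inductive instructions and enhances their truthfulness and helpfulness accordingly. Extensive human and automatic evaluations uncovered a universal vulnerability among LLMs in processing inductive instructions. Different inductive styles affect the models' ability to identify the same underlying errors, and the complexity of the underlying assumptions also influences model performance.
Reference score (attention score computed independently by this site): 55.15697111170836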
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Numerous works are proposed to align large language models (LLMs) with human intents to better fulfill instructions, ensuring they are trustful and helpful. Nevertheless, some human instructions are often malicious or misleading and following them will lead to untruthful and unsafe responses. Previous work rarely focused on understanding how LLMs manage instructions based on counterfactual premises, referred to here as inductive instructions, which may stem from users' false beliefs or malicious intents. In this paper, we aim to reveal the behaviors of LLMs towards inductive instructions and enhance their truthfulness and helpfulness accordingly. Specifically, we first introduce a benchmark of Inductive Instructions (INDust), where the false knowledge is incorporated into instructions in multiple different styles. After extensive human and automatic evaluations, we uncovered a universal vulnerability among LLMs in processing inductive instructions. Additionally, we identified that different inductive styles affect the models' ability to identify the same underlying errors, and the complexity of the underlying assumptions also influences the model's performance. Motivated by these results, we propose Dual-critique prompting to improve LLM robustness against inductive instructions. Our experiments demonstrate that Dual-critique prompting significantly bolsters the robustness of a diverse array of LLMs, even when confronted with varying degrees of inductive instruction complexity and differing inductive styles.

Related papers

Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction [68.6543680065379]
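Large language models (LLMs) are vulnerable to prompt injection attacks. Rather than suppressing the instruction-following ability of LLMs, this work proposes a novel defense method.
Reference translation of paper (metadata) (2025-04-29T07:13:53Z)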
LLMs can be easily Confused by Instructional Distractions [16.060402139507644]
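Large language models show exceptional skill at instruction following. This strength can become a vulnerability when a model must ignore certain instructions. The paper introduces a new benchmark called DIM-Bench.
Reference translation of paper (metadata) (2025-02-05T04:52:57Z)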
An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models [99.31449616860291]
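Modern language models (LMs) can learn to perform new tasks in different ways. In instruction following, the target task is described explicitly in natural language; in few-shot prompting, the task is specified implicitly. In instruction inference, LMs are shown in-context examples and prompted to generate a natural-language task description.
Reference translation of paper (metadata) (2024-04-03T19:31:56Z)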
RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions [43.19966425619236]
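The work uses code-style instructions, which are more structured and less ambiguous, in place of typical natural-language instructions. It also proposes a novel method that composes in-context demonstrations from both clean and adversarial samples. In experiments on eight robustness datasets, the method consistently outperforms LLMs prompted with natural-language instructions.
Reference translation of paper (metadata) (2024-02-26T09:30:55Z)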
Contrastive Instruction Tuning [61.97704869248903]
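Contrastive Instruction Tuning is proposed to maximize the similarity between semantically equivalent instruction-instance pairs. Experiments on the PromptBench benchmark show that CoIN consistently improves the robustness of LLMs to instruction variations at the character, word, sentence, and semantic levels, by an average of 2.5% in accuracy.
Reference translation of paper (metadata) (2024-02-17T00:09:32Z)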
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models [91.02730155418699]
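Large language models (LLMs) can perform a wide range of tasks by following natural-language instructions. The paper introduces Auto-Instruct, a novel method that automatically improves the quality of the instructions provided to LLMs. In experiments on 118 out-of-domain tasks, Auto-Instruct surpasses both human-written instructions and existing baselines of LLM-generated instructions.
Reference translation of paper (metadata) (2023-10-19T19:52:55Z)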
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning [63.63840740526497]
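This work investigates how instruction tuning adjusts pre-trained models, focusing on intrinsic changes. It then studies the impact of instruction tuning by comparing explanations derived from pre-trained and instruction-tuned models. The results reveal three significant impacts of instruction tuning.
Reference translation of paper (metadata) (2023-09-30T21:16:05Z)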
Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection [70.28425745910711]
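Large language models (LLMs) are highly proficient at instruction following. This ability brings with it the risk of prompt injection attacks. The paper evaluates the robustness of instruction-following LLMs against such attacks.
Reference translation of paper (metadata) (2023-08-17T06:21:50Z)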
Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models [28.37026309925163]
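Large language models (LLMs) are designed to align with human values and generate safe text. Previous benchmarks for jailbreaking LLMs have focused primarily on evaluating model safety. This paper assesses both the safety and the robustness of LLMs, emphasizing the need for a balanced approach.
Reference translation of paper (metadata) (2023-07-17T13:49:52Z)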

The related papers list is generated automatically from the titles and abstracts of papers on this site.

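This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from the use of this site (including all information and translations).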