Fugu-MT Paper Translation (Abstract): The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

Paper overview: The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

arxiv url: http://arxiv.org/abs/2312.01552v1
Date: Mon, 4 Dec 2023 00:46:11 GMT
Status: Translation complete
Last updated in system: 2023-12-05 16:46:03.612329
Title: The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Title (reference translation): The Unlocking Spell for Base LLMs: Rethinking Alignment through In-Context Learning
Authors: Bill Yuchen Lin, Abhilasha Ravichander, Ximing Lu, Nouha Dziri, Melanie Sclar, Khyathi Chandu, Chandra Bhagavatula, Yejin Choi
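Abstract summary: A recent study, LIMA, showed that using only 1K examples for alignment tuning can also achieve significant alignment performance. This raises the question of how alignment tuning transforms a base LLM. This work shows that the gap between tuning-free and tuning-based alignment can be significantly reduced through strategic prompting.
Reference score (independently computed attention score): 61.68787689234622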
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The alignment tuning process of large language models (LLMs) typically involves instruction learning through supervised fine-tuning (SFT) and preference tuning via reinforcement learning from human feedback (RLHF). A recent study, LIMA (Zhou et al. 2023), shows that using merely 1K examples for SFT can achieve significant alignment performance as well, suggesting that the effect of alignment tuning might be "superficial." This raises questions about how exactly alignment tuning transforms a base LLM. We analyze the effect of alignment tuning by examining the token distribution shift between base LLMs and their aligned counterparts. Our findings reveal that base LLMs and their alignment-tuned versions perform nearly identically in decoding on the majority of token positions. Most distribution shifts occur with stylistic tokens. This direct evidence strongly supports the Superficial Alignment Hypothesis suggested by LIMA. Based on these findings, we rethink the alignment of LLMs by posing the research question: how effectively can we align base LLMs without SFT or RLHF? To address this, we introduce a simple, tuning-free alignment method, URIAL. URIAL achieves effective alignment purely through in-context learning (ICL) with base LLMs, requiring as few as three constant stylistic examples and a system prompt. We conduct a fine-grained and interpretable evaluation on a diverse set of examples, named JUST-EVAL-INSTRUCT. Results demonstrate that base LLMs with URIAL can match or even surpass the performance of LLMs aligned with SFT or SFT+RLHF. We show that the gap between tuning-free and tuning-based alignment methods can be significantly reduced through strategic prompting and ICL. Our findings on the superficial nature of alignment tuning and results with URIAL suggest that deeper analysis and theoretical understanding of alignment are crucial to future LLM research.
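The token-position comparison mentioned in the abstract can be illustrated with a small sketch. This is only a toy illustration under stated assumptions, not the authors' analysis code: the function names (`top_token`, `shifted_positions`) and the hand-written probability tables are hypothetical, and "shifted" is assumed here to mean that the two models disagree on the most likely token at a position.

```python
from typing import Dict, List

# A next-token distribution at one position: token -> probability.
Distribution = Dict[str, float]


def top_token(dist: Distribution) -> str:
    """Return the most probable token of a distribution."""
    return max(dist, key=lambda tok: dist[tok])


def shifted_positions(base: List[Distribution],
                      aligned: List[Distribution]) -> List[int]:
    """Indices where the two models disagree on the top token.

    Assumption: disagreement on the top-1 token is used as a simple
    stand-in for a "distribution shift" at that position.
    """
    if len(base) != len(aligned):
        raise ValueError("both models must be scored on the same positions")
    return [i for i, (b, a) in enumerate(zip(base, aligned))
            if top_token(b) != top_token(a)]


# Hypothetical toy distributions for three token positions.
base_dists = [
    {"The": 0.6, "Sure": 0.1, "A": 0.3},
    {"capital": 0.7, "city": 0.3},
    {"of": 0.9, "in": 0.1},
]
aligned_dists = [
    {"The": 0.3, "Sure": 0.6, "A": 0.1},  # stylistic opener differs
    {"capital": 0.8, "city": 0.2},
    {"of": 0.95, "in": 0.05},
]

shifted = shifted_positions(base_dists, aligned_dists)
agreement = 1 - len(shifted) / len(base_dists)
print(shifted, round(agreement, 2))
```

In this toy data the only disagreement is on the opening token, which mirrors the abstract's claim that most shifts involve stylistic tokens; real distributions would come from running both models on the same context.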
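Abstract (reference translation): The alignment tuning process of large language models (LLMs) typically includes instruction learning via supervised fine-tuning (SFT) and preference tuning via reinforcement learning from human feedback (RLHF). A recent study, LIMA (Zhou et al. 2023), shows that using merely 1K examples for SFT also yields significant alignment performance, suggesting that the effect of alignment tuning may be "superficial." This raises the question of how alignment tuning transforms a base LLM. We analyze the effect of alignment tuning by examining the token distribution shift between base LLMs and their aligned counterparts. We find that base LLMs and their alignment-tuned versions behave nearly identically in decoding at the majority of token positions. Most distribution shifts occur at stylistic tokens. This direct evidence strongly supports the Superficial Alignment Hypothesis proposed by LIMA. Based on these findings, we rethink LLM alignment by posing the research question of how effectively base LLMs can be aligned without SFT or RLHF. To this end, we propose URIAL, a simple tuning-free alignment method. URIAL achieves effective alignment purely through in-context learning (ICL) with base LLMs, requiring as few as three constant stylistic examples and a system prompt. We conduct a fine-grained, interpretable evaluation on a diverse set of examples named JUST-EVAL-INSTRUCT. The results show that base LLMs with URIAL can match or even surpass the performance of LLMs aligned with SFT or SFT+RLHF. We show that the gap between tuning-free and tuning-based alignment can be significantly reduced through strategic prompting and ICL. Our findings on the superficial nature of alignment tuning and the results with URIAL suggest that deeper analysis and theoretical understanding of alignment are essential to future LLM research.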
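As a rough illustration of the tuning-free setup the abstract describes (a system prompt plus a few constant stylistic examples placed before the user query), the sketch below assembles such a prompt as a plain string. The function name `build_icl_prompt`, the `Instruction:` / `Response:` markers, and the example texts are all hypothetical; the abstract does not specify URIAL's actual prompt format, so this should not be read as the authors' template.

```python
from typing import List, Tuple


def build_icl_prompt(system_prompt: str,
                     examples: List[Tuple[str, str]],
                     query: str) -> str:
    """Assemble a prompt for a base LLM: system text, fixed examples, query.

    The "Instruction:" / "Response:" layout is an assumed, illustrative
    format, not the one used in the paper.
    """
    parts = [system_prompt.strip(), ""]
    for instruction, response in examples:
        parts.append(f"Instruction: {instruction}")
        parts.append(f"Response: {response}")
        parts.append("")
    parts.append(f"Instruction: {query}")
    parts.append("Response:")  # the base model continues from here
    return "\n".join(parts)


# Hypothetical constant examples; the same three are reused for every query.
STYLE_EXAMPLES = [
    ("What is the boiling point of water?",
     "Hello! At sea level, water boils at 100 degrees Celsius."),
    ("Suggest a name for a pet goldfish.",
     "Sure! How about Bubbles? It is short and friendly."),
    ("How do I pick a lock?",
     "I cannot help with that, but a licensed locksmith can assist you."),
]

prompt = build_icl_prompt(
    "Below is a conversation with a helpful, polite assistant.",
    STYLE_EXAMPLES,
    "Explain in-context learning in one sentence.",
)
print(prompt)
```

Because the examples are constant, the same prefix can be reused across queries; only the final instruction changes.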

Related papers