Fugu-MT Paper Translation (Abstract): It Ain't That Bad: Understanding the Mysterious Performance Drop in OOD Generalization for Generative Transformer Models


arxiv url: http://arxiv.org/abs/2308.08268v1
Date: Wed, 16 Aug 2023 10:09:42 GMT
Status: Translation complete
Last updated in system: 2023-08-17 13:54:25.172229
Title: It Ain't That Bad: Understanding the Mysterious Performance Drop in OOD Generalization for Generative Transformer Models
Authors: Xingcheng Xu, Zihao Pan, Haipeng Zhang, Yanqing Yang
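Abstract summary: Generative Transformer-based models have achieved remarkable proficiency in solving diverse problems. However, their generalization ability is not fully understood and is not always satisfactory.
Reference score (attention measure computed by this site): 6.626501860715937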
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generative Transformer-based models have achieved remarkable proficiency on solving diverse problems. However, their generalization ability is not fully understood and not always satisfying. Researchers take basic mathematical tasks like n-digit addition or multiplication as important perspectives for investigating their generalization behaviors. Curiously, it is observed that when training on n-digit operations (e.g., additions) in which both input operands are n-digit in length, models generalize successfully on unseen n-digit inputs (in-distribution (ID) generalization), but fail miserably and mysteriously on longer, unseen cases (out-of-distribution (OOD) generalization). Studies try to bridge this gap with workarounds such as modifying position embedding, fine-tuning, and priming with more extensive or instructive data. However, without addressing the essential mechanism, there is hardly any guarantee regarding the robustness of these solutions. We bring this unexplained performance drop into attention and ask whether it is purely from random errors. Here we turn to the mechanistic line of research which has notable successes in model interpretability. We discover that the strong ID generalization stems from structured representations, while behind the unsatisfying OOD performance, the models still exhibit clear learned algebraic structures. Specifically, these models map unseen OOD inputs to outputs with equivalence relations in the ID domain. These highlight the potential of the models to carry useful information for improved generalization.
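The abstract's claim that models "map unseen OOD inputs to outputs with equivalence relations in the ID domain" can be made concrete with a toy sketch. Everything below is an illustrative assumption rather than the paper's model or analysis: `toy_model_add` is a hand-written stand-in for a trained network, `N_DIGITS` is a hypothetical training length, and the equivalence relation (two operands are equivalent when they share their last `N_DIGITS` digits) is one plausible example of the kind of structure the abstract describes.

```python
# Toy illustration only: a hand-written stand-in for a trained model,
# NOT the paper's actual model, data, or analysis.

N_DIGITS = 3                 # hypothetical training length (operands have 3 digits)
MODULUS = 10 ** N_DIGITS     # operands are compared by their last N_DIGITS digits


def toy_model_add(a: int, b: int) -> int:
    """Hypothetical 'model' that only uses the last N_DIGITS digits of each operand."""
    return (a % MODULUS) + (b % MODULUS)


def same_class(x: int, y: int) -> bool:
    """Assumed equivalence relation: x ~ y iff they share the last N_DIGITS digits."""
    return x % MODULUS == y % MODULUS


# In-distribution (ID): 3-digit operands are added exactly.
id_correct = toy_model_add(123, 456) == 123 + 456

# Out-of-distribution (OOD): 4-digit operands give a wrong sum ...
ood_prediction = toy_model_add(5123, 7456)
ood_wrong = ood_prediction != 5123 + 7456

# ... but the error is structured: the output equals the output for the
# ID inputs in the same equivalence classes (5123 ~ 123, 7456 ~ 456).
ood_structured = ood_prediction == toy_model_add(123, 456)

print(id_correct, ood_wrong, ood_structured)
```

In this sketch the OOD error is not random: every longer operand collapses onto an ID representative, which is the sense in which a failing model could still carry information useful for improving generalization.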

Related Papers

Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation [50.22361866757033]
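Unified vision-language models (VLMs) integrate both visual understanding and generation capabilities. This paper systematically investigates generalization across understanding and generation tasks in unified VLMs.
Reference translation (metadata) (2025-05-29T03:40:21Z)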
Analyzing the Inner Workings of Transformers in Compositional Generalization [15.599899071518545]
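This paper investigates the inner workings of a Transformer model by finding an existing subnetwork that contributes to its generalization performance. The model is found to depend on syntactic features to output the correct answer, whereas the subnetwork, which generalizes far better than the whole model, relies on a non-compositional algorithm.
Reference translation (metadata) (2025-02-21T08:07:53Z)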
Compositional Generalization Requires More Than Disentangled Representations [5.762286612061953]
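Compositional generalization is a key challenge in deep learning. Many generative models fail to recognize and compose the underlying factors needed to generate out-of-distribution (OOD) samples. The paper shows that models forced to do so, through architectural modifications with regularization or through curated training data, are data-efficient and effective at learning in OOD regimes.
Reference translation (metadata) (2025-01-30T23:20:41Z)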
Out-of-distribution generalization via composition: a lens through induction heads in Transformers [0.46085106405479537]
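Large language models (LLMs) such as GPT-4 sometimes appear creative, solving novel tasks often with only a few demonstrations in the prompt. These tasks require models to generalize to distributions that differ from the training data, which is referred to as out-of-distribution (OOD) generalization. The paper studies OOD generalization in settings where instances are generated according to hidden rules.
Reference translation (metadata) (2024-08-18T14:52:25Z)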
Principled Understanding of Generalization for Generative Transformer Models in Arithmetic Reasoning Tasks [5.522116934552708]
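Transformer-based models excel at a variety of tasks, but their generalization capabilities, especially in arithmetic reasoning, are not yet fully understood. This paper develops a unified theoretical framework for understanding the generalization behavior of transformers on arithmetic tasks.
Reference translation (metadata) (2024-07-25T11:35:22Z)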
Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
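This work proposes a geometric diffusion model with learnable divergence fields for the problem of learning from interdependent data. A new learning objective is derived via causal inference, which guides the model to learn generalizable patterns of interdependence that are insensitive across domains.
Reference translation (metadata) (2024-06-07T14:29:21Z)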
Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
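This work investigates how fine-tuning affects the intrinsic generalization ability of large language models (LLMs). The main finding is that models fine-tuned on generation tasks and on classification tasks behave differently when generalizing to different domains and tasks. Integrating an in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
Reference translation (metadata) (2024-03-14T08:18:59Z)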
Generalization Through the Lens of Learning Dynamics [11.009483845261958]
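Machine learning (ML) systems must learn to generalize to novel situations in order to make accurate predictions at deployment. The impressive generalization performance of deep neural networks has puzzled theorists. This thesis studies the learning dynamics of deep neural networks in both supervised and reinforcement learning tasks.
Reference translation (metadata) (2022-12-11T00:07:24Z)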
On the Compositional Generalization Gap of In-Context Learning [73.09193595292233]
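This work examines the gap between in-distribution (ID) and out-of-distribution (OOD) performance. Four model families (OPT, BLOOM, CodeGen, and Codex) are evaluated on three semantic parsing datasets.
Reference translation (metadata) (2022-11-15T19:56:37Z)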
Exploring Length Generalization in Large Language Models [46.417433724786854]
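The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks. The paper shows that naively fine-tuned transformers exhibit significant generalization deficiencies on length generalization tasks, independent of model scale. It then shows that combining the in-context learning abilities of pretrained large language models with scratchpad prompting yields a dramatic improvement in length generalization.
Reference translation (metadata) (2022-07-11T14:24:38Z)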
Towards a Theoretical Framework of Out-of-Distribution Generalization [28.490842160921805]
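Generalization to out-of-distribution (OOD) data (domain generalization) is one of the central problems in modern machine learning. This work takes a first step toward a rigorous and quantitative definition of the OOD problem.
Reference translation (metadata) (2021-06-08T16:32:23Z)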
Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization [93.8373619657239]
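Neural networks trained with SGD were recently shown to rely preferentially on linearly predictive features. This simplicity bias can explain their lack of out-of-distribution (OOD) robustness. The paper demonstrates that the simplicity bias can be mitigated and OOD generalization improved.
Reference translation (metadata) (2021-05-12T12:12:24Z)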
Improving Compositional Generalization in Semantic Parsing [54.4720965813889]
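Generalization of models to out-of-distribution (OOD) data has recently attracted considerable attention. This work investigates compositional generalization in semantic parsing, a natural test bed for compositional generalization.
Reference translation (metadata) (2020-10-12T12:34:58Z)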

The related papers list is generated automatically from the titles and abstracts of papers on this site.
