Fugu-MT Paper Translation (Overview): Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs

Paper overview: Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs

arxiv url: http://arxiv.org/abs/2402.07321v1
Date: Sun, 11 Feb 2024 22:58:49 GMT
Status: Translation complete
Last updated in system: 2024-02-13 16:21:55.288808
Title: Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs
Title (reference translation): Fact Up: The Additive Mechanisms Behind Factual Recall in LLMs
Authors: Bilal Chughtai, Alan Cooney, Neel Nanda
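Abstract summary: We focus on the most basic form of this task, factual recall. We find that the mechanistic story behind factual recall is more complex than previously thought.
Reference score (attention score computed by this site): 1.5571776694273143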
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: How do transformer-based large language models (LLMs) store and retrieve knowledge? We focus on the most basic form of this task, factual recall, where the model is tasked with explicitly surfacing stored facts in prompts of the form "Fact: The Colosseum is in the country of". We find that the mechanistic story behind factual recall is more complex than previously thought. It comprises several distinct, independent, and qualitatively different mechanisms that additively combine, constructively interfering on the correct attribute. We term this generic phenomenon the additive motif: models compute by summing up multiple independent contributions. Each mechanism's contribution may be insufficient alone, but summing them results in constructive interference on the correct answer. In addition, we extend the method of direct logit attribution to attribute an attention head's output to individual source tokens. We use this technique to unpack what we call "mixed heads", which are themselves a pair of two separate additive updates from different source tokens.
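The abstract's per-source-token extension of direct logit attribution (DLA) works because an attention head's output at a destination position is an attention-weighted sum of per-source writes. That output can therefore be split exactly into one additive term per source token. The NumPy sketch below is our own illustration, not the authors' code. It makes several assumptions: the weights are random, the head's output bias is dropped, and the final LayerNorm is treated as a frozen scalar `ln_scale`. The function names `per_source_dla` and `total_head_dla` are hypothetical.

```python
import numpy as np


def per_source_dla(x, attn, W_V, W_O, W_U, dest, answer, ln_scale=1.0):
    """Split one attention head's DLA on `answer` across source tokens.

    x:      (seq, d_model) layer-normed residual stream read by the head
    attn:   (seq, seq) attention pattern, attn[d, s] = weight from dest d to source s
    W_V:    (d_model, d_head) value weights
    W_O:    (d_head, d_model) output weights
    W_U:    (d_model, d_vocab) unembedding matrix
    ln_scale: ASSUMPTION, final LayerNorm modelled as a frozen scalar divisor
    Returns a (seq,) array whose entries sum to the head's total DLA.
    """
    answer_dir = W_U[:, answer] / ln_scale   # logit direction of the answer token
    ov_out = x @ W_V @ W_O                   # (seq, d_model): what each source would write
    weighted = attn[dest][:, None] * ov_out  # scale each write by its attention weight
    return weighted @ answer_dir             # (seq,) per-source logit contributions


def total_head_dla(x, attn, W_V, W_O, W_U, dest, answer, ln_scale=1.0):
    """Ordinary (unsplit) DLA of the head's output at `dest`, ignoring the output bias."""
    head_out = (attn[dest] @ (x @ W_V)) @ W_O  # (d_model,)
    return head_out @ (W_U[:, answer] / ln_scale)


# Usage with random (hypothetical) weights and a causal attention pattern.
rng = np.random.default_rng(0)
seq, d_model, d_head, d_vocab = 6, 16, 4, 10
x = rng.normal(size=(seq, d_model))
scores = rng.normal(size=(seq, seq))
scores[np.triu_indices(seq, k=1)] = -np.inf  # causal mask: no attention to future tokens
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
W_V = rng.normal(size=(d_model, d_head))
W_O = rng.normal(size=(d_head, d_model))
W_U = rng.normal(size=(d_model, d_vocab))

dest, answer = seq - 1, 3
per_src = per_source_dla(x, attn, W_V, W_O, W_U, dest, answer)
total = total_head_dla(x, attn, W_V, W_O, W_U, dest, answer)
```

Because every step after the attention pattern is linear, `per_src.sum()` equals `total` up to floating-point error. Under this sketch, a head whose per-source scores are large at two different source positions would be one way to recognise what the abstract calls a mixed head.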

Related papers

Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries [21.28307426740275]
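To answer one-to-many factual queries, a language model (LM) must simultaneously recall knowledge and avoid repeating previous answers. Across multiple datasets and models, we identify a promote-then-suppress mechanism: the model first recalls all answers and then suppresses those it has already generated.
Paper reference translation (metadata) (2025-02-27T19:23:15Z)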
Counterfactual Generation from Language Models [64.55296662926919]
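We show that counterfactual reasoning is conceptually distinct from interventions. We therefore propose a framework for generating true string counterfactuals. Our experiments show that this approach produces meaningful counterfactuals.
Paper reference translation (metadata) (2024-11-11T17:57:30Z)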
IIU: Independent Inference Units for Knowledge-based Visual Question Answering [7.3787088958663665]
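We propose Independent Inference Units (IIU) for fine-grained multimodal reasoning. IIU decomposes intra-modal information into functionally independent units. Our model achieves a new state of the art, improving performance by 3% and surpassing basic pretrained multimodal models.
Paper reference translation (metadata) (2024-08-15T07:30:47Z)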
Beyond Single-Feature Importance with ICECREAM [0.4970364068620607]
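We propose an information-theoretic measure of the influence that a coalition of variables has on the distribution of a target variable. Experiments on synthetic and real-world data show that ICECREAM outperforms state-of-the-art methods for explainability and root cause analysis.
Paper reference translation (metadata) (2023-07-19T06:48:33Z)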
Multimodal Fusion Interactions: A Study of Human and Automatic Quantification [116.55145773123132]
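We study how humans annotate two taxonomies of multimodal interactions. We propose a method to automatically convert annotations of partial and counterfactual labels into an information decomposition.
Paper reference translation (metadata) (2023-06-07T03:44:50Z)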
EquiMod: An Equivariance Module to Improve Self-Supervised Learning [77.34726150561087]
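Self-supervised visual representation methods are closing the gap with supervised learning performance. These methods rely on maximizing the similarity between embeddings of related synthetic inputs created by data augmentation. We introduce EquiMod, a generic equivariance module that structures the learned latent space.
Paper reference translation (metadata) (2022-11-02T16:25:54Z)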
Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation [0.0]
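Multiformer is a Transformer-based model that allows a different attention mechanism to be used in each head. This lets the model bias self-attention toward extracting a more diverse set of token interactions. The results show that mixing attention patterns across different heads and layers outperforms our baseline by up to 0.7 BLEU.
Paper reference translation (metadata) (2022-05-14T17:37:47Z)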
Transformers with Competitive Ensembles of Independent Mechanisms [97.93090139318294]
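We propose a new Transformer layer that divides the hidden representation and parameters into multiple mechanisms, which exchange information only through attention. We study TIM on a large-scale BERT model, an image Transformer, and speech enhancement, and find evidence of semantically meaningful specialization as well as improved performance.
Paper reference translation (metadata) (2021-02-27T21:48:46Z)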
Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation [97.22768624862111]
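We analyze NMT models by explicitly evaluating the relative contributions of the source and the target to the generation process. Models trained with more data tend to rely more on source information and to have sharper token contributions.
Paper reference translation (metadata) (2020-10-21T11:37:27Z)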
LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention [37.111204321059084]
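We propose new pretrained contextualized representations of words and entities based on the bidirectional Transformer. Our model is trained using a new pretraining task based on BERT's masked language model. We also propose an entity-aware self-attention mechanism, an extension of the Transformer's self-attention mechanism.
Paper reference translation (metadata) (2020-10-02T15:38:03Z)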
Self-Attention Attribution: Interpreting Information Interactions Inside Transformer [89.21584915290319]
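We propose a self-attention attribution method to interpret the information interactions inside the Transformer. We show that the attribution results can be used as adversarial patterns to implement non-targeted attacks on BERT.
Paper reference translation (metadata) (2020-04-23T14:58:22Z)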

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

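This page shows information about the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).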