Fugu-MT Paper Translation (Abstract): Vision Hopfield Memory Networks

Paper overview: Vision Hopfield Memory Networks

arxiv url: http://arxiv.org/abs/2603.25157v1
Date: Thu, 26 Mar 2026 08:23:03 GMT
Status: Translation complete
Last updated in system: 2026-03-27 20:52:48.180035
Title: Vision Hopfield Memory Networks
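Title (reference translation): Vision Hopfield Memory Networks
Authors: Jianfeng Wang, Amine M'Charrak, Luk Koska, Xiangtao Wang, Daniel Petriceanu, Mykyta Smyrnov, Ruizhi Wang, Michael Bumbar, Luca Pinchetti, Thomas Lukasiewicz
Abstract summary: The Vision Hopfield Memory Network (V-HMN) is a brain-inspired foundation backbone that integrates hierarchical memory mechanisms with iterative refinement updates. V-HMN captures both local and global dynamics in a unified framework. Memory retrieval exposes the relationship between inputs and stored patterns, making decisions more interpretable.
Reference score (independently computed attention metric): 43.727500835033986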
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent vision and multimodal foundation backbones, such as Transformer families and state-space models like Mamba, have achieved remarkable progress, enabling unified modeling across images, text, and beyond. Despite their empirical success, these architectures remain far from the computational principles of the human brain, often demanding enormous amounts of training data while offering limited interpretability. In this work, we propose the Vision Hopfield Memory Network (V-HMN), a brain-inspired foundation backbone that integrates hierarchical memory mechanisms with iterative refinement updates. Specifically, V-HMN incorporates local Hopfield modules that provide associative memory dynamics at the image patch level, global Hopfield modules that function as episodic memory for contextual modulation, and a predictive-coding-inspired refinement rule for iterative error correction. By organizing these memory-based modules hierarchically, V-HMN captures both local and global dynamics in a unified framework. Memory retrieval exposes the relationship between inputs and stored patterns, making decisions more interpretable, while the reuse of stored patterns improves data efficiency. This brain-inspired design therefore enhances interpretability and data efficiency beyond existing self-attention- or state-space-based approaches. We conducted extensive experiments on public computer vision benchmarks, and V-HMN achieved competitive results against widely adopted backbone architectures, while offering better interpretability, higher data efficiency, and stronger biological plausibility. These findings highlight the potential of V-HMN to serve as a next-generation vision foundation model, while also providing a generalizable blueprint for multimodal backbones in domains such as text and audio, thereby bridging brain-inspired computation with large-scale machine learning.
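Abstract (reference translation): Recent vision and multimodal foundation backbones, such as Transformer families and state-space models like Mamba, have made remarkable progress and enable unified modeling across images, text, and other modalities. Despite this empirical success, these architectures remain far from the computational principles of the human brain: they often require enormous amounts of training data and offer only limited interpretability. This work proposes the Vision Hopfield Memory Network (V-HMN), a brain-inspired foundation backbone that integrates hierarchical memory mechanisms with iterative refinement updates. Specifically, it incorporates local Hopfield modules that provide associative memory dynamics at the image patch level, global Hopfield modules that act as episodic memory for contextual modulation, and a predictive-coding-inspired refinement rule for iterative error correction. By organizing these memory-based modules hierarchically, V-HMN captures both local and global dynamics in a unified framework. Memory retrieval exposes the relationship between inputs and stored patterns, which makes decisions more interpretable, and the reuse of stored patterns improves data efficiency. This brain-inspired design thus improves interpretability and data efficiency beyond existing self-attention-based and state-space-based approaches. In extensive experiments on public computer vision benchmarks, V-HMN achieved competitive results against widely adopted backbone architectures while offering better interpretability, higher data efficiency, and stronger biological plausibility. These findings highlight the potential of V-HMN to serve as a next-generation vision foundation model and provide a generalizable blueprint for multimodal backbones in domains such as text and audio, bridging brain-inspired computation and large-scale machine learning.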
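
Illustrative sketch (not the authors' implementation): the abstract gives no equations, so the code below only shows the general idea of associative retrieval from stored patterns followed by iterative error correction. It assumes the standard softmax-based modern Hopfield retrieval rule and a simple "move toward the retrieved pattern" update as a stand-in for the predictive-coding-inspired refinement. All names and parameters (`hopfield_retrieve`, `refine`, `beta`, `step_size`, the toy memory) are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hopfield_retrieve(query, memory, beta=4.0):
    """One softmax-based Hopfield retrieval step (assumed formulation).

    Returns the weighted average of the stored patterns and the weights,
    which indicate how strongly the query matched each memory slot.
    """
    scores = [beta * sum(q * p for q, p in zip(query, pattern))
              for pattern in memory]
    weights = softmax(scores)
    retrieved = [sum(w * pattern[i] for w, pattern in zip(weights, memory))
                 for i in range(len(query))]
    return retrieved, weights

def refine(query, memory, steps=3, step_size=0.5, beta=4.0):
    """Hypothetical refinement loop: correct the state by a fraction of the
    error between the retrieved pattern and the current state."""
    state = list(query)
    weights = []
    for _ in range(steps):
        retrieved, weights = hopfield_retrieve(state, memory, beta)
        state = [s + step_size * (r - s) for s, r in zip(state, retrieved)]
    return state, weights

# Toy memory of three stored patterns (stand-ins for patch prototypes).
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
noisy_patch = [0.9, 0.2, 0.1]
refined, weights = refine(noisy_patch, memory)

# The retrieval weights show which stored pattern the input was matched to.
best_slot = max(range(len(weights)), key=weights.__getitem__)
```

In this sketch the retrieval weights are what would make a decision inspectable: they show which stored pattern an input was associated with. How V-HMN actually defines its local and global modules and its refinement rule is specified in the paper, not here.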

Related papers