Fugu-MT Paper Translation (Summary): HAViT: Historical Attention Vision Transformer

Paper summary: HAViT: Historical Attention Vision Transformer

arxiv url: http://arxiv.org/abs/2603.18585v1
Date: Thu, 19 Mar 2026 07:46:47 GMT
Status: Translation complete
Last updated in system: 2026-03-20 17:19:06.015778
Title: HAViT: Historical Attention Vision Transformer
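Title (reference translation): HAViT: Historical Attention Vision Transformer
Authors: Swarnendu Banik, Manish Das, Shiv Ram Dubey, Satish Kumar Singh
Abstract summary: Vision Transformers excel at computer vision, but their attention mechanisms operate independently across layers. This paper proposes a cross-layer attention propagation method that preserves and integrates historical attention matrices. The approach enables progressive refinement of attention patterns throughout the transformer hierarchy.
Reference score (independently computed interest score): 7.419725234099727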
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision Transformers have excelled in computer vision but their attention mechanisms operate independently across layers, limiting information flow and feature learning. We propose an effective cross-layer attention propagation method that preserves and integrates historical attention matrices across encoder layers, offering a principled refinement of inter-layer information flow in Vision Transformers. This approach enables progressive refinement of attention patterns throughout the transformer hierarchy, enhancing feature acquisition and optimization dynamics. The method requires minimal architectural changes, adding only attention matrix storage and blending operations. Comprehensive experiments on CIFAR-100 and TinyImageNet demonstrate consistent accuracy improvements, with ViT performance increasing from 75.74% to 77.07% on CIFAR-100 (+1.33%) and from 57.82% to 59.07% on TinyImageNet (+1.25%). Cross-architecture validation shows similar gains across transformer variants, with CaiT showing 1.01% enhancement. Systematic analysis identifies the blending hyperparameter of historical attention (alpha = 0.45) as optimal across all configurations, providing the ideal balance between current and historical attention information. Random initialization consistently outperforms zero initialization, indicating that diverse initial attention patterns accelerate convergence and improve final performance. Our code is publicly available at https://github.com/banik-s/HAViT.
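Abstract (reference translation): Vision Transformers excel at computer vision, but their attention mechanisms operate independently across layers, which limits information flow and feature learning. We propose an effective cross-layer attention propagation method that preserves and integrates historical attention matrices across encoder layers, providing a principled refinement of inter-layer information flow in Vision Transformers. This approach enables progressive refinement of attention patterns throughout the transformer hierarchy, strengthening feature acquisition and optimization dynamics. The method requires only minimal architectural changes, adding just attention matrix storage and blending operations. Comprehensive experiments on CIFAR-100 and TinyImageNet show ViT accuracy rising from 75.74% to 77.07% on CIFAR-100 (+1.33%) and from 57.82% to 59.07% on TinyImageNet (+1.25%). Cross-architecture validation shows similar gains across transformer variants, with CaiT improving by 1.01%. Systematic analysis identifies the blending hyperparameter of historical attention (alpha = 0.45) as optimal across all configurations, giving the ideal balance between current and historical attention information. Random initialization consistently outperforms zero initialization, indicating that diverse initial attention patterns accelerate convergence and improve final performance. Our code is publicly available at https://github.com/banik-s/HAViT.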
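
The abstract describes the mechanism only at a high level: attention matrices are stored and blended across encoder layers with a weight alpha. The following is a minimal NumPy sketch of one way this could look, not the authors' implementation. Several details are assumptions made for illustration: that blending is a convex combination with alpha weighting the historical matrix, that blending happens after the softmax, that the blended matrix becomes the history for the next layer, and that a single head with random projection weights is enough to show the idea. The names `blended_attention` and `history` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def blended_attention(q, k, v, history, alpha=0.45):
    """Single-head attention mixed with a stored attention matrix.

    ASSUMPTION: blended = (1 - alpha) * current + alpha * history.
    The abstract gives alpha = 0.45 but not the exact formula.
    """
    d = q.shape[-1]
    current = softmax(q @ k.T / np.sqrt(d))          # (n, n), rows sum to 1
    blended = (1.0 - alpha) * current + alpha * history
    return blended @ v, blended

rng = np.random.default_rng(0)
n_tokens, dim, n_layers = 4, 8, 3
x = rng.normal(size=(n_tokens, dim))

# Random initial history, row-normalised here by assumption. The abstract
# reports that random initialization outperforms zero initialization.
history = softmax(rng.normal(size=(n_tokens, n_tokens)))

for _ in range(n_layers):
    # Hypothetical per-layer projections (untrained, for illustration only).
    wq, wk, wv = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
    out, history = blended_attention(x @ wq, x @ wk, x @ wv, history)
    x = x + out                                      # residual connection
```

Under these assumptions both inputs to the blend are row-stochastic, so the blended matrix stays a valid attention distribution at every layer. The only extra cost per layer is storing one (n, n) matrix and one weighted sum, which is in line with the abstract's description of minimal architectural changes.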


Related papers