Fugu-MT Paper Translation (Summary): Addressing Exacerbated Attention Sink for Source-Free Cross-Domain Few-Shot Learning

Paper summary: Addressing Exacerbated Attention Sink for Source-Free Cross-Domain Few-Shot Learning

arxiv url: http://arxiv.org/abs/2605.25799v1
Date: Mon, 25 May 2026 12:49:15 GMT
Status: translation complete
Last updated in system: 2026-05-26 19:50:20.049695
Title: Addressing Exacerbated Attention Sink for Source-Free Cross-Domain Few-Shot Learning
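Title (reference translation): Improving Attention Sink in Source-Free Cross-Domain Few-Shot Learning
Authors: Shuai Yi, Yixiong Zou, Yuhua Li, Ruixuan Li
Abstract summary: Cross-Domain Few-Shot Learning requires transferring source-domain information to target domains with scarce training data. Standard target-domain few-shot fine-tuning exacerbates the attention sink problem, which lowers discriminability across classes. This paper proposes a method that dynamically re-weights tokens according to their relevance to target-domain classes during target-domain fine-tuning.
Reference score (attention score computed independently by the site): 25.20062959668559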
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-language models (VLMs) like CLIP have shown impressive generalization capabilities, yet their potential for Cross-Domain Few-Shot Learning (CDFSL) remains underexplored, where the model needs to transfer source-domain information to target domains with scarce training data. While the attention sink phenomenon has been observed in VLMs for certain tasks, its role in CDFSL scenarios has not been studied. In this paper, we uncover a critical issue overlooked by prior work: standard target-domain few-shot fine-tuning in CDFSL significantly exacerbates the attention sink problem, leading to poor discriminability across classes. To understand this phenomenon, through extensive experiments, we interpret it as the model's shortcut learning for domain adaptation: to overcome the huge domain gap between the source and target domains, the model shows a high tendency to push tokens that are initially closer to target-domain classes (i.e., simple tokens) to be even closer to these classes, exacerbating the attention sink and wasting the capability of learning other discriminative but initially farther tokens (i.e., hard tokens). To address this, we propose a novel approach to dynamically re-weight tokens according to their relevance to target-domain classes during target-domain fine-tuning, which explicitly suppresses the model's reliance on these simple tokens and enhances the learning of hard tokens, reducing sink tokens and enhancing discriminability. Extensive experiments on four benchmark datasets validate the rationale of our method, demonstrating new state-of-the-art performance. Our code is available at https://github.com/shuaiyi308/TIR.
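Abstract (reference translation): Vision-language models (VLMs) such as CLIP show impressive generalization, but their potential for Cross-Domain Few-Shot Learning (CDFSL) has not yet been explored. The attention sink phenomenon has been observed in VLMs for certain tasks, but its role in CDFSL scenarios has not been studied. This paper reveals a critical issue overlooked by prior work: standard target-domain few-shot fine-tuning in CDFSL significantly exacerbates the attention sink problem and lowers discriminability across classes. To understand the phenomenon, we interpret it as the model's shortcut learning for domain adaptation: to overcome the large gap between the source and target domains, the model tends to push tokens that are initially close to target-domain classes (simple tokens) even closer to those classes, which exacerbates the attention sink and wastes the capacity to learn other tokens that are discriminative but initially farther away (hard tokens). We therefore propose a method that dynamically re-weights tokens according to their relevance to target-domain classes during target-domain fine-tuning, explicitly suppressing reliance on simple tokens and strengthening the learning of hard tokens, which reduces sink tokens and improves discriminability. Extensive experiments on four benchmark datasets validate the rationale of the method and demonstrate new state-of-the-art performance. The code is available at https://github.com/shuaiyi308/TIR.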
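
The abstract describes re-weighting tokens by their relevance to target-domain classes so that "simple" tokens are suppressed and "hard" tokens are emphasized. The following is only a minimal sketch of that general idea, not the authors' TIR implementation: the function names, the use of maximum cosine similarity as the relevance score, the softmax-style weighting, and the `temperature` parameter are all assumptions made for illustration.

```python
import math

# Hypothetical illustration only: none of these names come from the paper
# or its repository.

def cosine(u, v, eps=1e-8):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def token_relevance(tokens, class_embeds):
    """Assumed relevance score: max cosine similarity to any class embedding."""
    return [max(cosine(t, c) for c in class_embeds) for t in tokens]

def reweight_tokens(tokens, class_embeds, temperature=0.1):
    """Give lower weight to tokens already close to a class ("simple")
    and higher weight to tokens far from every class ("hard").
    Weights are normalized so that their mean is 1."""
    rel = token_relevance(tokens, class_embeds)
    logits = [-r / temperature for r in rel]   # high relevance -> low logit
    top = max(logits)                          # subtract max for stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    weights = [len(exps) * e / total for e in exps]
    return weights, rel

def weighted_pool(tokens, weights):
    """Weighted average of token vectors using the weights above."""
    total = sum(weights)
    dim = len(tokens[0])
    return [sum(w * t[d] for w, t in zip(weights, tokens)) / total
            for d in range(dim)]

if __name__ == "__main__":
    # Toy 2-D example: two class embeddings and three tokens.
    classes = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]
    weights, relevance = reweight_tokens(tokens, classes)
    print("relevance:", relevance)
    print("weights:  ", weights)
    print("pooled:   ", weighted_pool(tokens, weights))
```

In an actual fine-tuning loop, weights of this kind might scale per-token loss terms or attention contributions rather than a simple pooled feature; how the paper applies them is not stated in the abstract.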
