Fugu-MT Paper Translation (Abstract): Rank-Factorized Implicit Neural Bias: Scaling Super-Resolution Transformer with FlashAttention

Paper summary: Rank-Factorized Implicit Neural Bias: Scaling Super-Resolution Transformer with FlashAttention

arxiv url: http://arxiv.org/abs/2603.06738v1
Date: Fri, 06 Mar 2026 07:47:52 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:13.009251
Title: Rank-Factorized Implicit Neural Bias: Scaling Super-Resolution Transformer with FlashAttention
Title (reference translation): Rank-Factorized Implicit Neural Bias: Scaling Super-Resolution Transformer with FlashAttention
Authors: Dongheon Lee, Seokju Yun, Jaegyun Im, Youngmin Ro
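Abstract summary: Super-resolution (SR) methods mainly adopt Transformers for their strong long-range modeling capability and exceptional representational capacity. Most SR Transformers rely heavily on relative positional bias (RPB) and therefore cannot use hardware-efficient attention kernels such as FlashAttention. We propose Rank-factorized Implicit Neural Bias (RIB) as an alternative to RPB that enables FlashAttention in SR Transformers.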
Reference score (independently computed attention level): 13.594079919865893
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent Super-Resolution (SR) methods mainly adopt Transformers for their strong long-range modeling capability and exceptional representational capacity. However, most SR Transformers rely heavily on relative positional bias (RPB), which prevents them from leveraging hardware-efficient attention kernels such as FlashAttention. This limitation imposes a prohibitive computational burden during both training and inference, severely restricting attempts to scale SR Transformers by enlarging the training patch size or the self-attention window. Consequently, unlike other domains that actively exploit the inherent scalability of Transformers, SR Transformers remain heavily focused on effectively utilizing limited receptive fields. In this paper, we propose Rank-factorized Implicit Neural Bias (RIB), an alternative to RPB that enables FlashAttention in SR Transformers. Specifically, RIB approximates positional bias using low-rank implicit neural representations and concatenates them with pixel content tokens in a channel-wise manner, turning the element-wise bias addition in attention score computation into a dot-product operation. Further, we introduce a convolutional local attention and a cyclic window strategy to fully leverage the advantages of long-range interactions enabled by RIB and FlashAttention. We enlarge the window size up to 96×96 while jointly scaling the training patch size and the dataset size, maximizing the benefits of Transformers in the SR task. As a result, our network achieves 35.63 dB PSNR on Urban100 ×2, while reducing training and inference time by 2.1× and 2.9×, respectively, compared to the RPB-based SR Transformer (PFT).
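Abstract (reference translation): Recent super-resolution (SR) methods mainly adopt Transformers for their strong long-range modeling capability and exceptional representational capacity. However, most SR Transformers rely heavily on relative positional bias (RPB) and therefore cannot use hardware-efficient attention kernels such as FlashAttention. This limitation severely restricts attempts to scale SR Transformers by enlarging the training patch size or the self-attention window. Consequently, unlike other domains that actively exploit the inherent scalability of Transformers, SR Transformers remain focused on making effective use of limited receptive fields. In this paper, we propose Rank-factorized Implicit Neural Bias (RIB) as an alternative to RPB that enables FlashAttention in SR Transformers. Specifically, RIB approximates positional bias with low-rank implicit neural representations and concatenates them channel-wise with pixel content tokens, turning the element-wise bias addition in the attention score computation into a dot-product operation. In addition, we introduce a convolutional local attention and a cyclic window strategy to take full advantage of the long-range interactions enabled by RIB and FlashAttention. We enlarge the window size up to 96×96 while jointly scaling the training patch size and the dataset size, maximizing the benefits of Transformers in the SR task. As a result, our network achieves 35.63 dB PSNR on Urban100 ×2 while reducing training and inference time by 2.1× and 2.9×, respectively, compared with the RPB-based SR Transformer (PFT).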
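
The channel-wise concatenation described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the positional factors `pos_q` and `pos_k` are random stand-ins for whatever the implicit neural representation would produce, and the variable names, sizes, and scaling convention are assumptions made for illustration. It only checks the underlying identity, namely that adding a low-rank bias `pos_q @ pos_k.T` to the attention scores gives the same result as appending the factors to the queries and keys and taking a single dot product.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, dim, rank = 16, 8, 4   # illustrative sizes, not taken from the paper

# Content queries, keys, and values for one attention window.
q = rng.standard_normal((n_tokens, dim))
k = rng.standard_normal((n_tokens, dim))
v = rng.standard_normal((n_tokens, dim))

# Hypothetical low-rank positional factors (random stand-ins for learned ones).
pos_q = rng.standard_normal((n_tokens, rank))
pos_k = rng.standard_normal((n_tokens, rank))
bias = pos_q @ pos_k.T           # (n_tokens, n_tokens) bias of rank <= `rank`


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)   # subtract row max for stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


scale = 1.0 / np.sqrt(dim)

# Path 1: explicit element-wise bias addition on the score matrix.
out_ref = softmax(q @ k.T * scale + bias) @ v

# Path 2: fold the bias into the dot product by channel-wise concatenation.
# The content queries are pre-scaled so that a single unscaled dot product
# gives scale * (q . k) + (pos_q . pos_k).
q_aug = np.concatenate([q * scale, pos_q], axis=-1)   # (n_tokens, dim + rank)
k_aug = np.concatenate([k, pos_k], axis=-1)           # (n_tokens, dim + rank)
out_cat = softmax(q_aug @ k_aug.T) @ v

print("max abs difference:", np.abs(out_ref - out_cat).max())
```

Because the second path never builds a separate bias matrix, its score computation has the plain query-key dot-product form that fused attention kernels operate on. How the paper handles scaling and head dimensions inside FlashAttention is not stated in the abstract, so the pre-scaling step here is only one possible convention.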

Related papers