Fugu-MT Paper Translation (Summary): Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models

Paper summary: Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models

arxiv url: http://arxiv.org/abs/2605.20839v1
Date: Wed, 20 May 2026 07:29:28 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.555508
Title: Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models
Authors: Jeffrey Wang, Jonathan Gregory, Grigorios G. Chrysos
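Abstract summary: We show that Hadamard products can replace standard nonlinearities, yielding polynomial functions of the input. These modules integrate seamlessly into existing architectures. Across model scales, the resulting models match or exceed activation-based counterparts on ImageNet classification, ADE20K semantic segmentation, and out-of-distribution robustness, and they substantially outperform prior polynomial networks.
Reference score (in-house attention score): 6.908818193023836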
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern vision backbones treat pointwise activations (e.g., ReLU, GELU) and exponential softmax as essential sources of nonlinearity, but we demonstrate they are not required within MetaFormer-style vision backbones. We design activation-free polynomial alternatives for three core primitives (MLPs, convolutions, and attention), where Hadamard products replace standard nonlinearities to yield polynomial functions of the input. These modules integrate seamlessly into existing architectures: instantiated within MetaFormer, a modular framework for vision backbones, our PolyNeXt models match or exceed activation-based counterparts across model scales on ImageNet classification, ADE20K semantic segmentation, and out-of-distribution robustness. We also substantially outperform prior polynomial networks at reduced computational cost, showing that polynomial variants of standard modules beat complex custom architectures.
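The abstract's central mechanism is that a Hadamard (elementwise) product takes the place of a pointwise activation, so the block computes a polynomial of its input. A minimal pure-Python sketch of this idea for the MLP case, assuming a two-branch form W_out((W_a x) ⊙ (W_b x)) with no biases, is shown below. The function names, weight shapes, and values are hypothetical illustrations, not taken from the PolyNeXt code. The paper's actual MLP, convolution, and attention modules may combine, normalize, or stack branches differently. The form shown resembles the bilinear (activation-free) variant of gated linear units.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[i] is the i-th output row


def matvec(w: Matrix, x: Vector) -> Vector:
    """Linear map y = W x (no bias, so the map is homogeneous of degree 1)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def hadamard(a: Vector, b: Vector) -> Vector:
    """Elementwise (Hadamard) product of two equal-length vectors."""
    return [a_i * b_i for a_i, b_i in zip(a, b)]


def poly_mlp(x: Vector, w_a: Matrix, w_b: Matrix, w_out: Matrix) -> Vector:
    """Hypothetical activation-free MLP: W_out((W_a x) ⊙ (W_b x)).

    Instead of applying a pointwise activation (e.g. GELU) to one hidden
    branch, two linear branches are multiplied elementwise. Every output
    coordinate is therefore a degree-2 polynomial in the entries of x.
    """
    hidden = hadamard(matvec(w_a, x), matvec(w_b, x))
    return matvec(w_out, hidden)


# Illustrative weights (hypothetical): input dim 3, hidden dim 4, output dim 3.
W_A = [[0.5, -1.0, 0.2], [1.0, 0.0, 0.3], [-0.4, 0.7, 1.0], [0.1, 0.2, -0.6]]
W_B = [[1.0, 0.5, -0.3], [0.0, -1.0, 0.8], [0.6, 0.1, 0.2], [-0.5, 0.9, 0.4]]
W_OUT = [[1.0, 0.0, -1.0, 0.5], [0.2, 0.3, 0.0, -0.7], [0.0, 1.0, 0.4, 0.1]]

x = [1.0, 2.0, -0.5]
y = poly_mlp(x, W_A, W_B, W_OUT)
y_scaled = poly_mlp([2.0 * v for v in x], W_A, W_B, W_OUT)

# Degree-2 homogeneity: doubling the input multiplies every output by 2**2 = 4.
print(y)
print(all(abs(s - 4.0 * v) < 1e-9 for s, v in zip(y_scaled, y)))  # prints True
```

The scaling check works only because this sketch uses no biases. With biases, the block is still a degree-2 polynomial, but it is no longer homogeneous. Stacking two such blocks without residual connections would raise the degree to 4, which illustrates how depth increases expressiveness in a polynomial network without any pointwise activation.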

List of related papers