Fugu-MT paper translation (abstract): PKINet-v2: Towards Powerful and Efficient Poly-Kernel Remote Sensing Object Detection

Paper abstract: PKINet-v2: Towards Powerful and Efficient Poly-Kernel Remote Sensing Object Detection

arxiv url: http://arxiv.org/abs/2603.16341v1
Date: Tue, 17 Mar 2026 10:17:58 GMT
Status: translation complete
Last updated in system: 2026-03-18 17:42:07.216298
Title: PKINet-v2: Towards Powerful and Efficient Poly-Kernel Remote Sensing Object Detection
Authors: Xinhao Cai, Liulei Li, Gensheng Pei, Zeren Sun, Yazhou Yao, Wenguan Wang
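Abstract summary: PKINet-v2 synergizes anisotropic axial-strip convolutions with isotropic square kernels to build a multi-scope receptive field. PKINet-v2 achieves a $\textbf{3.9}\times$ FPS acceleration compared to PKINet-v1.
Reference score (independently computed attention score): 73.05827565991488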
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Object detection in remote sensing images (RSIs) is challenged by the coexistence of geometric and spatial complexity: targets may appear with diverse aspect ratios, while spanning a wide range of object sizes under varied contexts. Existing RSI backbones address the two challenges separately, either by adopting anisotropic strip kernels to model slender targets or by using isotropic large kernels to capture broader context. However, such isolated treatments lead to complementary drawbacks: the strip-only design can disrupt spatial coherence for regular-shaped objects and weaken tiny details, whereas isotropic large kernels often introduce severe background noise and geometric mismatch for slender structures. In this paper, we extend PKINet, and present a powerful and efficient backbone that jointly handles both challenges within a unified paradigm named Poly Kernel Inception Network v2 (PKINet-v2). PKINet-v2 synergizes anisotropic axial-strip convolutions with isotropic square kernels and builds a multi-scope receptive field, preserving fine-grained local textures while progressively aggregating long-range context across scales. To enable efficient deployment, we further introduce a Heterogeneous Kernel Re-parameterization (HKR) Strategy that fuses all heterogeneous branches into a single depth-wise convolution for inference, eliminating fragmented kernel launches without accuracy loss. Extensive experiments on four widely-used benchmarks, including DOTA-v1.0, DOTA-v1.5, HRSC2016, and DIOR-R, demonstrate that PKINet-v2 achieves state-of-the-art accuracy while delivering a $\textbf{3.9}\times$ FPS acceleration compared to PKINet-v1, surpassing previous remote sensing backbones in both effectiveness and efficiency.
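The abstract says that HKR fuses all heterogeneous branches into a single depth-wise convolution for inference, but it does not give the details. The sketch below is not the authors' implementation. It only illustrates the general idea behind this kind of structural re-parameterization, under stated assumptions: the branches run in parallel on the same input, their outputs are summed, all use stride 1 with zero "same" padding, and any normalization has already been folded into the kernel weights. Under those assumptions convolution is linear, so a 1×k strip, a k×1 strip, and a smaller square kernel can each be zero-padded to a common k×k size and added into one kernel. A depth-wise convolution applies one such kernel per channel, so a single channel is enough to show the equivalence. All function names, kernel sizes, and values here are hypothetical.

```python
def pad_kernel(kernel, size):
    """Zero-pad a 2D kernel with odd dimensions to size x size, keeping it centered."""
    kh, kw = len(kernel), len(kernel[0])
    top, left = (size - kh) // 2, (size - kw) // 2
    out = [[0.0] * size for _ in range(size)]
    for i in range(kh):
        for j in range(kw):
            out[top + i][left + j] = kernel[i][j]
    return out


def merge_kernels(kernels):
    """Sum several odd-sized kernels into one square kernel of the largest extent."""
    size = max(max(len(k), len(k[0])) for k in kernels)
    merged = [[0.0] * size for _ in range(size)]
    for k in kernels:
        padded = pad_kernel(k, size)
        for i in range(size):
            for j in range(size):
                merged[i][j] += padded[i][j]
    return merged


def conv2d_same(image, kernel):
    """Single-channel cross-correlation, stride 1, zero padding, same-size output."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:  # outside the image counts as zero
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


# Hypothetical branch kernels for one channel (integer values keep the arithmetic exact).
square = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]   # isotropic 3x3 branch
horizontal = [[1, 2, 3, 2, 1]]                  # 1x5 strip branch
vertical = [[1], [-1], [2], [-1], [1]]          # 5x1 strip branch
branches = [square, horizontal, vertical]

# Toy 6x6 single-channel input.
image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]

# Training-style path: run every branch separately and sum the outputs.
branch_outputs = [conv2d_same(image, k) for k in branches]
multi_branch = [[sum(o[y][x] for o in branch_outputs) for x in range(6)] for y in range(6)]

# Inference-style path: one fused 5x5 kernel, one convolution.
fused_kernel = merge_kernels(branches)
fused = conv2d_same(image, fused_kernel)

max_diff = max(abs(multi_branch[y][x] - fused[y][x]) for y in range(6) for x in range(6))
print(max_diff)
```

The fused path replaces three kernel launches with one, which is the kind of saving the abstract attributes to HKR. Whether the paper's branches also involve dilation, per-branch normalization, or other components that need extra handling before merging is not stated in the abstract.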
