Fugu-MT paper translation (abstract): Low-latency Event-based Object Detection with Spatially-Sparse Linear Attention

Paper summary: Low-latency Event-based Object Detection with Spatially-Sparse Linear Attention

arxiv url: http://arxiv.org/abs/2603.06228v1
Date: Fri, 06 Mar 2026 12:44:00 GMT
Status: translation complete
Last updated in system: 2026-03-09 13:17:45.70445
Title: Low-latency Event-based Object Detection with Spatially-Sparse Linear Attention
Title (reference translation): Low-latency event-based object detection using spatially-sparse linear attention
Authors: Haiqing Hao, Zhipeng Sui, Rong Zou, Zijia Dai, Nikola Zubić, Davide Scaramuzza, Wenhui Wang
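Abstract summary: Event cameras provide sequential visual data with spatial sparsity and high temporal resolution, which makes them attractive for low-latency object detection. Existing asynchronous event-based neural networks realize this low-latency advantage by updating predictions event by event, but still suffer from two bottlenecks: recurrent architectures are difficult to train efficiently on long sequences, and improving accuracy often increases per-event computation and latency. The paper proposes Spatially-Sparse Linear Attention (SSLA), which introduces a mixture-of-spaces state decomposition and a scatter-compute-gather training procedure, enabling state-level sparsity.
Reference score (attention score computed independently by this site): 20.653155039432463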
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Event cameras provide sequential visual data with spatial sparsity and high temporal resolution, making them attractive for low-latency object detection. Existing asynchronous event-based neural networks realize this low-latency advantage by updating predictions event-by-event, but still suffer from two bottlenecks: recurrent architectures are difficult to train efficiently on long sequences, and improving accuracy often increases per-event computation and latency. Linear attention is appealing in this setting because it supports parallel training and recurrent inference. However, standard linear attention updates a global state for every event, yielding a poor accuracy-efficiency trade-off, which is problematic for object detection, where fine-grained representations and thus states are preferred. The key challenge is therefore to introduce sparse state activation that exploits event sparsity while preserving efficient parallel training. We propose Spatially-Sparse Linear Attention (SSLA), which introduces a mixture-of-spaces state decomposition and a scatter-compute-gather training procedure, enabling state-level sparsity as well as training parallelism. Built on SSLA, we develop an end-to-end asynchronous linear attention model, SSLA-Det, for event-based object detection. On Gen1 and N-Caltech101, SSLA-Det achieves state-of-the-art accuracy among asynchronous methods, reaching 0.375 mAP and 0.515 mAP, respectively, while reducing per-event computation by more than 20 times compared to the strongest prior asynchronous baseline, demonstrating the potential of linear attention for low-latency event-based vision.
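Abstract (reference translation): Event cameras provide sequential visual data with spatial sparsity and high temporal resolution, making them attractive for low-latency object detection. Existing asynchronous event-based neural networks realize this low-latency advantage by updating predictions event by event, but still suffer from two bottlenecks. Linear attention is appealing in this setting because it supports parallel training and recurrent inference. However, standard linear attention updates a global state for every event, yielding a poor accuracy-efficiency trade-off. The key challenge is therefore to introduce sparse state activation that exploits event sparsity while preserving efficient parallel training. The paper proposes Spatially-Sparse Linear Attention (SSLA), which introduces a mixture-of-spaces state decomposition and a scatter-compute-gather training procedure. Built on SSLA, the authors develop SSLA-Det, an end-to-end asynchronous linear attention model for event-based object detection. On Gen1 and N-Caltech101, SSLA-Det reaches 0.375 mAP and 0.515 mAP, respectively, while reducing per-event computation by more than 20 times compared to the strongest prior asynchronous baseline, demonstrating the potential of linear attention for low-latency event-based vision.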
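
The abstract's two technical claims, that linear attention admits both a recurrent and a parallel form, and that a per-region state can be updated only by the events that fall in that region, can be illustrated with a small sketch. This is not the paper's SSLA implementation: the attention here is unnormalized with no feature map, the integer `cells` ids stand in for whatever spatial partition the authors use, and all function names (`step`, `recurrent_global`, `parallel_causal`, `recurrent_sparse`, `scatter_compute_gather`) are hypothetical names chosen for illustration.

```python
from collections import defaultdict

def zero_state(d_k, d_v):
    """A d_k x d_v state matrix filled with zeros."""
    return [[0.0] * d_v for _ in range(d_k)]

def step(state, q, k, v):
    """One recurrent step: state += outer(k, v), then read out q^T state."""
    for i, k_i in enumerate(k):
        for j, v_j in enumerate(v):
            state[i][j] += k_i * v_j
    return [sum(q[i] * state[i][j] for i in range(len(k))) for j in range(len(v))]

def recurrent_global(qs, ks, vs):
    """Standard linear attention at inference: every event updates one global state."""
    state = zero_state(len(ks[0]), len(vs[0]))
    return [step(state, q, k, v) for q, k, v in zip(qs, ks, vs)]

def parallel_causal(qs, ks, vs):
    """Same outputs as recurrent_global, computed from causally masked scores."""
    outputs = []
    for t, q in enumerate(qs):
        out = [0.0] * len(vs[0])
        for s in range(t + 1):  # causal mask: only events up to and including t
            weight = sum(a * b for a, b in zip(q, ks[s]))
            for j, v_j in enumerate(vs[s]):
                out[j] += weight * v_j
        outputs.append(out)
    return outputs

def recurrent_sparse(cells, qs, ks, vs):
    """Assumed sparse variant: one state per cell; an event touches only its own cell."""
    states = {}
    outputs = []
    for c, q, k, v in zip(cells, qs, ks, vs):
        if c not in states:
            states[c] = zero_state(len(k), len(v))
        outputs.append(step(states[c], q, k, v))
    return outputs

def scatter_compute_gather(cells, qs, ks, vs):
    """Parallel-style counterpart of recurrent_sparse (an illustrative guess)."""
    groups = defaultdict(list)
    for idx, c in enumerate(cells):  # scatter: group event indices by cell
        groups[c].append(idx)
    outputs = [None] * len(cells)
    for idxs in groups.values():  # compute: causal attention within each group
        local = parallel_causal([qs[i] for i in idxs],
                                [ks[i] for i in idxs],
                                [vs[i] for i in idxs])
        for i, out in zip(idxs, local):  # gather: restore original event order
            outputs[i] = out
    return outputs

# Four toy events alternating between two cells (made-up values).
cells = [0, 1, 0, 1]
qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ks = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
vs = [[1.0], [2.0], [3.0], [4.0]]

global_out = recurrent_global(qs, ks, vs)
sparse_out = recurrent_sparse(cells, qs, ks, vs)
```

In this toy setting the recurrent and parallel functions agree with each other in both the global and the per-cell case, which is the property that would let a model be trained in parallel and deployed event by event. The per-cell variant also shows why the cost per event can drop: each event reads and writes a single small state instead of one shared global state.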

Related papers