Fugu-MT Paper Translation (Abstract): HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction

Paper abstract: HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction

arxiv url: http://arxiv.org/abs/2605.23889v1
Date: Fri, 22 May 2026 17:50:48 GMT
Status: Translation complete
Last updated in system: 2026-05-25 17:29:20.456609
Title: HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction
Title (reference translation): HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction
Authors: Chong Cheng, Peilin Tao, Nanjie Yao, Guanzhi Ding, Xianda Chen, Yuansen Du, Xiaoyang Guo, Wei Yin, Weiqiang Ren, Qian Zhang, Zhengqing Chen, Hao Wang
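Abstract summary: Existing methods often suffer from drift, jitter, or collapse on long sequences. We propose HorizonStream, a long-horizon Transformer that explicitly factorizes the evidence influence kernel. Experiments show that HorizonStream generalizes stably to sequences exceeding 10,000 frames with constant memory and linear time.
Reference score (independently computed attention metric): 18.749241400724493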
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Online 3D reconstruction requires estimating camera pose and scene geometry under strict causal and bounded-memory constraints. Existing methods often suffer from drift, jitter, or collapse on long sequences. We trace these failures to a fundamental mismatch. Streaming geometry is inherently temporally heterogeneous, with evidence ranging from short-lived correspondences to persistent global scale. However, current architectures impose uniform and pathological influence patterns. For example, sliding windows enforce hard cutoffs, while ungated recurrence and causal attention cause cache saturation and spike-like attention sinks. To resolve this, we formalize geometric propagation as an "evidence influence kernel" and propose HorizonStream, a long-horizon Transformer that explicitly factorizes this kernel. For the long-range temporal factor, Geometric Linear Attention learns channel-wise decay rates to enable bounded, multi-timescale propagation of geometric evidence. For the short-range spatial factor, Geometric Local Attention with Spatiotemporal RoPE performs reliable 3D matching while suppressing attention sinks. Finally, Metric Readout Tokens recover stable scale and rigid pose directly from the persistent geometric state. Extensive experiments show that HorizonStream, trained on only 48-frame clips, generalizes stably to sequences exceeding 10,000 frames with constant memory and linear time, achieving state-of-the-art streaming 3D reconstruction performance. Project Page: https://3dagentworld.github.io/horizonstream/
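Abstract (reference translation): Online 3D reconstruction requires estimating camera pose and scene geometry under strict causality and bounded-memory constraints. Existing methods often suffer from drift, jitter, or collapse on long sequences. We trace these failures to a fundamental mismatch. Streaming geometry is inherently heterogeneous in time, with evidence ranging from short-lived correspondences to persistent global scale. Current architectures, however, impose uniform and pathological influence patterns. For example, sliding windows enforce hard cutoffs, while ungated recurrence and causal attention cause cache saturation and spike-like attention sinks. To resolve this, we formalize geometric propagation as an "evidence influence kernel" and propose HorizonStream, a long-horizon Transformer that explicitly factorizes this kernel. For the long-range temporal factor, Geometric Linear Attention learns channel-wise decay rates, enabling bounded, multi-timescale propagation of geometric evidence. For the short-range spatial factor, Geometric Local Attention with Spatiotemporal RoPE performs reliable 3D matching while suppressing attention sinks. Finally, Metric Readout Tokens recover stable scale and rigid pose directly from the persistent geometric state. Extensive experiments show that HorizonStream, trained on only 48-frame clips, generalizes stably to sequences exceeding 10,000 frames with constant memory and linear time, achieving state-of-the-art streaming 3D reconstruction performance. Project Page: https://3dagentworld.github.io/horizonstream/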
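
Illustrative sketch (not from the paper): the abstract says that Geometric Linear Attention learns channel-wise decay rates and runs with constant memory and linear time, but it does not give the update rule. The Python below shows one generic way such a mechanism can work, a causal linear-attention recurrence whose fixed-size state is multiplied by a per-channel decay before each new frame is absorbed. The function name `decayed_linear_attention`, the argument layout, and the fixed (rather than learned) decay values are all assumptions made for illustration.

```python
def decayed_linear_attention(queries, keys, values, decay):
    """Causal linear attention with a per-channel decay on the state.

    Hypothetical illustration only; not the paper's implementation.
    queries, keys: lists of T vectors of length d_k
    values:        list of T vectors of length d_v
    decay:         list of d_k factors in (0, 1), one per key channel
    Returns a list of T output vectors of length d_v.
    """
    d_k = len(decay)
    d_v = len(values[0])
    # The state has a fixed d_k x d_v size, so memory does not grow with T.
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        for c in range(d_k):
            for j in range(d_v):
                # Channel c forgets old evidence at its own rate, then
                # absorbs the contribution of the current frame.
                state[c][j] = decay[c] * state[c][j] + k[c] * v[j]
        # Read out the current frame by projecting the query onto the state.
        outputs.append(
            [sum(q[c] * state[c][j] for c in range(d_k)) for j in range(d_v)]
        )
    return outputs
```

Under these assumptions, the influence of frame s on frame t through channel c is decay[c] ** (t - s). A factor near 1 keeps evidence for a long time, a small factor keeps only recent frames, and because every factor is below 1 the accumulated weight per channel stays bounded by 1 / (1 - decay[c]) instead of growing with sequence length.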


Related papers