Fugu-MT Paper Translation (Abstract): AsyncMDE: Real-Time Monocular Depth Estimation via Asynchronous Spatial Memory

Paper overview: AsyncMDE: Real-Time Monocular Depth Estimation via Asynchronous Spatial Memory

arxiv url: http://arxiv.org/abs/2603.10438v1
Date: Wed, 11 Mar 2026 05:40:25 GMT
Status: Translation complete
Last updated in system: 2026-03-12 16:22:32.794015
Title: AsyncMDE: Real-Time Monocular Depth Estimation via Asynchronous Spatial Memory
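Title (reference translation): AsyncMDE: Real-Time Monocular Depth Estimation via Asynchronous Spatial Memory
Authors: Lianjie Ma, Yuquan Li, Bingzheng Jiang, Ziming Zhong, Han Ding, Lijun Zhu
Abstract summary: AsyncMDE is an asynchronous depth perception system that amortizes the foundation model's computational cost over time. It is validated across indoor static, dynamic, and synthetic extreme-motion benchmarks. AsyncMDE degrades gracefully between refreshes and achieves 161 FPS on a Jetson AGX Orin with TensorRT.
Reference score (independently computed attention score): 5.4678854145519855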
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Foundation-model-based monocular depth estimation offers a viable alternative to active sensors for robot perception, yet its computational cost often prohibits deployment on edge platforms. Existing methods perform independent per-frame inference, wasting the substantial computational redundancy between adjacent viewpoints in continuous robot operation. This paper presents AsyncMDE, an asynchronous depth perception system consisting of a foundation model and a lightweight model that amortizes the foundation model's computational cost over time. The foundation model produces high-quality spatial features in the background, while the lightweight model runs asynchronously in the foreground, fusing cached memory with current observations through complementary fusion, outputting depth estimates, and autoregressively updating the memory. This enables cross-frame feature reuse with bounded accuracy degradation. At a mere 3.83M parameters, it operates at 237 FPS on an RTX 4090, recovering 77% of the accuracy gap to the foundation model while achieving a 25X parameter reduction. Validated across indoor static, dynamic, and synthetic extreme-motion benchmarks, AsyncMDE degrades gracefully between refreshes and achieves 161 FPS on a Jetson AGX Orin with TensorRT, clearly demonstrating its feasibility for real-time edge deployment.
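Abstract (reference translation): Foundation-model-based monocular depth estimation offers a viable alternative to active sensors for robot perception, but its computational cost often prohibits deployment on edge platforms. Existing methods perform independent per-frame inference, wasting the substantial computational redundancy between adjacent viewpoints in continuous robot operation. This paper describes AsyncMDE, an asynchronous depth perception system consisting of a foundation model and a lightweight model. The foundation model produces high-quality spatial features in the background, while the lightweight model runs asynchronously in the foreground, fusing cached memory with current observations through complementary fusion, outputting depth estimates, and autoregressively updating the memory. This enables cross-frame feature reuse with bounded accuracy degradation. With only 3.83M parameters, it runs at 237 FPS on an RTX 4090 and recovers 77% of the accuracy gap to the foundation model while achieving a 25X parameter reduction. Validated on indoor static, dynamic, and synthetic extreme-motion benchmarks, AsyncMDE degrades gracefully between refreshes and achieves 161 FPS on a Jetson AGX Orin with TensorRT.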
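
The control flow described in the abstract (a slow model refreshing a cache in the background, a fast model fusing that cache with each new frame and writing the result back) can be illustrated with a toy sketch. This is not the authors' implementation: the two encoder functions, the class `AsyncDepthSketch`, the parameters `refresh_every`, `latency`, and `alpha`, and the use of a weighted average as the "complementary fusion" step are all assumptions made for illustration. The background refresh is simulated with a frame-count delay rather than a real thread, and features are plain lists of floats.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Feature = List[float]  # toy stand-in for a spatial feature map


def foundation_features(frame: Feature) -> Feature:
    """Hypothetical slow, accurate encoder (stand-in for the foundation model)."""
    return [2.0 * x for x in frame]


def light_features(frame: Feature) -> Feature:
    """Hypothetical fast encoder with a small systematic error."""
    return [2.0 * x + 0.5 for x in frame]


@dataclass
class AsyncDepthSketch:
    refresh_every: int = 4  # assumed: a background refresh is launched every N frames
    latency: int = 2        # assumed: frames until a launched refresh becomes available
    alpha: float = 0.8      # assumed: fusion weight given to the cached memory
    memory: Optional[Feature] = None
    pending: Optional[Tuple[int, Feature]] = None  # (ready_at_frame, features)

    def step(self, t: int, frame: Feature) -> Feature:
        # 1. Swap in a finished background refresh, if one has arrived.
        if self.pending is not None and self.pending[0] <= t:
            self.memory = self.pending[1]
            self.pending = None
        # 2. Launch a new background refresh on schedule (simulated, not threaded).
        if self.pending is None and t % self.refresh_every == 0:
            self.pending = (t + self.latency, foundation_features(frame))
        # 3. Foreground path: fuse cached memory with the current observation.
        current = light_features(frame)
        if self.memory is None:
            fused = current
        else:
            fused = [self.alpha * m + (1.0 - self.alpha) * c
                     for m, c in zip(self.memory, current)]
        # 4. Autoregressive update: the fused result becomes the next memory.
        self.memory = fused
        # 5. Stand-in depth head: return the fused features directly.
        return fused


def run(frames: List[Feature]) -> List[Feature]:
    model = AsyncDepthSketch()
    return [model.step(t, f) for t, f in enumerate(frames)]


if __name__ == "__main__":
    for t, depth in enumerate(run([[1.0]] * 8)):
        print(t, depth)
```

In this toy setup the output drifts toward the lightweight encoder's value between refreshes and moves back toward the foundation value whenever a refresh arrives, which loosely mirrors the "degrades gracefully between refreshes" behavior the abstract reports. It says nothing about the real system's accuracy or speed.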


Related papers