Fugu-MT Paper Translation (Abstract): Fractal Autoregressive Depth Estimation with Continuous Token Diffusion

Paper overview: Fractal Autoregressive Depth Estimation with Continuous Token Diffusion

arxiv url: http://arxiv.org/abs/2603.14702v1
Date: Mon, 16 Mar 2026 01:19:42 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.977087
Title: Fractal Autoregressive Depth Estimation with Continuous Token Diffusion
Authors: Jinchang Zhang, Xinrou Kang, Guoyu Lu
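Abstract summary: We propose a Fractal Visual Autoregressive Diffusion framework that reformulates depth estimation as a coarse-to-fine process. A conditional denoising diffusion loss models depth distributions directly in continuous space and mitigates errors caused by discrete quantization. Experiments on standard benchmarks demonstrate strong performance and validate the effectiveness of the proposed design.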
Reference score (independently computed attention score): 24.726638033402747
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Monocular depth estimation can benefit from autoregressive (AR) generation, but direct AR modeling is hindered by the modality gap between RGB and depth, inefficient pixel-wise generation, and instability in continuous depth prediction. We propose a Fractal Visual Autoregressive Diffusion framework that reformulates depth estimation as a coarse-to-fine, next-scale autoregressive generation process. A VCFR module fuses multi-scale image features with current depth predictions to improve cross-modal conditioning, while a conditional denoising diffusion loss models depth distributions directly in continuous space and mitigates errors caused by discrete quantization. To improve computational efficiency, we organize the scale-wise generators into a fractal recursive architecture, reusing a base visual AR unit in a self-similar hierarchy. We further introduce an uncertainty-aware robust consensus aggregation scheme for multi-sample inference to improve fusion stability and provide a practical pixel-wise reliability estimate. Experiments on standard benchmarks demonstrate strong performance and validate the effectiveness of the proposed design.
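The abstract does not say how the consensus aggregation for multi-sample inference is computed. As a rough illustration of the general idea only, the sketch below fuses several sampled depth maps with a per-pixel median and reports the median absolute deviation (MAD) as a per-pixel spread. The function name `robust_consensus`, the median/MAD choice, and the nested-list depth-map layout are assumptions made for this example, not the authors' method.

```python
from statistics import median


def robust_consensus(samples):
    """Fuse K sampled depth maps of identical shape.

    Illustrative stand-in (an assumption, not the paper's scheme):
    the fused depth is the per-pixel median across samples, and the
    per-pixel spread is the median absolute deviation (MAD).
    `samples` is a list of depth maps, each a list of rows of floats.
    """
    height, width = len(samples[0]), len(samples[0][0])
    fused = [[0.0] * width for _ in range(height)]
    spread = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [depth_map[y][x] for depth_map in samples]
            center = median(values)
            fused[y][x] = center
            # Large MAD means the samples disagree at this pixel.
            spread[y][x] = median([abs(v - center) for v in values])
    return fused, spread


# Three hypothetical 1x2 depth samples; the third is an outlier at pixel (0, 0).
samples = [[[1.0, 2.0]], [[1.2, 2.0]], [[5.0, 2.0]]]
fused, spread = robust_consensus(samples)
```

With a median, the outlying sample at pixel (0, 0) does not pull the fused value away from the other two, and a larger spread at that pixel could be read as lower reliability there.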


Related papers