Fugu-MT Paper Translation (Summary): SPORTS: Simultaneous Panoptic Odometry, Rendering, Tracking and Segmentation for Urban Scenes Understanding

Paper summary: SPORTS: Simultaneous Panoptic Odometry, Rendering, Tracking and Segmentation for Urban Scenes Understanding

arxiv url: http://arxiv.org/abs/2510.12749v1
Date: Tue, 14 Oct 2025 17:28:19 GMT
Status: Translation complete
Last updated in system: 2025-10-15 19:02:32.419283
Title: SPORTS: Simultaneous Panoptic Odometry, Rendering, Tracking and Segmentation for Urban Scenes Understanding
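Title (reference translation): SPORTS: Simultaneous Panoptic Odometry, Rendering, Tracking, and Segmentation for Urban Scene Understanding
Authors: Zhiliu Yang, Jinyu Dai, Jianyuan Zhang, Zhu Yang
Abstract summary: This paper proposes SPORTS, a new framework for holistic scene understanding that integrates the Video Panoptic Segmentation (VPS), Visual Odometry (VO), and Scene Rendering (SR) tasks into an iterative, unified perspective. Our attention-based feature fusion outperforms most existing state-of-the-art methods on the odometry, tracking, segmentation, and novel view synthesis tasks.
Reference score (in-house attention score): 0.0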
参考スコア（独自算出の注目度）: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Scene perception, understanding, and simulation are fundamental techniques for embodied-AI agents, yet existing solutions are still prone to segmentation deficiency, interference from dynamic objects, sensor data sparsity, and view limitations. This paper proposes a novel framework, named SPORTS, for holistic scene understanding by tightly integrating the Video Panoptic Segmentation (VPS), Visual Odometry (VO), and Scene Rendering (SR) tasks into an iterative and unified perspective. First, VPS uses an adaptive attention-based geometric fusion mechanism that aligns cross-frame features by incorporating the pose, depth, and optical flow modalities, and that automatically adjusts the feature maps for different decoding stages. A post-matching strategy is also integrated to improve identity tracking. In VO, the panoptic segmentation results from VPS are combined with the optical flow map to improve the confidence estimation of dynamic objects, which, within a learning-based paradigm, improves the accuracy of camera pose estimation and the completeness of the generated depth maps. Furthermore, the point-based rendering in SR benefits from VO, transforming sparse point clouds into neural fields to synthesize high-fidelity RGB views and twin panoptic views. Extensive experiments on three public datasets demonstrate that our attention-based feature fusion outperforms most existing state-of-the-art methods on the odometry, tracking, segmentation, and novel view synthesis tasks.
Abstract (reference translation): Scene perception, understanding, and simulation are fundamental technologies for embodied-AI agents, but existing solutions still suffer from insufficient segmentation, interference from dynamic objects, sparse sensor data, and limited views. This paper proposes SPORTS, a new framework for holistic scene understanding that integrates the Video Panoptic Segmentation (VPS), Visual Odometry (VO), and Scene Rendering (SR) tasks into an iterative, unified perspective. First, VPS uses an adaptive attention-based geometric fusion mechanism that incorporates the pose, depth, and optical flow modalities to align cross-frame features and automatically adjusts the feature maps for different decoding stages. A post-matching strategy is also integrated to improve identity tracking. In VO, combining the panoptic segmentation results from VPS with the optical flow map improves the confidence estimation of dynamic objects, which raises the accuracy of camera pose estimation and the completeness of depth map generation within a learning-based paradigm. In addition, the point-based rendering of SR benefits from VO, converting sparse point clouds into neural fields to synthesize high-fidelity RGB views and twin panoptic views. Extensive experiments on three public datasets show that our attention-based feature fusion outperforms most existing state-of-the-art methods on the odometry, tracking, segmentation, and novel view synthesis tasks.
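The abstract states only that VO combines the panoptic segmentation from VPS with the optical flow map to improve confidence estimation for dynamic objects; it does not give the formulation. The NumPy sketch below illustrates one common way such a combination can work, under our own assumptions, and is not the SPORTS implementation. A pixel's confidence drops when its observed flow disagrees with the rigid flow predicted from depth and camera pose, and any "thing" instance whose mean residual exceeds a threshold is masked as a whole. The function names `rigid_flow` and `dynamic_confidence`, the parameters `sigma` and `seg_thresh`, and the current-to-previous flow convention are all hypothetical.

```python
import numpy as np


def rigid_flow(depth, K, T_cur_to_prev):
    """Flow explained by camera motion alone (static-world assumption).

    Hypothetical helper: maps each current-frame pixel to the previous frame
    using its depth, intrinsics K (3x3) and relative pose T (4x4).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)       # homogeneous pixels, (h, w, 3)
    pts = (pix @ np.linalg.inv(K).T) * depth[..., None]     # back-project to 3-D camera points
    R, t = T_cur_to_prev[:3, :3], T_cur_to_prev[:3, 3]
    proj = (pts @ R.T + t) @ K.T                            # move into previous camera, project
    uv_prev = proj[..., :2] / proj[..., 2:3]
    return uv_prev - pix[..., :2]                           # (h, w, 2) displacement in pixels


def dynamic_confidence(flow, depth, K, T_cur_to_prev, panoptic_ids, thing_ids,
                       sigma=1.0, seg_thresh=2.0):
    """Per-pixel static confidence in [0, 1] (hypothetical scheme, not the paper's).

    flow:         observed optical flow, current -> previous, shape (h, w, 2)
    panoptic_ids: integer segment id per pixel, shape (h, w)
    thing_ids:    set of segment ids belonging to potentially movable "thing" classes
    """
    residual = np.linalg.norm(flow - rigid_flow(depth, K, T_cur_to_prev), axis=-1)
    # Pixel-level cue: confidence is ~1 when ego-motion fully explains the flow.
    conf = np.exp(-residual ** 2 / (2.0 * sigma ** 2))
    # Segment-level cue: if a thing instance moves on average, mask the whole instance.
    for seg in np.unique(panoptic_ids):
        if seg not in thing_ids:
            continue
        mask = panoptic_ids == seg
        if residual[mask].mean() > seg_thresh:
            conf[mask] = 0.0
    return conf
```

The segment-level step is where the panoptic result helps in this sketch: parts of a moving vehicle whose individual flow residual happens to be small are still down-weighted, because the decision is made over the whole instance rather than pixel by pixel. A pose or depth network could then use `conf` as a per-pixel loss weight, though how SPORTS actually consumes its confidence is not described in the abstract.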


Related papers