Fugu-MT Paper Translation (Abstract): Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images

Paper summary: Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images

arxiv url: http://arxiv.org/abs/2508.06546v1
Date: Tue, 05 Aug 2025 21:25:50 GMT
Status: translation complete
Last updated in system: 2025-08-12 21:23:28.425094
Title: Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images
Title (reference translation): Statistical Confidence Correction for Robust 3D Scene Graph Generation from Multi-View Images
Authors: Qi Xun Yeo, Yanyan Li, Gim Hee Lee
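Abstract summary: Semantic scene graph estimation methods utilize 3D annotations to accurately predict target objects, predicates, and relationships. This work must overcome the noisy pseudo point-based geometry reconstructed from predicted depth maps and reduce the amount of background noise present in multi-view image features. The proposed method outperforms current methods that use purely multi-view images as the initial input.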
Reference score (independently computed attention score): 56.134885746889026
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Modern 3D semantic scene graph estimation methods utilize ground truth 3D annotations to accurately predict target objects, predicates, and relationships. In the absence of given 3D ground truth representations, we explore leveraging only multi-view RGB images to tackle this task. To attain robust features for accurate scene graph estimation, we must overcome the noisy pseudo point-based geometry reconstructed from predicted depth maps and reduce the amount of background noise present in multi-view image features. The key is to enrich node and edge features with accurate semantic and spatial information and with information from neighboring relations. We obtain semantic masks that guide feature aggregation to filter out background features, and we design a novel method to incorporate neighboring node information to improve the robustness of our scene graph estimates. Furthermore, we leverage explicit statistical priors calculated from the training summary statistics to refine node and edge predictions based on their one-hop neighborhood. Our experiments show that our method outperforms current methods that use purely multi-view images as the initial input. Our project page is available at https://qixun1.github.io/projects/SCRSSG.
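Abstract (reference translation): Modern 3D semantic scene graph estimation methods use 3D annotations to accurately predict target objects, predicates, and relationships. When no 3D ground truth representation is given, we consider using only multi-view RGB images to address this task. To obtain robust features for accurate scene graph estimation, it is necessary to overcome the noisy pseudo point-based geometry derived from predicted depth maps and to reduce the amount of background noise that appears in multi-view image features. The key is to enrich node and edge features with accurate semantic and spatial information and with neighboring relations. We obtain semantic masks that guide feature aggregation in order to filter background features, and we design a new method that incorporates neighboring node information to support the robustness of scene graph estimation. In addition, we use explicit statistical priors computed from training summary statistics to refine node and edge predictions over their one-hop neighborhood. Experiments show that the method outperforms current methods that use purely multi-view images as the initial input. The project page is available at https://qixun1.github.io/projects/SCRSSG.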
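The abstract mentions refining node predictions with statistical priors computed from training summary statistics over each node's one-hop neighborhood. The sketch below illustrates one generic way such a rescoring step could look; it is not the authors' implementation. The co-occurrence prior, the log-linear combination, the `weight` and `smoothing` parameters, and all function names are assumptions introduced here for illustration only.

```python
import math
from collections import defaultdict


def cooccurrence_prior(training_graphs, num_classes, smoothing=1.0):
    """Estimate P(neighbor class | node class) from labeled training edges.

    training_graphs: list of (edges, labels) pairs, where edges is a list of
    (i, j) node-index pairs and labels[i] is the class id of node i.
    Hypothetical helper; add-one style smoothing keeps every entry positive.
    """
    counts = [[smoothing] * num_classes for _ in range(num_classes)]
    for edges, labels in training_graphs:
        for i, j in edges:
            counts[labels[i]][labels[j]] += 1
            counts[labels[j]][labels[i]] += 1
    return [[c / sum(row) for c in row] for row in counts]


def rescore_nodes(probs, edges, prior, weight=0.5):
    """Rescore per-node class probabilities using one-hop neighbors.

    For each candidate class, the log-probability of the initial prediction is
    combined with the average log-compatibility between that class and the
    predicted class distributions of the node's neighbors.
    """
    num_classes = len(prior)
    neighbors = defaultdict(list)
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    rescored = []
    for i, p in enumerate(probs):
        log_scores = []
        for c in range(num_classes):
            support = 0.0
            for n in neighbors[i]:
                # Expected prior compatibility of class c with neighbor n.
                compat = sum(probs[n][d] * prior[c][d] for d in range(num_classes))
                support += math.log(compat + 1e-12)
            if neighbors[i]:
                support /= len(neighbors[i])
            log_scores.append(math.log(p[c] + 1e-12) + weight * support)
        # Softmax over the combined log-scores to get a distribution again.
        top = max(log_scores)
        exps = [math.exp(s - top) for s in log_scores]
        total = sum(exps)
        rescored.append([e / total for e in exps])
    return rescored


# Toy data (made up): classes 0 = chair, 1 = table, 2 = toilet.
training_graphs = [([(0, 1)], [0, 1])] * 8 + [([(0, 1)], [2, 2])]
prior = cooccurrence_prior(training_graphs, num_classes=3)

# Node 0 is ambiguous between chair and toilet; node 1 is confidently a table.
probs = [[0.45, 0.05, 0.50], [0.05, 0.90, 0.05]]
rescored = rescore_nodes(probs, edges=[(0, 1)], prior=prior)
```

In this toy setup the neighboring table shifts the ambiguous node toward the chair class, because chairs and tables co-occur often in the made-up training graphs. The same idea could be applied to edge (predicate) predictions by conditioning a predicate prior on the classes of the two endpoint nodes, again as an assumption rather than a description of the paper's method.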

Related papers