Fugu-MT Paper Translation (Abstract): X-GS: An Extensible Open Framework Unifying 3DGS Architectures with Downstream Multimodal Models

Paper abstract: X-GS: An Extensible Open Framework Unifying 3DGS Architectures with Downstream Multimodal Models

arxiv url: http://arxiv.org/abs/2603.09632v1
Date: Tue, 10 Mar 2026 13:10:18 GMT
Status: Translation complete
Last updated in system: 2026-03-11 15:25:24.326096
Title: X-GS: An Extensible Open Framework Unifying 3DGS Architectures with Downstream Multimodal Models
Title (reference translation): X-GS: An extensible open framework that integrates 3DGS architectures with downstream multimodal models
Authors: Yueen Ma, Irwin King
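Abstract summary: We introduce X-GS, an open framework that integrates a broad range of techniques to enable real-time 3DGS-based online SLAM. At the core of X-GS is a highly efficient pipeline called X-GS-Perceiver, which can take unposed RGB video streams as input to co-optimize geometry and poses. Real-time performance is achieved through a novel online Vector Quantization (VQ) module, a GPU-accelerated grid-sampling scheme, and a highly parallelized pipeline design.
Reference score (independently computed attention score): 50.01070135500655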
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D Gaussian Splatting (3DGS) has emerged as a powerful technique for novel view synthesis, subsequently extending into numerous spatial AI applications. However, most existing 3DGS methods are isolated, focusing on specific domains such as online SLAM, semantic enrichment, or 3DGS for unposed images. In this paper, we introduce X-GS, an extensible open framework that unifies a broad range of techniques to enable real-time 3DGS-based online SLAM enriched with semantics, bridging the gap to downstream multimodal models. At the core of X-GS is a highly efficient pipeline called X-GS-Perceiver, capable of taking unposed RGB (or optionally RGB-D) video streams as input to co-optimize geometry and poses, and distill high-dimensional semantic features from vision foundation models into the 3D Gaussians. We achieve real-time performance through a novel online Vector Quantization (VQ) module, a GPU-accelerated grid-sampling scheme, and a highly parallelized pipeline design. The semantic 3D Gaussians can then be utilized by vision-language models within the X-GS-Thinker component, enabling downstream tasks such as object detection, zero-shot caption generation, and potentially embodied tasks. Experimental results on real-world datasets showcase the efficacy, efficiency, and newly unlocked multimodal capabilities of the X-GS framework.
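Abstract (reference translation): 3D Gaussian Splatting (3DGS) has emerged as a powerful technique for novel view synthesis and has since been extended to many spatial AI applications. However, most existing 3DGS methods are isolated and focus on specific domains such as online SLAM, semantic enrichment, or 3DGS for unposed images. This paper introduces X-GS, an extensible open framework that unifies a broad range of techniques to enable real-time 3DGS-based online SLAM enriched with semantics, bridging the gap to downstream multimodal models. At the core of X-GS is a highly efficient pipeline called X-GS-Perceiver, which can take unposed RGB (or optionally RGB-D) video streams as input to co-optimize geometry and poses, and to distill high-dimensional semantic features from vision foundation models into the 3D Gaussians. Real-time performance is achieved through a novel online Vector Quantization (VQ) module, a GPU-accelerated grid-sampling scheme, and a highly parallelized pipeline design. The semantic 3D Gaussians can then be used by vision-language models within the X-GS-Thinker component, enabling downstream tasks such as object detection, zero-shot caption generation, and potentially embodied tasks. Experimental results on real-world datasets demonstrate the efficacy, efficiency, and newly unlocked multimodal capabilities of the X-GS framework.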
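The abstract names an online Vector Quantization (VQ) module but does not describe how it works. As a rough illustration of the general idea only, the sketch below shows a generic sequential k-means style quantizer that compresses a stream of feature vectors into a small codebook, one vector at a time. It is not the X-GS module: the class name `OnlineVQ`, the parameters `max_codes` and `new_code_dist`, and the rule for adding codes are all assumptions made for this example.

```python
from typing import List, Sequence


class OnlineVQ:
    """Hypothetical online vector quantizer (sequential k-means style).

    Illustrative only; not the VQ module described in the X-GS paper.
    """

    def __init__(self, max_codes: int, new_code_dist: float) -> None:
        self.max_codes = max_codes          # assumed cap on codebook size
        self.new_code_dist = new_code_dist  # assumed squared-distance threshold
        self.codebook: List[List[float]] = []
        self.counts: List[int] = []         # number of vectors assigned per code

    @staticmethod
    def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def _add_code(self, feature: Sequence[float]) -> int:
        self.codebook.append(list(feature))
        self.counts.append(1)
        return len(self.codebook) - 1

    def update(self, feature: Sequence[float]) -> int:
        """Quantize one feature, update the codebook, return the code index."""
        if not self.codebook:
            return self._add_code(feature)
        dists = [self._sq_dist(feature, code) for code in self.codebook]
        idx = min(range(len(dists)), key=dists.__getitem__)
        # Far from every existing code and room left: start a new code.
        if dists[idx] > self.new_code_dist and len(self.codebook) < self.max_codes:
            return self._add_code(feature)
        # Otherwise move the nearest code toward the feature (running mean).
        self.counts[idx] += 1
        rate = 1.0 / self.counts[idx]
        code = self.codebook[idx]
        for d, x in enumerate(feature):
            code[d] += rate * (x - code[d])
        return idx


# Toy stream of 2-D "semantic features" forming two clusters.
vq = OnlineVQ(max_codes=2, new_code_dist=1.0)
stream = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0)]
indices = [vq.update(f) for f in stream]
print(indices)  # [0, 0, 1, 1]
```

In a setting like the one the abstract describes, each item would store only a small integer index instead of a high-dimensional feature, which is the usual motivation for vector quantization. How X-GS builds and updates its codebook is not stated in the abstract.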
