Fugu-MT Paper Translation (Abstract): X-GS: An Extensible Open Framework for Perceiving and Thinking via 3D Gaussian Splatting

Paper abstract: X-GS: An Extensible Open Framework for Perceiving and Thinking via 3D Gaussian Splatting

arxiv url: http://arxiv.org/abs/2603.09632v2
Date: Thu, 12 Mar 2026 07:14:05 GMT
Status: Translation complete
System update date: 2026-03-13 14:46:25.455294
Title: X-GS: An Extensible Open Framework for Perceiving and Thinking via 3D Gaussian Splatting
Title (reference translation): X-GS: An Extensible Open Framework for Perception and Thinking via 3D Gaussian Splatting
Authors: Yueen Ma, Zenglin Xu, Irwin King
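Abstract summary: We introduce X-GS, an open framework consisting of two major components, the X-GS-Perceiver and the X-GS-Thinker. The Perceiver integrates a broad range of 3DGS techniques to enable real-time online SLAM. The Thinker accommodates vision-language models and uses the resulting 3D semantic Gaussians, enabling downstream applications such as object detection, caption generation, and potentially embodied tasks.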
Reference score (independently computed attention score): 72.02343855552051
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D Gaussian Splatting (3DGS) has emerged as a powerful technique for novel view synthesis, subsequently extending into numerous spatial AI applications. However, most existing 3DGS methods operate in isolation, focusing on specific domains such as pose-free 3DGS, online SLAM, and semantic enrichment. In this paper, we introduce X-GS, an extensible open framework consisting of two major components: the X-GS-Perceiver, which unifies a broad range of 3DGS techniques to enable real-time online SLAM and distill semantic features; and the X-GS-Thinker, which interfaces with downstream multimodal models. In our implementation of the Perceiver, we integrate various 3DGS methods through three novel mechanisms: an online Vector Quantization (VQ) module, a GPU-accelerated grid-sampling scheme, and a highly parallelized pipeline design. The Thinker accommodates vision-language models and utilizes the resulting 3D semantic Gaussians, enabling downstream applications such as object detection, caption generation, and potentially embodied tasks. Experimental results on real-world datasets demonstrate the efficiency and newly unlocked multimodal capabilities of the X-GS framework.
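Abstract (reference translation): 3D Gaussian Splatting (3DGS) has emerged as a powerful technique for novel view synthesis and has since been extended to many spatial AI applications. However, most existing 3DGS methods operate in isolation and focus on specific domains such as pose-free 3DGS, online SLAM, and semantic enrichment. This paper introduces X-GS, an extensible open framework consisting of two major components: the X-GS-Perceiver, which unifies a broad range of 3DGS techniques to enable real-time online SLAM and to distill semantic features, and the X-GS-Thinker, which interfaces with downstream multimodal models. The Perceiver implementation integrates various 3DGS methods through three novel mechanisms: an online Vector Quantization (VQ) module, a GPU-accelerated grid-sampling scheme, and a highly parallelized pipeline design. The Thinker accommodates vision-language models and uses the resulting 3D semantic Gaussians, enabling downstream applications such as object detection, caption generation, and potentially embodied tasks. Experimental results on real-world datasets demonstrate the efficiency of the X-GS framework and its newly unlocked multimodal capabilities.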

Related papers