Fugu-MT Paper Translation (Summary): Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting

Paper summary: Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting

arxiv url: http://arxiv.org/abs/2605.04506v2
Date: Wed, 13 May 2026 00:18:56 GMT
Status: Translation complete
Last updated in system: 2026-05-14 17:13:58.760435
Title: Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting
Title (reference translation): Ilov3Splat: 3D Scene Understanding in Gaussian Splatting
Authors: Binh Long Nguyen, Kien Nguyen, Sridha Sridharan, Clinton Fookes, Peyman Moghadam
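Abstract summary: Ilov3Splat is a framework for instance-level open-vocabulary 3D scene understanding built on 3D Gaussian Splatting (3D-GS). It leverages multi-resolution hash embedding to efficiently encode language-aligned CLIP features. At inference time, CLIP-encoded queries are matched against the learned features, followed by two-stage 3D clustering to retrieve relevant Gaussian groups.
Reference score (attention score computed in-house): 32.46353940664006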
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce Ilov3Splat, a novel framework for instance-level open-vocabulary 3D scene understanding built on 3D Gaussian Splatting (3D-GS). Most prior work depends on 2D rendering-based matching or point-level semantic association, which undermines cross-view consistency, lacks coherent instance-level reasoning, and limits precision in downstream 3D tasks. To address these limitations, our method jointly optimizes scene geometry and semantic representations by augmenting Gaussian splats with view-consistent feature fields. Specifically, we leverage multi-resolution hash embedding to efficiently encode language-aligned CLIP features, enabling dense and coherent language grounding in 3D space. We further train an instance feature field using contrastive loss over SAM masks, supporting fine-grained object distinction across views. At inference time, CLIP-encoded queries are matched against the learned features, followed by two-stage 3D clustering to retrieve relevant Gaussian groups. This enables our framework to identify arbitrary objects in 3D scenes based on natural language descriptions, without requiring category supervision or manual annotations. Experiments on standard benchmarks demonstrate that Ilov3Splat outperforms prior open-vocabulary 3D-GS methods in both object selection and instance segmentation, offering a flexible and accurate solution for language-driven 3D scene understanding. Project page: https://csiro-robotics.github.io/Ilov3Splat.
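Abstract (reference translation): Ilov3Splat is a new framework for instance-level open-vocabulary 3D scene understanding built on 3D Gaussian Splatting (3D-GS). Most prior work relies on 2D rendering-based matching or point-level semantic association, which undermines cross-view consistency, lacks coherent instance-level reasoning, and limits precision in downstream 3D tasks. To address these limitations, the method jointly optimizes scene geometry and semantic representations by augmenting Gaussian splats with view-consistent feature fields. Specifically, it uses multi-resolution hash embedding to efficiently encode language-aligned CLIP features, enabling dense and coherent language grounding in 3D space. It further trains an instance feature field with a contrastive loss over SAM masks, supporting fine-grained object distinction across views. At inference time, CLIP-encoded queries are matched against the learned features, followed by two-stage 3D clustering to retrieve relevant Gaussian groups. This makes it possible to identify arbitrary objects in 3D scenes from natural language descriptions without category supervision or manual annotations. Experiments on standard benchmarks show that Ilov3Splat outperforms prior open-vocabulary 3D-GS methods in both object selection and instance segmentation, providing a flexible and accurate solution for language-driven 3D scene understanding. Project page: https://csiro-robotics.github.io/Ilov3Splat.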
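
The abstract says language-aligned CLIP features are encoded with a multi-resolution hash embedding but gives no further detail. As a minimal sketch, assuming an Instant-NGP-style hash grid (per-level tables of feature vectors, an XOR-of-primes spatial hash, and trilinear interpolation), encoding a 3D point might look like the code below. The names `HashGridEncoding` and `spatial_hash`, and all sizes (number of levels, table size, feature width, grid resolutions), are hypothetical illustrations, not the paper's implementation or settings.

```python
import math
import random

# Per-dimension primes of an Instant-NGP-style spatial hash (x uses 1).
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(ix, iy, iz, table_size):
    """Map integer grid coordinates to a slot in [0, table_size)."""
    h = 0
    for coord, prime in zip((ix, iy, iz), PRIMES):
        h ^= (coord * prime) & 0xFFFFFFFF  # emulate uint32 wrap-around
    return h % table_size


class HashGridEncoding:
    """Hypothetical multi-resolution hash encoding of points in [0, 1]^3.

    Each level owns a table of feature vectors. A point is encoded by
    trilinearly interpolating the features at the 8 corners of its
    enclosing voxel on every level and concatenating the per-level results.
    """

    def __init__(self, num_levels=4, table_size=2**12, feat_dim=2,
                 base_res=16, max_res=256, seed=0):
        rng = random.Random(seed)
        self.num_levels = num_levels
        self.table_size = table_size
        self.feat_dim = feat_dim
        # Geometric growth factor between successive grid resolutions.
        growth = (math.exp((math.log(max_res) - math.log(base_res)) / (num_levels - 1))
                  if num_levels > 1 else 1.0)
        self.resolutions = [int(base_res * growth ** lvl) for lvl in range(num_levels)]
        # Small random init; in training these entries would be optimized.
        self.tables = [
            [[rng.uniform(-1e-4, 1e-4) for _ in range(feat_dim)]
             for _ in range(table_size)]
            for _ in range(num_levels)
        ]

    def encode(self, point):
        """Return a feature vector of length num_levels * feat_dim."""
        out = []
        for level, res in enumerate(self.resolutions):
            scaled = [c * res for c in point]
            base = [math.floor(s) for s in scaled]
            frac = [s - b for s, b in zip(scaled, base)]
            acc = [0.0] * self.feat_dim
            # Simplification: every level is hashed, even coarse ones that
            # could be indexed densely without collisions.
            for dx in (0, 1):
                for dy in (0, 1):
                    for dz in (0, 1):
                        w = ((frac[0] if dx else 1.0 - frac[0])
                             * (frac[1] if dy else 1.0 - frac[1])
                             * (frac[2] if dz else 1.0 - frac[2]))
                        slot = spatial_hash(base[0] + dx, base[1] + dy,
                                            base[2] + dz, self.table_size)
                        feat = self.tables[level][slot]
                        for k in range(self.feat_dim):
                            acc[k] += w * feat[k]
            out.extend(acc)
        return out


if __name__ == "__main__":
    enc = HashGridEncoding()
    vec = enc.encode((0.25, 0.5, 0.75))
    print(len(vec))  # 8 (4 levels x 2 features per level)
```

In a setup like this, the concatenated per-level features would typically feed a small decoder (for example an MLP) trained to regress the CLIP features. That decoder and the training loop are omitted here, and whether Ilov3Splat uses exactly this design is an assumption.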

Related papers