Fugu-MT paper translation (abstract): Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction

Paper overview: Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction

arxiv url: http://arxiv.org/abs/2510.04759v1
Date: Mon, 06 Oct 2025 12:36:07 GMT
Status: Translation complete
Last updated in system: 2025-10-07 16:52:59.854154
Title: Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
Title (reference translation): Anisotropy-aware Progressive Gaussian Transformer for Open-Vocabulary Occupancy Prediction
Authors: Chi Yan, Dan Xu
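Abstract summary: We present PG-Occ, an innovative Progressive Gaussian Transformer framework that enables open-vocabulary 3D occupancy prediction. The framework employs progressive online densification, a feed-forward strategy that gradually enhances the 3D Gaussian representation to capture fine-grained scene details. PG-Occ achieves a relative 14.3% mIoU improvement over the previous best-performing method.
Reference score (independently computed attention score): 9.952279648243058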
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The 3D occupancy prediction task has witnessed remarkable progress in recent years, playing a crucial role in vision-based autonomous driving systems. While traditional methods are limited to fixed semantic categories, recent approaches have moved towards predicting text-aligned features to enable open-vocabulary text queries in real-world scenes. However, there exists a trade-off in text-aligned scene modeling: sparse Gaussian representation struggles to capture small objects in the scene, while dense representation incurs significant computational overhead. To address these limitations, we present PG-Occ, an innovative Progressive Gaussian Transformer Framework that enables open-vocabulary 3D occupancy prediction. Our framework employs progressive online densification, a feed-forward strategy that gradually enhances the 3D Gaussian representation to capture fine-grained scene details. By iteratively enhancing the representation, the framework achieves increasingly precise and detailed scene understanding. Another key contribution is the introduction of an anisotropy-aware sampling strategy with spatio-temporal fusion, which adaptively assigns receptive fields to Gaussians at different scales and stages, enabling more effective feature aggregation and richer scene information capture. Through extensive evaluations, we demonstrate that PG-Occ achieves state-of-the-art performance with a relative 14.3% mIoU improvement over the previous best performing method. Code and pretrained models will be released upon publication on our project page: https://yanchi-3dv.github.io/PG-Occ
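Abstract (reference translation): The 3D occupancy prediction task has seen remarkable progress in recent years and plays a crucial role in vision-based autonomous driving systems. Traditional methods are limited to fixed semantic categories, whereas recent approaches predict text-aligned features to enable open-vocabulary text queries in real-world scenes. Text-aligned scene modeling involves a trade-off, however: a sparse Gaussian representation struggles to capture small objects in the scene, while a dense representation incurs significant computational overhead. To address these limitations, we present PG-Occ, an innovative Progressive Gaussian Transformer framework that enables open-vocabulary 3D occupancy prediction. The framework employs progressive online densification, a feed-forward strategy that gradually enhances the 3D Gaussian representation to capture fine-grained scene details. By iteratively enhancing the representation, the framework achieves increasingly precise and detailed scene understanding. Another key contribution is an anisotropy-aware sampling strategy with spatio-temporal fusion, which adaptively assigns receptive fields to Gaussians at different scales and stages. PG-Occ achieves a relative 14.3% mIoU improvement over the previous best-performing method. Code and pretrained models will be released on the project page.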
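The "relative 14.3% mIoU improvement" quoted in the abstract is a ratio of scores, not a difference in percentage points. As a minimal sketch, assuming the common definition of mIoU as per-class intersection-over-union averaged over the classes that occur, the two quantities can be computed as follows. The function names, the toy labels, and the example scores are hypothetical illustrations; none of them come from the PG-Occ code or its reported results.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        # Voxels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Voxels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both label sets
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


def relative_improvement(new, baseline):
    """Gain of `new` over `baseline`, expressed as a fraction of the baseline."""
    return (new - baseline) / baseline


# Toy per-voxel class labels (made up for illustration only).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 0, 1, 2, 2, 2]
score = mean_iou(pred, target, num_classes=3)

# Hypothetical scores: 0.4 -> 0.5 is a 25% relative gain
# but only a 10-point absolute gain.
gain = relative_improvement(0.5, 0.4)
```

Under this reading, a relative gain of 14.3% means the new mIoU is roughly 1.143 times the baseline mIoU; the absolute gap in mIoU points depends on the baseline value, which the abstract does not state.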


Related papers