Fugu-MT Paper Translation (Summary): Adapt-As-You-Walk Through the Clouds: Training-Free Online Test-Time Adaptation of 3D Vision-Language Foundation Models

Paper overview: Adapt-As-You-Walk Through the Clouds: Training-Free Online Test-Time Adaptation of 3D Vision-Language Foundation Models

arxiv url: http://arxiv.org/abs/2511.15311v1
Date: Wed, 19 Nov 2025 10:22:22 GMT
Status: Translation complete
Last updated in system: 2025-11-20 15:51:28.753807
Title: Adapt-As-You-Walk Through the Clouds: Training-Free Online Test-Time Adaptation of 3D Vision-Language Foundation Models
Title (reference translation): Adapt-As-You-Walk through the clouds: Training-free Online Test-Time Adaptation of 3D Vision-Language Foundation Models
Authors: Mehran Tamjidi, Hamidreza Dastmalchi, Mohammadreza Alimoradijazi, Ali Cheraghian, Aijun An, Morteza Saberi,
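Abstract (summary): 3D Vision-Language Foundation Models (VLFMs) have shown strong generalization and zero-shot recognition capabilities in open-world point cloud processing tasks. The authors propose Uni-Adapter, a novel training-free online test-time adaptation strategy for 3D VLFMs based on dynamic prototype learning.
Reference score (independently computed attention score): 4.9608847222581005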
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D Vision-Language Foundation Models (VLFMs) have shown strong generalization and zero-shot recognition capabilities in open-world point cloud processing tasks. However, these models often underperform in practical scenarios where data are noisy, incomplete, or drawn from a different distribution than the training data. To address this, we propose Uni-Adapter, a novel training-free online test-time adaptation (TTA) strategy for 3D VLFMs based on dynamic prototype learning. We define a 3D cache to store class-specific cluster centers as prototypes, which are continuously updated to capture intra-class variability in heterogeneous data distributions. These dynamic prototypes serve as anchors for cache-based logit computation via similarity scoring. Simultaneously, a graph-based label smoothing module captures inter-prototype similarities to enforce label consistency among similar prototypes. Finally, we unify predictions from the original 3D VLFM and the refined 3D cache using entropy-weighted aggregation for reliable adaptation. Without retraining, Uni-Adapter effectively mitigates distribution shifts, achieving state-of-the-art performance on diverse 3D benchmarks over different 3D VLFMs, improving ModelNet-40C by 10.55%, ScanObjectNN-C by 8.26%, and ShapeNet-C by 4.49% over the source 3D VLFMs.
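
The abstract names four moving parts: a per-class prototype cache that is updated online, similarity-based cache logits, graph-based label smoothing over the prototypes, and entropy-weighted fusion with the source model's zero-shot prediction. The following is a minimal Python sketch that wires these together on toy 2-D features. It is not the authors' Uni-Adapter implementation. The running-mean clustering rule, the Tip-Adapter-style affinity `exp(-beta * (1 - cos))`, the single propagation step used for label smoothing, the `exp(-entropy)` fusion weights, pseudo-labeling with the fused prediction, and all names and hyperparameters (`PrototypeCache`, `max_protos`, `beta`, `alpha`, `fuse`) are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def softmax(logits: Vector) -> Vector:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Vector) -> float:
    return -sum(p * math.log(p + 1e-12) for p in probs)


class PrototypeCache:
    """Hypothetical per-class prototype cache; not the authors' implementation."""

    def __init__(self, num_classes: int, max_protos: int = 2,
                 beta: float = 5.0, alpha: float = 0.2) -> None:
        self.num_classes = num_classes
        self.max_protos = max_protos  # assumed cap on cluster centers per class
        self.beta = beta              # assumed sharpness of the affinity kernel
        self.alpha = alpha            # assumed strength of graph label smoothing
        self.protos: List[Vector] = []
        self.counts: List[int] = []
        self.labels: List[int] = []

    def update(self, feat: Vector, label: int) -> None:
        """Open a new center for `label`, or fold `feat` into its nearest center."""
        idx = [i for i, y in enumerate(self.labels) if y == label]
        if len(idx) < self.max_protos:
            self.protos.append(list(feat))
            self.counts.append(1)
            self.labels.append(label)
            return
        j = max(idx, key=lambda i: cosine(feat, self.protos[i]))
        self.counts[j] += 1
        n = self.counts[j]
        # Running mean keeps the center equal to the average of its members.
        self.protos[j] = [p + (f - p) / n for p, f in zip(self.protos[j], feat)]

    def smoothed_labels(self) -> List[Vector]:
        """One propagation step over a cosine-similarity graph of prototypes."""
        k, n_cls = len(self.protos), self.num_classes
        onehot = [[1.0 if c == y else 0.0 for c in range(n_cls)] for y in self.labels]
        out = []
        for i in range(k):
            w = [0.0 if j == i else max(cosine(self.protos[i], self.protos[j]), 0.0)
                 for j in range(k)]
            total = sum(w)
            if total == 0.0:
                neigh = onehot[i]  # isolated prototype keeps its own label
            else:
                neigh = [sum(w[j] * onehot[j][c] for j in range(k)) / total
                         for c in range(n_cls)]
            out.append([(1 - self.alpha) * o + self.alpha * nb
                        for o, nb in zip(onehot[i], neigh)])
        return out

    def logits(self, feat: Vector) -> Vector:
        """Cache logits: affinity-weighted sum of smoothed prototype labels."""
        scores = [0.0] * self.num_classes
        for proto, soft in zip(self.protos, self.smoothed_labels()):
            affinity = math.exp(-self.beta * (1.0 - cosine(feat, proto)))
            for c in range(self.num_classes):
                scores[c] += affinity * soft[c]
        return scores


def fuse(zero_shot: Vector, cache: Vector) -> Vector:
    """Entropy-weighted aggregation: the lower-entropy branch gets more weight."""
    w0 = math.exp(-entropy(softmax(zero_shot)))
    w1 = math.exp(-entropy(softmax(cache)))
    return [(w0 * z + w1 * c) / (w0 + w1) for z, c in zip(zero_shot, cache)]


# Toy stream of (feature, zero-shot logits) pairs, made up for illustration.
cache = PrototypeCache(num_classes=2)
stream = [([1.0, 0.1], [2.0, 0.0]), ([0.1, 1.0], [0.0, 2.0]), ([0.9, 0.2], [1.5, 0.5])]
predictions = []
for feat, zs_logits in stream:
    fused = fuse(zs_logits, cache.logits(feat))
    label = max(range(len(fused)), key=fused.__getitem__)
    predictions.append(label)
    cache.update(feat, label)  # pseudo-label drives the online update (assumption)
```

Because the cache starts empty, the first predictions are decided by the zero-shot logits alone; the cache only begins to influence decisions once prototypes accumulate, and no model weights are retrained at any point.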

Related papers