Fugu-MT Paper Translation (Abstract): Adapting Point Cloud Analysis via Multimodal Bayesian Distribution Learning

Paper overview: Adapting Point Cloud Analysis via Multimodal Bayesian Distribution Learning

arxiv url: http://arxiv.org/abs/2603.22070v2
Date: Wed, 25 Mar 2026 16:07:11 GMT
Status: Translation complete
Last updated in system: 2026-03-26 14:25:25.980904
Title: Adapting Point Cloud Analysis via Multimodal Bayesian Distribution Learning
Title (reference translation): Adapting Point Cloud Analysis via Multimodal Bayesian Distribution Learning
Authors: Xingyu Zhu, Liang Yi, Shuo Wang, Wenbo Zhu, Yonglinag Wu, Beier Zhu, Hanwang Zhang
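Abstract summary: Multimodal 3D vision-language models generalize well across diverse 3D tasks, but their performance degrades notably under domain shifts. This has motivated recent work on test-time adaptation, which lets a model adapt online using test-time data. The paper proposes BayesMM, a Multimodal Bayesian Distribution Learning framework for test-time point cloud analysis.
Reference score (independently computed attention score): 47.975618905252354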
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal 3D vision-language models show strong generalization across diverse 3D tasks, but their performance still degrades notably under domain shifts. This has motivated recent studies on test-time adaptation (TTA), which enables models to adapt online using test-time data. Among existing TTA methods, cache-based mechanisms are widely adopted for leveraging previously observed samples in online prediction refinement. However, they store only limited historical information, leading to progressive information loss as the test stream evolves. In addition, their prediction logits are fused heuristically, making adaptation unstable. To address these limitations, we propose BayesMM, a Multimodal Bayesian Distribution Learning framework for test-time point cloud analysis. BayesMM models textual priors and streaming visual features of each class as Gaussian distributions: textual parameters are derived from semantic prompts, while visual parameters are updated online with arriving samples. The two modalities are fused via Bayesian model averaging, which automatically adjusts their contributions based on posterior evidence, yielding a unified prediction that adapts continually to evolving test-time data without training. Extensive experiments on multiple point cloud benchmarks demonstrate that BayesMM maintains robustness under distributional shifts, yielding over 4% average improvement.
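Abstract (reference translation): Multimodal 3D vision-language models show strong generalization across a wide variety of 3D tasks, but their performance degrades notably under domain shifts. This has motivated recent research on test-time adaptation (TTA), which allows a model to adapt online using test-time data. Among existing TTA methods, cache-based mechanisms are widely adopted to exploit previously observed samples when refining online predictions. However, they store only limited historical information, which leads to progressive information loss as the test stream evolves. In addition, their prediction logits are fused heuristically, which makes adaptation unstable. To address these limitations, the authors propose BayesMM, a Multimodal Bayesian Distribution Learning framework for test-time point cloud analysis. BayesMM models the textual priors and streaming visual features of each class as Gaussian distributions: the textual parameters are derived from semantic prompts, while the visual parameters are updated online as samples arrive. The two modalities are fused by Bayesian model averaging, which automatically adjusts their contributions based on posterior evidence and yields a unified prediction that adapts continually to evolving test-time data without training. Extensive experiments on multiple point cloud benchmarks show that BayesMM remains robust under distribution shifts, with an average improvement of over 4%.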
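
The abstract describes three ingredients: per-class Gaussians for each modality, online updates of the visual parameters, and fusion by Bayesian model averaging. The toy sketch below illustrates how these pieces could fit together. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: the class name `ToyBayesFusion`, isotropic Gaussians with a shared fixed variance, running-mean updates driven by the model's own predicted label, treating each text mean as one pseudo-observation, and model weights taken proportional to accumulated evidence.

```python
import numpy as np


def logsumexp(v):
    """Numerically stable log(sum(exp(v))) for a 1-D array."""
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))


def class_log_likelihoods(x, means, var):
    """Log density of x under N(means[c], var * I) for every class c."""
    d = x.shape[0]
    sq_dist = np.sum((x - means) ** 2, axis=1)
    return -0.5 * (sq_dist / var + d * np.log(2.0 * np.pi * var))


class ToyBayesFusion:
    """Hypothetical toy model, not the BayesMM code from the paper."""

    def __init__(self, text_means, var=1.0):
        # Textual class means stay fixed (assumed to come from prompt embeddings).
        self.text_means = np.asarray(text_means, dtype=float)
        # Visual class means start at the text means and are updated online.
        self.vis_means = self.text_means.copy()
        # Assumption: each text mean counts as one pseudo-observation.
        self.counts = np.ones(len(self.text_means))
        self.var = var
        # Accumulated log evidence for the two models: [text, visual].
        self.log_evidence = np.zeros(2)

    def model_weights(self):
        """Posterior model weights under a uniform prior over the two models."""
        return np.exp(self.log_evidence - logsumexp(self.log_evidence))

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        ll_text = class_log_likelihoods(x, self.text_means, self.var)
        ll_vis = class_log_likelihoods(x, self.vis_means, self.var)

        # Class posteriors of each model under a uniform class prior.
        p_text = np.exp(ll_text - logsumexp(ll_text))
        p_vis = np.exp(ll_vis - logsumexp(ll_vis))

        # Bayesian model averaging: weight each model by its current evidence.
        w = self.model_weights()
        probs = w[0] * p_text + w[1] * p_vis

        # Add the log marginal likelihood of x under each model to its evidence.
        n_classes = len(self.text_means)
        self.log_evidence += (
            np.array([logsumexp(ll_text), logsumexp(ll_vis)]) - np.log(n_classes)
        )

        # Running-mean update of the visual mean for the predicted class.
        c = int(np.argmax(probs))
        self.counts[c] += 1
        self.vis_means[c] += (x - self.vis_means[c]) / self.counts[c]
        return probs


# Usage: two classes in 2-D, with a test stream shifted away from the text mean.
model = ToyBayesFusion(text_means=[[1.0, 0.0], [0.0, 1.0]])
for _ in range(5):
    probs = model.predict([2.0, 0.0])
```

In this sketch the visual mean of the predicted class drifts toward the shifted stream while the text mean stays put, so the visual model explains later samples better and gradually receives more weight in the average. No gradient step is involved, which mirrors the abstract's "without training" claim only in spirit.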

Related papers