Fugu-MT Paper Translation (Summary): Dual-Foundation Models for Unsupervised Domain Adaptation

Paper summary: Dual-Foundation Models for Unsupervised Domain Adaptation

arxiv url: http://arxiv.org/abs/2605.03365v1
Date: Tue, 05 May 2026 04:52:08 GMT
Status: Translation complete
Last updated in system: 2026-05-06 19:35:43.773419
Title: Dual-Foundation Models for Unsupervised Domain Adaptation
Authors: Yerin Cheon, Aruna Balasubramanian, Francois Rameau
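Abstract summary: Training segmentation models requires costly, labor-intensive annotations on real-world datasets. Unsupervised Domain Adaptation (UDA) addresses this by training models on labeled synthetic data and adapting them to unlabeled real images. The paper proposes a dual-foundation UDA framework that leverages two complementary foundation models.
Reference score (attention score computed independently by this site): 2.279449016085348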
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semantic segmentation provides pixel-level scene understanding essential for autonomous driving and fine-grained perception tasks. However, training segmentation models requires costly, labor-intensive annotations on real-world datasets. Unsupervised Domain Adaptation (UDA) addresses this by training models on labeled synthetic data and adapting them to unlabeled real images. While conceptually simple, adaptation is challenging due to the domain gap, i.e., differences in visual appearance and scene structure between synthetic and real data. Prior approaches bridge this gap through pixel-level mixing or feature-level contrastive learning. Yet, these techniques suffer from two major limitations: (1) reliance on high-confidence pseudo-labels restricts learning to a subset of the target domain, and (2) prototype-based contrastive methods initialize class prototypes from source-trained models, yielding biased and unstable anchors during adaptation. To address these issues, we propose a dual-foundation UDA framework that leverages two complementary foundation models. First, we employ the Segment Anything Model (SAM) with superpixel-guided prompting to enable learning from a broader range of target pixels beyond high-confidence predictions. Second, we incorporate DINOv3 to construct stable, domain-invariant class prototypes through its robust representation learning. Our method achieves consistent improvements of +1.3% and +1.4% mIoU over strong UDA baselines on GTA-to-Cityscapes and SYNTHIA-to-Cityscapes, respectively.
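
Note on the metric: the abstract reports gains in mIoU (mean Intersection over Union), the standard semantic segmentation score. The following is a minimal sketch of how mIoU is commonly computed, not the authors' evaluation code. The function name `mean_iou`, the flat-list input format, and the default ignore label of 255 (the convention used for unlabeled Cityscapes pixels) are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Return (mIoU, per-class IoU list) for two flat sequences of class ids.

    Hypothetical helper for illustration. Predictions are assumed to lie in
    range(num_classes). Classes that never occur in either the prediction or
    the ground truth get an IoU of None and are left out of the mean.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel: excluded from the metric
        if p == t:
            intersection[t] += 1  # true positive for class t
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class

    ious = [i / u if u else None for i, u in zip(intersection, union)]
    valid = [x for x in ious if x is not None]
    miou = sum(valid) / len(valid) if valid else float("nan")
    return miou, ious


if __name__ == "__main__":
    # Toy example: six pixels, three classes, last pixel unlabeled.
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    miou, per_class = mean_iou(pred, target, num_classes=3)
    print(per_class, miou)
```

Under this definition, a reported gain such as +1.3% mIoU is normally read as a difference of 1.3 percentage points between the class-averaged IoU of two models.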
