Fugu-MT Paper Translation (Summary): SpectraDINO: Bridging the Spectral Gap in Vision Foundation Models via Lightweight Adapters

Paper summary: SpectraDINO: Bridging the Spectral Gap in Vision Foundation Models via Lightweight Adapters

arxiv url: http://arxiv.org/abs/2605.02258v1
Date: Mon, 04 May 2026 06:09:13 GMT
Status: Translation complete
Last updated in system: 2026-05-05 20:33:50.157003
Title: SpectraDINO: Bridging the Spectral Gap in Vision Foundation Models via Lightweight Adapters
Title (reference translation): SpectraDINO: Bridging the Spectral Gap in Vision Foundation Models with Lightweight Adapters
Authors: Yagiz Nalcakan, Hyeongjin Ju, Incheol Park, Sanghyeop Yeo, Youngwan Jin, Shiho Kim
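Abstract summary: Vision Foundation Models (VFMs) pretrained on large-scale RGB data have shown remarkable representation quality. Their applicability to multispectral imaging spanning Near-Infrared (NIR), Short-Wave Infrared (SWIR), and Long-Wave Infrared (LWIR) remains largely unexplored. We propose SpectraDINO, a multispectral VFM that bridges this spectral gap by extending DINOv2 ViT backbones to beyond-visible modalities.
Reference score (independently computed attention score): 1.2622634782102324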
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Vision Foundation Models (VFMs) pretrained on large-scale RGB data have demonstrated remarkable representation quality, yet their applicability to multispectral imaging spanning Near-Infrared (NIR), Short-Wave Infrared (SWIR), and Long-Wave Infrared (LWIR) remains largely unexplored. These spectral modalities offer complementary sensing capabilities critical for robust perception in adverse conditions, but present a fundamental domain gap relative to RGB-centric pretrained models. We present SpectraDINO, a multispectral VFM that bridges this spectral gap by extending DINOv2 ViT backbones to beyond-visible modalities through lightweight, per-modality bottleneck adapters, while preserving the rich representations of the frozen RGB backbone. We introduce a multi-stage teacher-student training protocol in which a frozen DINOv2 teacher guides a spectral student via cosine distillation, symmetric contrastive loss, patch-level alignment, and a novel neighborhood-structure-preservation loss. This staged curriculum enables strong cross-modal alignment without catastrophic forgetting of RGB priors. We evaluate SpectraDINO on multispectral object detection and semantic segmentation across challenging NIR, SWIR, and LWIR benchmarks using widely adopted fusion strategies. SpectraDINO achieves state-of-the-art performance across most benchmarks, validating its effectiveness as a general-purpose backbone for spectral generalization. The code and weights for model variants are available at https://github.com/Yonsei-STL/SpectraDINO.
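Abstract (reference translation): Vision Foundation Models (VFMs) pretrained on large-scale RGB data show remarkable representation quality, but their applicability to multispectral imaging across Near-Infrared (NIR), Short-Wave Infrared (SWIR), and Long-Wave Infrared (LWIR) remains largely unexplored. These spectral modalities provide complementary sensing capabilities that are essential for robust perception in adverse conditions, yet they present a fundamental domain gap relative to RGB-centric pretrained models. We propose SpectraDINO, a multispectral VFM that bridges this spectral gap by extending DINOv2 ViT backbones to beyond-visible modalities through lightweight, per-modality bottleneck adapters while preserving the rich representations of the frozen RGB backbone. We introduce a multi-stage teacher-student training protocol in which a frozen DINOv2 teacher guides a spectral student through cosine distillation, a symmetric contrastive loss, patch-level alignment, and a new neighborhood-structure-preservation loss. This staged curriculum enables strong cross-modal alignment without catastrophic forgetting of RGB priors. We evaluate SpectraDINO on multispectral object detection and semantic segmentation on NIR, SWIR, and LWIR benchmarks using widely adopted fusion strategies. SpectraDINO achieves state-of-the-art performance on most benchmarks, validating its effectiveness as a general-purpose backbone for spectral generalization. Code and weights for the model variants are available at https://github.com/Yonsei-STL/SpectraDINO.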
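Illustrative sketch (not the authors' code): the abstract names a residual bottleneck adapter, cosine distillation, and a symmetric contrastive loss but gives no formulas. The NumPy snippet below shows one common way such components are written. Everything specific in it is an assumption made for illustration: the function names, the ReLU nonlinearity, the zero-initialized up-projection, the temperature of 0.07, and the random toy features standing in for frozen-teacher and spectral-student embeddings. The patch-level alignment and neighborhood-structure-preservation losses are not sketched because the abstract does not describe them.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def bottleneck_adapter(x, w_down, w_up):
    # Hypothetical residual bottleneck: x + up(relu(down(x))).
    hidden = np.maximum(x @ w_down, 0.0)
    return x + hidden @ w_up

def cosine_distillation_loss(student, teacher):
    # Mean of (1 - cosine similarity) over matched student/teacher rows.
    s, t = l2_normalize(student), l2_normalize(teacher)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(student, teacher, temperature=0.07):
    # InfoNCE-style loss averaged over both directions; row i of each
    # input is treated as a positive pair, all other rows as negatives.
    s, t = l2_normalize(student), l2_normalize(teacher)
    logits = (s @ t.T) / temperature
    idx = np.arange(len(s))
    loss_s2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2s = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_s2t + loss_t2s))

# Toy data: 4 samples with 16-dim features (assumed sizes).
rng = np.random.default_rng(0)
teacher_feats = rng.normal(size=(4, 16))    # stand-in for frozen RGB teacher output
spectral_feats = rng.normal(size=(4, 16))   # stand-in for student backbone output
w_down = rng.normal(scale=0.1, size=(16, 4))
w_up = np.zeros((4, 16))                    # zero init: adapter starts as identity

student_feats = bottleneck_adapter(spectral_feats, w_down, w_up)
print("cosine distillation:", cosine_distillation_loss(student_feats, teacher_feats))
print("symmetric contrastive:", symmetric_contrastive_loss(student_feats, teacher_feats))
```

With a zero-initialized up-projection the adapter initially passes features through unchanged, which is one plausible way to keep a frozen backbone's behavior intact at the start of training. Whether SpectraDINO does this is not stated in the abstract.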


Related papers