Fugu-MT Paper Translation (Summary): Seeing Further on the Shoulders of Giants: Knowledge Inheritance for Vision Foundation Models

Paper Summary: Seeing Further on the Shoulders of Giants: Knowledge Inheritance for Vision Foundation Models

arxiv url: http://arxiv.org/abs/2508.14707v1
Date: Wed, 20 Aug 2025 13:30:23 GMT
Status: Translation complete
Last updated in system: 2025-08-21 16:52:41.468919
Title: Seeing Further on the Shoulders of Giants: Knowledge Inheritance for Vision Foundation Models
Authors: Jiabo Huang, Chen Chen, Lingjuan Lyu
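Abstract summary: Vision foundation models (VFMs) are predominantly developed using data-centric methods. Many open-source vision models have been pretrained on domain-specific data. This paper proposes a model-driven approach for training VFMs through joint knowledge transfer and preservation.
Reference score (independently calculated attention score): 43.517843843279266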
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Vision foundation models (VFMs) are predominantly developed using data-centric methods. These methods require training on vast amounts of data, usually with high-quality labels, which poses a bottleneck for most institutions that lack both large-scale data and high-end GPUs. On the other hand, many open-source vision models have been pretrained on domain-specific data, enabling them to distill and represent core knowledge in a form that is transferable across diverse applications. Even though these models are highly valuable assets, their potential to empower the development of a general-purpose VFM remains largely under-explored. In this paper, we present a new model-driven approach for training VFMs through joint knowledge transfer and preservation. Our method unifies multiple pre-trained teacher models in a shared latent space to mitigate the "imbalanced transfer" issue caused by their distributional gaps. In addition, we introduce a knowledge preservation strategy that takes a general-purpose teacher as a knowledge base and uses an adapter module to integrate knowledge from the remaining purpose-specific teachers. By unifying and aggregating existing models, we build a powerful VFM that inherits the teachers' expertise without needing to train on a large amount of labeled data. Our model not only provides generalizable visual features but also inherently supports multiple downstream tasks. Extensive experiments demonstrate that our VFM outperforms existing data-centric models across four fundamental vision tasks: image classification, object detection, semantic segmentation, and instance segmentation.
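The abstract names two mechanisms but gives no formulas, so the Python sketch below only illustrates the general ideas under loudly stated assumptions; it is not the authors' implementation. First, it models "imbalanced transfer" as a scale mismatch between teachers' features and uses per-dimension standardization as a deliberately simplified stand-in for the shared latent space (the paper's actual mechanism is not described in this summary). Second, it models knowledge preservation as a residual bottleneck adapter on top of a frozen general-purpose teacher, with a zero-initialized up-projection so the output initially equals that teacher's features. All names (`standardize`, `per_teacher_losses`, `ResidualAdapter`) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are samples, columns are feature dimensions


def standardize(feats: Matrix, eps: float = 1e-6) -> Matrix:
    """Z-score each feature dimension over the batch (hypothetical stand-in
    for mapping one teacher's features into a shared, scale-free space)."""
    n, d = len(feats), len(feats[0])
    means = [sum(row[j] for row in feats) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in feats) / n) + eps
            for j in range(d)]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in feats]


def mse(a: Matrix, b: Matrix) -> float:
    """Mean squared error between two feature matrices of the same shape."""
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def per_teacher_losses(student: Matrix, teachers: List[Matrix],
                       shared: bool) -> List[float]:
    """Distillation loss against each teacher, computed either on raw
    features or after standardizing both student and teacher features."""
    if not shared:
        return [mse(student, t) for t in teachers]
    s = standardize(student)
    return [mse(s, standardize(t)) for t in teachers]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply matrix w (out_dim x in_dim) by vector x (length in_dim)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class ResidualAdapter:
    """Hypothetical bottleneck adapter over a frozen general-purpose teacher:
    out = base + up(relu(down(base))). `up` is zero-initialized, so before
    any training the output equals the base features exactly."""

    def __init__(self, down: Matrix, dim: int):
        self.down = down                                   # bottleneck x dim
        self.up = [[0.0] * len(down) for _ in range(dim)]  # dim x bottleneck

    def __call__(self, base: List[float]) -> List[float]:
        hidden = [max(0.0, h) for h in matvec(self.down, base)]  # ReLU
        delta = matvec(self.up, hidden)
        return [b + d for b, d in zip(base, delta)]


if __name__ == "__main__":
    student = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
    teacher_a = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]          # same scale as student
    teacher_b = [[100 * v for v in row] for row in teacher_a]  # 100x larger scale
    print(per_teacher_losses(student, [teacher_a, teacher_b], shared=False))
    print(per_teacher_losses(student, [teacher_a, teacher_b], shared=True))
    adapter = ResidualAdapter(down=[[1.0, 0.5]], dim=2)
    print(adapter([1.0, 2.0]))  # [1.0, 2.0]: identical to the base features
```

In this toy setup the raw losses are `[0.0, 45738.0]`, so the large-scale teacher would dominate a summed objective even though both teachers encode the same structure; after standardization both losses are close to zero. Zero-initializing the adapter's up-projection is a common adapter design choice because training then starts from the base model's behavior rather than perturbing it.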
