Fugu-MT Paper Translation (Summary): V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration

Paper summary: V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration

arxiv url: http://arxiv.org/abs/2603.13089v1
Date: Fri, 13 Mar 2026 15:39:44 GMT
Status: Translation complete
Last updated in system: 2026-03-16 17:38:12.162049
Title: V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration
Title (reference translation): V-Bridge (video)
Authors: Shenghe Zheng, Junpeng Jiang, Wenbo Li,
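Abstract summary: V-Bridge is a framework that bridges the latent capacity of large-scale video generative models to versatile few-shot image restoration tasks. With only 1,000 multi-task training samples (less than 2% of the data used by existing restoration methods), pretrained video models can be induced to perform competitive image restoration. The results show that video generative models implicitly learn powerful, transferable restoration priors that can be activated with extremely limited data.
Reference score (in-house attention score): 8.147701740798297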
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large-scale video generative models are trained on vast and diverse visual data, enabling them to internalize rich structural, semantic, and dynamic priors of the visual world. While these models have demonstrated impressive generative capability, their potential as general-purpose visual learners remains largely untapped. In this work, we introduce V-Bridge, a framework that bridges this latent capacity to versatile few-shot image restoration tasks. We reinterpret image restoration not as a static regression problem, but as a progressive generative process, and leverage video models to simulate the gradual refinement from degraded inputs to high-fidelity outputs. Surprisingly, with only 1,000 multi-task training samples (less than 2% of existing restoration methods), pretrained video models can be induced to perform competitive image restoration, achieving multiple tasks with a single model, rivaling specialized architectures designed explicitly for this purpose. Our findings reveal that video generative models implicitly learn powerful and transferable restoration priors that can be activated with only extremely limited data, challenging the traditional boundary between generative modeling and low-level vision, and opening a new design paradigm for foundation models in visual tasks.
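The abstract recasts restoration as a progressive generative process in which a video model simulates gradual refinement from a degraded input to a high-fidelity output. The sketch below only illustrates that framing under a simplifying assumption: the "video" is a sequence of frames obtained by linearly blending a degraded image into its clean target, with grayscale images stored as nested lists of pixel intensities. The function name `restoration_trajectory` and the linear schedule are hypothetical; the abstract does not describe how V-Bridge actually constructs frame sequences or conditions the video model.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of pixel intensities.
Image = List[List[float]]


def restoration_trajectory(degraded: Image, clean: Image, num_frames: int) -> List[Image]:
    """Return frames that move linearly from `degraded` (first frame) to `clean` (last frame).

    Illustrative assumption only: a linear blend stands in for the
    "gradual refinement" a video model would be asked to simulate.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames (start and end)")
    if len(degraded) != len(clean) or any(
        len(row_d) != len(row_c) for row_d, row_c in zip(degraded, clean)
    ):
        raise ValueError("degraded and clean images must have the same shape")

    frames: List[Image] = []
    for t in range(num_frames):
        alpha = t / (num_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        frame = [
            [(1 - alpha) * d + alpha * c for d, c in zip(row_d, row_c)]
            for row_d, row_c in zip(degraded, clean)
        ]
        frames.append(frame)
    return frames


if __name__ == "__main__":
    # A 1x2 "image": the left pixel is degraded (0.0), the right one is already clean.
    for frame in restoration_trajectory([[0.0, 1.0]], [[1.0, 1.0]], num_frames=3):
        print(frame)
```

Running the example prints three frames, `[[0.0, 1.0]]`, `[[0.5, 1.0]]`, and `[[1.0, 1.0]]`: the first frame is the degraded input, the last is the restoration target, and intermediate frames form the kind of degraded-to-clean path the abstract describes a video model generating.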
