Fugu-MT Paper Translation (Abstract): Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models

Paper summary: Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models

arxiv url: http://arxiv.org/abs/2605.06010v1
Date: Thu, 07 May 2026 11:03:31 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:11.712358
Title: Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models
Title (reference translation): Real-Time Thermal Awareness for Visual Systems via Distilled Diffusion Models
Authors: Yuchen Guo, Junli Gong, Wenjun Dong, Yiuming Cheung, Weifeng Su
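Abstract summary: Purely RGB-based vision models often fail to provide reliable cues in challenging scenarios such as nighttime and fog. We propose FusionProxy, a real-time image fusion module designed as a fully independent, plug-and-play component with diffusion-level quality. The method achieves superior performance on static recognition tasks and significantly improves robustness in dynamic tasks, including closed-loop autonomous driving.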
Reference score (independently calculated attention score): 48.056469832242094
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Purely RGB-based vision models often fail to provide reliable cues in challenging scenarios such as nighttime and fog, leading to degraded performance and safety risks. Infrared imaging captures heat-emitting sources and provides critical complementary information, but existing high-fidelity fusion methods suffer from prohibitive latency, rendering them impractical for real-time edge deployment. To address this, we propose FusionProxy, a real-time image fusion module designed as a fully independent, plug-and-play component with diffusion-level quality. FusionProxy exploits two complementary statistics of a teacher sample ensemble: per-pixel variance in raw image space, used to weight pixel-level supervision, and per-pixel variance inside frozen foundation backbones, used to route feature-level alignment spatially. Once trained, FusionProxy can be directly integrated into any visual perception system without joint optimization. Extensive experiments demonstrate that our method achieves superior performance on static recognition tasks and significantly enhances robustness in dynamic tasks, including closed-loop autonomous driving. Crucially, FusionProxy achieves real-time inference speeds on diverse platforms, from high-end GPUs to commodity hardware, providing a flexible and generalizable solution for all-day perception.
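Abstract (reference translation): Purely RGB-based vision models often cannot provide reliable cues in challenging scenarios such as nighttime and fog, which degrades performance and creates safety risks. Infrared imaging captures heat-emitting sources and supplies important complementary information, but existing high-fidelity fusion methods suffer from prohibitive latency and are impractical for real-time edge deployment. To address this, we propose FusionProxy, a real-time image fusion module with diffusion-level quality, designed as a fully independent, plug-and-play component. FusionProxy uses two complementary statistics of a teacher sample ensemble: per-pixel variance in raw image space, used to weight pixel-level supervision, and per-pixel variance inside frozen foundation backbones, used to route feature-level alignment spatially. Once trained, FusionProxy can be integrated directly into any visual perception system without joint optimization. Extensive experiments show that the method performs strongly on static recognition tasks and significantly improves robustness in dynamic tasks, including closed-loop autonomous driving. Importantly, FusionProxy reaches real-time inference speeds on a range of platforms, from high-end GPUs to commodity hardware, and offers a flexible, generalizable solution for all-day perception.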
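
The abstract mentions using the per-pixel variance of a teacher sample ensemble to weight pixel-level supervision, but it does not say how. The following is a minimal NumPy sketch of one way such a weighting could look, not the authors' method. The function name `variance_weighted_l1`, the L1 distance, the use of the ensemble mean as the target, and the choice to down-weight high-variance pixels via `1 / (1 + var)` are all assumptions made for illustration.

```python
import numpy as np


def variance_weighted_l1(student, teacher_samples):
    """Hypothetical pixel-level loss weighted by teacher-ensemble variance.

    student:         (H, W) or (H, W, C) array, the student's fused image.
    teacher_samples: (K, H, W) or (K, H, W, C) array holding K teacher samples
                     for the same input pair.
    Returns the scalar loss and the per-pixel weight map.
    """
    teacher_samples = np.asarray(teacher_samples, dtype=np.float64)
    student = np.asarray(student, dtype=np.float64)

    # Per-pixel statistics across the K teacher samples.
    target = teacher_samples.mean(axis=0)
    var = teacher_samples.var(axis=0)

    # Assumption: pixels where the teacher samples disagree get less weight.
    weights = 1.0 / (1.0 + var)
    # Normalise so the weights average to 1 and the loss scale stays comparable.
    weights = weights / weights.mean()

    loss = float(np.mean(weights * np.abs(student - target)))
    return loss, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.random((8, 4, 4))      # 8 hypothetical teacher samples
    prediction = samples.mean(axis=0)    # a student that matches the ensemble mean
    loss, weights = variance_weighted_l1(prediction, samples)
    print(loss, weights.shape)
```

The opposite convention (emphasising high-variance pixels) is equally plausible from the abstract alone; only the paper itself can settle which direction and functional form are used.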

