Fugu-MT Paper Translation (Summary): FusionCounting: Robust visible-infrared image fusion guided by crowd counting via multi-task learning

Paper summary: FusionCounting: Robust visible-infrared image fusion guided by crowd counting via multi-task learning

arxiv url: http://arxiv.org/abs/2508.20817v1
Date: Thu, 28 Aug 2025 14:15:18 GMT
Status: Translation complete
Last updated in system: 2025-08-29 18:12:02.442374
Title: FusionCounting: Robust visible-infrared image fusion guided by crowd counting via multi-task learning
Title (reference translation): FusionCounting: Making visible-infrared image fusion robust with crowd counting via multi-task learning
Authors: He Li, Xinyu Liu, Weihang Kong, Xingchen Zhang
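Abstract summary: Many visible and infrared image fusion (VIF) methods focus mainly on optimizing fused image quality. Recent studies have incorporated downstream tasks, such as semantic segmentation and object detection, to provide semantic guidance for VIF. We propose FusionCounting, a novel multi-task learning framework that integrates crowd counting into the VIF process.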
Reference score (site-computed attention score): 16.955260249719533
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Most visible and infrared image fusion (VIF) methods focus primarily on optimizing fused image quality. Recent studies have begun incorporating downstream tasks, such as semantic segmentation and object detection, to provide semantic guidance for VIF. However, semantic segmentation requires extensive annotations, while object detection, despite reducing annotation efforts compared with segmentation, faces challenges in highly crowded scenes due to overlapping bounding boxes and occlusion. Moreover, although RGB-T crowd counting has gained increasing attention in recent years, no studies have integrated VIF and crowd counting into a unified framework. To address these challenges, we propose FusionCounting, a novel multi-task learning framework that integrates crowd counting into the VIF process. Crowd counting provides a direct quantitative measure of population density with minimal annotation, making it particularly suitable for dense scenes. Our framework leverages both input images and population density information in a mutually beneficial multi-task design. To accelerate convergence and balance task contributions, we introduce a dynamic loss function weighting strategy. Furthermore, we incorporate adversarial training to enhance the robustness of both VIF and crowd counting, improving the model's stability and resilience to adversarial attacks. Experimental results on public datasets demonstrate that FusionCounting not only enhances image fusion quality but also achieves superior crowd counting performance.
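The abstract does not say how its dynamic loss weights are computed. As a hedged illustration only, the sketch below uses Dynamic Weight Average (DWA, Liu, Johns and Davison, CVPR 2019), a well-known dynamic weighting scheme, to balance two task losses such as a fusion loss and a counting loss. It is not necessarily the strategy used in FusionCounting. The function names `dwa_weights` and `combined_loss` and the temperature default of 2.0 are assumptions made for this example.

```python
import math
from typing import List, Sequence


def dwa_weights(
    loss_history: Sequence[Sequence[float]],
    num_tasks: int,
    temperature: float = 2.0,
) -> List[float]:
    """Dynamic Weight Average (DWA) task weights (hypothetical helper).

    loss_history holds one entry per finished epoch; each entry lists the
    average loss of every task in that epoch. The returned weights sum to
    num_tasks, so the total loss scale stays comparable to equal weighting.
    """
    # DWA needs two previous epochs to form a descent rate; use equal weights until then.
    if len(loss_history) < 2:
        return [1.0] * num_tasks
    prev, prev2 = loss_history[-1], loss_history[-2]
    # Descent rate r_k = L_k(t-1) / L_k(t-2): a slowly improving task has r_k close to 1.
    rates = [prev[k] / prev2[k] for k in range(num_tasks)]
    # Softmax over the rates (scaled by the temperature), rescaled to sum to num_tasks.
    exps = [math.exp(r / temperature) for r in rates]
    total = sum(exps)
    return [num_tasks * e / total for e in exps]


def combined_loss(task_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-task losses, e.g. [fusion_loss, counting_loss]."""
    return sum(w * loss for w, loss in zip(weights, task_losses))
```

With a history of `[[1.0, 4.0], [0.8, 2.0]]`, the counting loss halves while the fusion loss drops only 20%, so DWA assigns the fusion task the larger weight. The design choice is that tasks whose losses stall receive more emphasis in the next epoch.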
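Abstract (reference translation): Many visible and infrared image fusion (VIF) methods focus mainly on optimizing the quality of the fused image. In recent years, research has incorporated downstream tasks such as semantic segmentation and object detection to provide semantic guidance for VIF. However, semantic segmentation requires extensive annotation, and although object detection reduces annotation effort compared with segmentation, it struggles in highly crowded scenes because bounding boxes overlap and people occlude one another. Moreover, although RGB-T crowd counting has attracted growing attention in recent years, no research has integrated VIF and crowd counting into a unified framework. To address these challenges, we propose FusionCounting, a novel multi-task learning framework that integrates crowd counting into the VIF process. Crowd counting directly measures population density with minimal annotation and is particularly suited to dense scenes. The framework leverages both the input images and population density information in a mutually beneficial multi-task design. To accelerate convergence and balance task contributions, we introduce a dynamic loss function weighting strategy. Furthermore, we incorporate adversarial training to improve the robustness of both VIF and crowd counting, increasing the model's stability and resilience against adversarial attacks. Experimental results on public datasets show that FusionCounting not only improves image fusion quality but also improves crowd counting performance.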
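The abstract mentions adversarial training without naming the attack. The sketch below is a minimal, hypothetical example assuming the Fast Gradient Sign Method (FGSM, Goodfellow et al.), applied to a toy linear counting model with a squared-error loss rather than to the paper's fusion and counting networks. The names `predict_count`, `fgsm_perturb` and `adversarial_train_step`, and the `epsilon` and `lr` parameters, are all illustrative assumptions.

```python
from typing import List, Sequence


def predict_count(weights: Sequence[float], features: Sequence[float]) -> float:
    """Toy linear counter (hypothetical): predicted count is a weighted sum of features."""
    return sum(w * x for w, x in zip(weights, features))


def _sign(value: float) -> float:
    """Return +1.0, -1.0 or 0.0 depending on the sign of value."""
    return float((value > 0) - (value < 0))


def fgsm_perturb(
    weights: Sequence[float], features: Sequence[float], target: float, epsilon: float
) -> List[float]:
    """FGSM: shift each feature by epsilon in the direction that increases the loss."""
    error = predict_count(weights, features) - target
    # Gradient of (w.x - y)^2 with respect to x_i is 2 * (w.x - y) * w_i.
    grads = [2.0 * error * w for w in weights]
    return [x + epsilon * _sign(g) for x, g in zip(features, grads)]


def adversarial_train_step(
    weights: Sequence[float],
    features: Sequence[float],
    target: float,
    epsilon: float,
    lr: float,
) -> List[float]:
    """One gradient-descent step on the squared error of the FGSM-perturbed input."""
    x_adv = fgsm_perturb(weights, features, target, epsilon)
    error = predict_count(weights, x_adv) - target
    # Gradient of (w.x_adv - y)^2 with respect to w_i is 2 * (w.x_adv - y) * x_adv_i.
    return [w - lr * 2.0 * error * x for w, x in zip(weights, x_adv)]
```

Training on the perturbed input, rather than only the clean one, is what makes this adversarial training: the model is updated on the worst-case input within an `epsilon` box under a first-order approximation of the loss.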


Related papers