Fugu-MT paper translation (abstract): MAVFusion: Efficient Infrared and Visible Video Fusion via Motion-Aware Sparse Interaction

Paper abstract: MAVFusion: Efficient Infrared and Visible Video Fusion via Motion-Aware Sparse Interaction

arxiv url: http://arxiv.org/abs/2604.01958v1
Date: Thu, 02 Apr 2026 12:20:45 GMT
Status: translation complete
Last updated in system: 2026-04-03 14:21:10.775899
Title: MAVFusion: Efficient Infrared and Visible Video Fusion via Motion-Aware Sparse Interaction
Title (reference translation): MAVFusion: Efficient Infrared and Visible Video Fusion via Motion-Aware Sparse Interaction
Authors: Xilai Li, Weijun Jiang, Xiaosong Li, Yang Liu, Hongbin Wang, Tao Ye, Huafeng Li, Haishu Tan
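Abstract summary: Infrared and visible video fusion combines object saliency from infrared images with texture details from visible images to produce semantically rich fusion results. Current methods improve temporal consistency by introducing interactions across frames, but they often require high computational cost. The authors propose MAVFusion, an end-to-end video fusion framework with a motion-aware sparse interaction mechanism.
Reference score (attention score computed independently by this site): 22.27085934763657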
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Infrared and visible video fusion combines the object saliency from infrared images with the texture details from visible images to produce semantically rich fusion results. However, most existing methods are designed for static image fusion and cannot effectively handle frame-to-frame motion in videos. Current video fusion methods improve temporal consistency by introducing interactions across frames, but they often require high computational cost. To mitigate these challenges, we propose MAVFusion, an end-to-end video fusion framework featuring a motion-aware sparse interaction mechanism that enhances efficiency while maintaining superior fusion quality. Specifically, we leverage optical flow to identify dynamic regions in multi-modal sequences, adaptively allocating computationally intensive cross-modal attention to these sparse areas to capture salient transitions and facilitate inter-modal information exchange. For static background regions, a lightweight weak interaction module is employed to maintain structural and appearance integrity. By decoupling the processing of dynamic and static regions, MAVFusion simultaneously preserves temporal consistency and fine-grained details while significantly accelerating inference. Extensive experiments demonstrate that MAVFusion achieves state-of-the-art performance on multiple infrared and visible video benchmarks, achieving a speed of 14.16\,FPS at $640 \times 480$ resolution. The source code will be available at https://github.com/ixilai/MAVFusion.
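Abstract (reference translation): Infrared and visible video fusion combines object saliency from infrared images with texture details from visible images to produce semantically rich fusion results. However, most existing methods are designed for static image fusion and cannot effectively handle frame-to-frame motion in videos. Current video fusion methods improve temporal consistency by introducing interactions across frames, but they often require high computational cost. To mitigate these challenges, the authors propose MAVFusion, an end-to-end video fusion framework with a motion-aware sparse interaction mechanism that improves efficiency while maintaining superior fusion quality. Specifically, optical flow is used to identify dynamic regions in the multi-modal sequences, and computationally intensive cross-modal attention is adaptively allocated to these sparse regions to capture salient transitions and facilitate inter-modal information exchange. For static background regions, a lightweight weak interaction module maintains structural and appearance integrity. By decoupling the processing of dynamic and static regions, MAVFusion preserves temporal consistency and fine-grained details at the same time while significantly accelerating inference. Extensive experiments show that MAVFusion achieves state-of-the-art performance on multiple infrared and visible video benchmarks, running at 14.16 FPS at $640 \times 480$ resolution. The source code will be available at https://github.com/ixilai/MAVFusion.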
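The routing idea described in the abstract (heavy cross-modal attention only where optical flow indicates motion, a cheap path everywhere else) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `motion_mask`, `cross_attention`, and `fuse_frame`, the flow-magnitude `threshold`, and the use of a plain average as a stand-in for the weak interaction module are all assumptions made for illustration. The optical flow is assumed to be precomputed and given as per-pixel `(dx, dy)` pairs.

```python
import math


def motion_mask(flow, threshold):
    """Mark a pixel as dynamic when its flow magnitude exceeds the threshold."""
    return [[math.hypot(dx, dy) > threshold for (dx, dy) in row] for row in flow]


def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def fuse_frame(ir, vis, flow, threshold=1.0):
    """Fuse per-pixel feature vectors from an infrared and a visible frame.

    Dynamic pixels take the expensive path: the infrared feature queries the
    visible features of all dynamic pixels. Static pixels take the cheap path:
    a plain average (an assumed placeholder for a weak interaction module).
    """
    mask = motion_mask(flow, threshold)
    dynamic = [(r, c) for r, row in enumerate(mask)
               for c, flag in enumerate(row) if flag]
    vis_dynamic = [vis[r][c] for r, c in dynamic]

    fused = []
    for r, row in enumerate(mask):
        out_row = []
        for c, is_dynamic in enumerate(row):
            if is_dynamic:
                attended = cross_attention(ir[r][c], vis_dynamic, vis_dynamic)
                out_row.append([0.5 * (a + b)
                                for a, b in zip(ir[r][c], attended)])
            else:
                out_row.append([0.5 * (a + b)
                                for a, b in zip(ir[r][c], vis[r][c])])
        fused.append(out_row)
    return fused, mask
```

In this sketch the attention cost grows with the square of the number of dynamic pixels rather than with the full frame size, which is the intuition behind restricting the expensive interaction to sparse moving regions. A real system would operate on learned feature maps and batched tensors rather than nested lists.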

Related papers

FTPFusion: Frequency-Aware Infrared and Visible Video Fusion with Temporal Perturbation [5.5275479200431406]
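FTPFusion is a frequency-aware infrared and visible video fusion method based on temporal perturbation and sparse cross-modal interaction. FTPFusion consistently outperforms state-of-the-art methods across multiple metrics in both spatial fidelity and temporal consistency.
Paper reference translation (metadata) (2026-04-02T11:08:14Z)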
MambaVF: State Space Model for Efficient Video Fusion [44.038619918204496]
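MambaVF is an efficient fusion framework based on state space models (SSMs) that performs temporal modeling without explicit motion estimation. MambaVF captures long-range temporal dependencies with linear complexity while substantially reducing computation and memory costs. The authors highlight its efficiency, citing figures of 92.25% for parameters and 88.79% for computational FLOPs (evidently reductions) together with a 2.1x speedup.
Paper reference translation (metadata) (2026-02-05T18:53:47Z)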
MambaSeg: Harnessing Mamba for Accurate and Efficient Image-Event Semantic Segmentation [17.515348703686232]
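The authors propose MambaSeg, a novel dual-branch semantic segmentation framework that uses parallel Mamba encoders to model RGB images and event streams efficiently. MambaSeg achieves state-of-the-art segmentation performance while significantly reducing computational cost, showing promise for efficient, scalable, and robust multimodal perception.
Paper reference translation (metadata) (2025-12-30T14:09:17Z)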
VideoFusion: A Spatio-Temporal Collaborative Network for Multi-modal Video Fusion and Restoration [26.59510171451438]
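Existing multi-sensor fusion research mainly integrates complementary context from multiple images rather than from videos. VideoFusion exploits cross-modal complementarity and temporal dynamics to generate spatio-temporally coherent videos. Extensive experiments show that VideoFusion outperforms existing image-oriented fusion paradigms in sequential scenarios.
Paper reference translation (metadata) (2025-03-30T08:27:18Z)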
Motion-aware Latent Diffusion Models for Video Frame Interpolation [51.78737270917301]
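Motion estimation between adjacent frames plays a crucial role in avoiding motion ambiguity. The authors propose a novel diffusion framework, the Motion-Aware Latent Diffusion Model (MADiff). The proposed method achieves state-of-the-art performance, significantly outperforming existing approaches.
Paper reference translation (metadata) (2024-04-21T05:09:56Z)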
MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training [95.24751989263117]
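MaeFuse is a novel autoencoder model designed for infrared and visible image fusion (IVIF). The proposed model uses the pretrained encoder of Masked Autoencoders (MAE), which provides omni feature extraction for low-level reconstruction and high-level vision tasks. MaeFuse not only introduces a new perspective in the field of fusion techniques but also stands out with strong performance across various public datasets.
Paper reference translation (metadata) (2024-04-17T02:47:39Z)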
Motion-Aware Video Frame Interpolation [49.49668436390514]
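The authors introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network that directly estimates intermediate optical flow from consecutive frames. It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
Paper reference translation (metadata) (2024-02-05T11:00:14Z)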
CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
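This paper proposes the Correlation-Driven feature Decomposition Fusion (CDDFuse) network. CDDFuse is shown to deliver promising results on multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
Paper reference translation (metadata) (2022-11-26T02:40:28Z)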
All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling [52.425236515695914]
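State-of-the-art methods are iterative solutions that interpolate one frame at a time. This work introduces a true multi-frame interpolator. It uses a pyramid-style network in the temporal domain to complete the multi-frame task in one shot.
Paper reference translation (metadata) (2020-07-23T02:34:39Z)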

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

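This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from its use (including all information and translations).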