Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection
- URL: http://arxiv.org/abs/2505.02393v2
- Date: Thu, 08 May 2025 09:44:41 GMT
- Title: Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection
- Authors: Sungheon Jeong, Jihong Park, Mohsen Imani
- Abstract summary: Image-Event Fusion for Video Anomaly Detection (IEF-VAD) is a framework that synthesizes event representations directly from RGB videos. IEF-VAD sets a new state of the art across multiple real-world anomaly detection benchmarks.
- Score: 13.866203856820759
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most existing video anomaly detectors rely solely on RGB frames, which lack the temporal resolution needed to capture abrupt or transient motion cues, key indicators of anomalous events. To address this limitation, we propose Image-Event Fusion for Video Anomaly Detection (IEF-VAD), a framework that synthesizes event representations directly from RGB videos and fuses them with image features through a principled, uncertainty-aware process. The system (i) models heavy-tailed sensor noise with a Student's-t likelihood, deriving value-level inverse-variance weights via a Laplace approximation; (ii) applies Kalman-style frame-wise updates to balance modalities over time; and (iii) iteratively refines the fused latent state to erase residual cross-modal noise. Without any dedicated event sensor or frame-level labels, IEF-VAD sets a new state of the art across multiple real-world anomaly detection benchmarks. These findings highlight the utility of synthetic event representations in emphasizing motion cues that are often underrepresented in RGB frames, enabling accurate and robust video understanding across diverse applications without requiring dedicated event sensors. Code and models are available at https://github.com/EavnJeong/IEF-VAD.
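The fusion recipe in the abstract maps onto a few lines of code. Below is a minimal PyTorch sketch of steps (i) and (ii): value-level precision weights under a Student's-t likelihood (using the classic t-EM reweighting as a stand-in for the paper's Laplace approximation), followed by Kalman-style frame-wise updates. All function names, tensor shapes, and hyperparameters (`nu`, `q`) are illustrative assumptions, not the authors' implementation; step (iii), the iterative refinement, would be a learned module applied to the smoothed state and is omitted here.

```python
# Illustrative sketch only; see https://github.com/EavnJeong/IEF-VAD for the real code.
import torch

def precision_weights(residual: torch.Tensor, var: torch.Tensor, nu: float = 5.0):
    """Robust value-level precision under a Student's-t likelihood.

    Uses the classic t-EM weight (nu+1)/(nu + standardized residual^2),
    scaled by the inverse predicted variance."""
    delta2 = residual.pow(2) / var
    return (nu + 1.0) / (nu + delta2) / var

def fuse_frame(z_img, z_evt, var_img, var_evt, nu=5.0):
    """Inverse-variance (precision) weighted fusion of per-frame features."""
    center = 0.5 * (z_img + z_evt)              # shared reference point
    w_img = precision_weights(z_img - center, var_img, nu)
    w_evt = precision_weights(z_evt - center, var_evt, nu)
    fused = (w_img * z_img + w_evt * z_evt) / (w_img + w_evt)
    fused_var = 1.0 / (w_img + w_evt)           # fused precisions add
    return fused, fused_var

def kalman_smooth(fused_seq, fused_var_seq, q=1e-2):
    """Kalman-style frame-wise update: each fused frame is treated as a
    noisy observation of a slowly drifting latent state."""
    state, p = fused_seq[0], fused_var_seq[0]
    out = [state]
    for z, r in zip(fused_seq[1:], fused_var_seq[1:]):
        p = p + q                               # predict: state drifts with variance q
        k = p / (p + r)                         # gain from relative precision
        state = state + k * (z - state)         # correct toward new observation
        p = (1.0 - k) * p
        out.append(state)
    return torch.stack(out)

# Toy usage: T frames of D-dim image/event features with predicted variances.
T, D = 8, 16
z_img, z_evt = torch.randn(T, D), torch.randn(T, D)
var_img, var_evt = torch.rand(T, D) + 0.1, torch.rand(T, D) + 0.1
fused, fvar = fuse_frame(z_img, z_evt, var_img, var_evt)
smoothed = kalman_smooth(fused, fvar)
print(smoothed.shape)  # torch.Size([8, 16])
```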
Related papers
- EventVAD: Training-Free Event-Aware Video Anomaly Detection [19.714436150837148]
EventVAD is an event-aware video anomaly detection framework. It combines tailored dynamic graph architectures and multimodal-event reasoning. It achieves state-of-the-art (SOTA) performance in training-free settings, outperforming strong baselines that use 7B or larger MLLMs.
arXiv Detail & Related papers (2025-04-17T16:59:04Z) - Inter-event Interval Microscopy for Event Cameras [52.05337480169517]
Event cameras, innovative bio-inspired sensors, differ from traditional cameras by sensing changes in intensity rather than directly perceiving intensity. We achieve event-to-intensity conversion using a static event camera for both static and dynamic scenes in fluorescence microscopy. We collect the IEIMat dataset under various scenes, including high-dynamic-range and high-speed scenarios.
arXiv Detail & Related papers (2025-04-07T11:05:13Z) - FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose Self-supervised Transfer (PST) and a Frequency-Decoupled Fusion module (FreDF). PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
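As a rough illustration of what "decoupling high-frequency edge features from low-frequency structural components" can look like, the sketch below low-passes each feature map with a depthwise Gaussian blur and treats the residual as the high-frequency part. The Gaussian filter and the image-structure/event-edge pairing are assumptions for illustration, not FreDF's actual design.

```python
# Generic frequency-decoupled fusion sketch; not the FUSE/FreDF implementation.
import torch
import torch.nn.functional as F

def gaussian_kernel(k=7, sigma=2.0):
    ax = torch.arange(k, dtype=torch.float32) - (k - 1) / 2
    g = torch.exp(-ax.pow(2) / (2 * sigma**2))
    g2d = torch.outer(g, g)
    return g2d / g2d.sum()

def decouple(feat, k=7, sigma=2.0):
    """Split (B, C, H, W) features into low- and high-frequency parts."""
    c = feat.shape[1]
    w = gaussian_kernel(k, sigma).expand(c, 1, k, k)   # depthwise blur weights
    low = F.conv2d(feat, w, padding=k // 2, groups=c)  # low-frequency structure
    return low, feat - low                             # residual = high-frequency edges

img = torch.randn(1, 8, 32, 32)   # image-branch features (toy)
evt = torch.randn(1, 8, 32, 32)   # event-branch features (toy)
img_low, _ = decouple(img)        # keep dense structure from images
_, evt_high = decouple(evt)       # keep sharp edges/motion from events
fused = img_low + evt_high        # one simple way to recombine
print(fused.shape)                # torch.Size([1, 8, 32, 32])
```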
arXiv Detail & Related papers (2025-03-25T15:04:53Z) - Multi-modal Multi-platform Person Re-Identification: Benchmark and Method [58.59888754340054]
MP-ReID is a novel dataset designed specifically for multi-modality and multi-platform ReID. This benchmark compiles data from 1,930 identities across diverse modalities, including RGB, infrared, and thermal imaging. We introduce Uni-Prompt ReID, a framework with specifically designed prompts, tailored for cross-modality and cross-platform scenarios.
arXiv Detail & Related papers (2025-03-21T12:27:49Z) - FlexEvent: Event Camera Object Detection at Arbitrary Frequencies [45.82637829492951]
Event cameras offer unparalleled advantages for real-time perception in dynamic environments. Existing event-based object detection methods are limited by fixed-frequency paradigms. We propose FlexEvent, a novel event camera object detection framework that enables detection at arbitrary frequencies.
arXiv Detail & Related papers (2024-12-09T17:57:14Z) - Rethinking Video with a Universal Event-Based Representation [0.0]
I introduce the Address, Decimation, Δt Event Representation (ADΔER), a novel intermediate video representation and system framework.
I demonstrate that ADΔER achieves state-of-the-art application speed and compression performance for scenes with high temporal redundancy.
I discuss the implications for event-based video on large-scale video surveillance and resource-constrained sensing.
arXiv Detail & Related papers (2024-08-12T16:00:17Z) - Event-based Continuous Color Video Decompression from Single Frames [36.4263932473053]
We present ContinuityCam, a novel approach that generates a continuous video from a single static RGB image and an event camera stream. Our approach combines continuous long-range motion modeling with a neural synthesis model, enabling frame prediction at arbitrary times within the event stream.
arXiv Detail & Related papers (2023-11-30T18:59:23Z) - Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection Guided by an Event Camera [8.673063170884591]
EOLO is a novel object detection framework that achieves robust and efficient all-day detection by fusing both RGB and event modalities.
Our EOLO framework is built on a lightweight spiking neural network (SNN) to efficiently leverage the asynchronous property of events.
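For readers unfamiliar with SNNs, the sketch below shows a textbook leaky integrate-and-fire (LIF) layer, the basic unit such detectors build on: sparse event input is integrated into a leaky membrane potential that emits a spike when it crosses a threshold. This is a generic illustration, not EOLO's architecture, and every constant in it is an assumption.

```python
# Textbook LIF layer over binned event slices; illustrative only.
import torch

def lif_forward(event_bins, w, tau=2.0, v_th=1.0):
    """Run a LIF layer over T binned event slices of shape (T, D_in)."""
    v = torch.zeros(w.shape[0])           # membrane potential per neuron
    spikes = []
    for x in event_bins:                  # sparse input: mostly zeros
        v = v + (w @ x - v) / tau         # leaky integration of input current
        s = (v >= v_th).float()           # fire when threshold is crossed
        v = v * (1.0 - s)                 # hard reset after a spike
        spikes.append(s)
    return torch.stack(spikes)            # (T, D_out) binary spike train

T, d_in, d_out = 10, 32, 4
events = (torch.rand(T, d_in) < 0.05).float()   # sparse, asynchronous events
w = torch.randn(d_out, d_in) * 0.5
print(lif_forward(events, w).sum(0))            # spike counts per neuron
```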
arXiv Detail & Related papers (2023-09-17T15:14:01Z) - Revisiting Event-based Video Frame Interpolation [49.27404719898305]
Dynamic vision sensors or event cameras provide rich complementary information for video frame interpolation.
Estimating optical flow from events is arguably more difficult than from RGB information.
We propose a divide-and-conquer strategy in which event-based intermediate frame synthesis happens incrementally in multiple simplified stages.
arXiv Detail & Related papers (2023-07-24T06:51:07Z) - Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events [63.984927609545856]
An Event-based Inter/intra-frame Compensator (E-IC) is proposed to predict per-pixel dynamics between arbitrary time intervals.
We show that the proposed method achieves state-of-the-art results and shows remarkable performance for event-based RS2GS inversion in real-world scenarios.
arXiv Detail & Related papers (2023-04-14T05:30:02Z) - An Asynchronous Intensity Representation for Framed and Event Video Sources [2.9097303137825046]
We introduce an intensity representation for both framed and non-framed data sources.
We show that our representation can increase intensity precision and greatly reduce the number of samples per pixel.
We argue that our method provides the computational efficiency and temporal granularity necessary to build real-time intensity-based applications for event cameras.
arXiv Detail & Related papers (2023-01-20T19:46:23Z) - TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z) - Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction [51.072733683919246]
We introduce Recurrent Asynchronous Multimodal (RAM) networks to handle asynchronous and irregular data from multiple sensors.
Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction.
We show an improvement over state-of-the-art methods by up to 30% in terms of mean absolute depth error.
arXiv Detail & Related papers (2021-02-18T13:24:35Z)
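The "hidden state updated asynchronously and queried at any time" idea in the RAM entry above is simple to sketch: one recurrent cell per modality writes into a shared state whenever its sensor produces data, and a decoder reads the state out on demand. The module below is an illustrative assumption in that spirit (the module sizes and the GRU choice are mine), not the RAM architecture itself.

```python
# Sketch of an asynchronously updated, anytime-queryable fusion state.
import torch
import torch.nn as nn

class AsyncFusionState(nn.Module):
    def __init__(self, d_img=16, d_evt=8, d_h=32):
        super().__init__()
        self.img_cell = nn.GRUCell(d_img, d_h)   # update on frame arrival
        self.evt_cell = nn.GRUCell(d_evt, d_h)   # update on event-batch arrival
        self.decoder = nn.Linear(d_h, 1)         # e.g. a per-query depth/score head
        self.h = torch.zeros(1, d_h)

    def update(self, x, modality):
        cell = self.img_cell if modality == "image" else self.evt_cell
        self.h = cell(x, self.h)                 # asynchronous state update

    def query(self):
        return self.decoder(self.h)              # prediction at any time

ram = AsyncFusionState()
ram.update(torch.randn(1, 16), "image")          # a frame arrives
ram.update(torch.randn(1, 8), "event")           # then an event batch
ram.update(torch.randn(1, 8), "event")           # events arrive faster than frames
print(ram.query().item())                        # read out whenever needed
```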