EvRT-DETR: Latent Space Adaptation of Image Detectors for Event-based Vision
- URL: http://arxiv.org/abs/2412.02890v2
- Date: Fri, 18 Apr 2025 22:01:56 GMT
- Title: EvRT-DETR: Latent Space Adaptation of Image Detectors for Event-based Vision
- Authors: Dmitrii Torbunov, Yihui Ren, Animesh Ghose, Odera Dim, Yonggang Cui,
- Abstract summary: Event-based cameras (EBCs) have emerged as a bio-inspired alternative to traditional cameras. The development of image analysis methods for EBCs is challenging due to the sparse and asynchronous nature of the data. We introduce I2EvDet, a novel adaptation framework that bridges mainstream object detection with temporal event data processing.
- Score: 0.7270112855088837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event-based cameras (EBCs) have emerged as a bio-inspired alternative to traditional cameras, offering advantages in power efficiency, temporal resolution, and high dynamic range. However, the development of image analysis methods for EBCs is challenging due to the sparse and asynchronous nature of the data. This work addresses the problem of object detection for EBC cameras. The current approaches to EBC object detection focus on constructing complex data representations and rely on specialized architectures. We introduce I2EvDet (Image-to-Event Detection), a novel adaptation framework that bridges mainstream object detection with temporal event data processing. First, we demonstrate that a Real-Time DEtection TRansformer, or RT-DETR, a state-of-the-art natural image detector, trained on a simple image-like representation of the EBC data achieves performance comparable to specialized EBC methods. Next, as part of our framework, we develop an efficient adaptation technique that transforms image-based detectors into event-based detection models by modifying their frozen latent representation space through minimal architectural additions. The resulting EvRT-DETR model reaches state-of-the-art performance on the standard benchmark datasets Gen1 (mAP $+2.3$) and 1Mpx/Gen4 (mAP $+1.4$). These results demonstrate a fundamentally new approach to EBC object detection through principled adaptation of mainstream architectures, offering an efficient alternative with potential applications to other temporal visual domains. The code is available at: https://github.com/realtime-intelligence/evrt-detr
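As a rough illustration of the "simple image-like representation" the abstract refers to, the sketch below accumulates a raw event stream into a per-polarity count image. The function name, signature, and binning scheme are assumptions for illustration only; the actual preprocessing used by EvRT-DETR is in the linked repository.

```python
import numpy as np

def events_to_frame(events, height, width, num_polarities=2):
    """Accumulate a raw event stream into an image-like tensor.

    Hypothetical sketch: each event is a (x, y, t, polarity) tuple with
    polarity in {0, 1}. Returns an array of shape
    (num_polarities, height, width), counting events per pixel and polarity,
    which a standard image detector can consume as a multi-channel input.
    """
    frame = np.zeros((num_polarities, height, width), dtype=np.float32)
    for x, y, _, p in events:
        frame[int(p), int(y), int(x)] += 1.0
    return frame
```

Representations like this discard fine-grained timing inside the accumulation window, which is part of why the paper then adapts the detector's latent space to recover temporal context.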
Related papers
- EMF: Event Meta Formers for Event-based Real-time Traffic Object Detection [5.143097874851516]
Event cameras have higher temporal resolution, and require less storage and bandwidth compared to traditional RGB cameras.
Recent approaches in event-based object detection try to bridge this gap by employing computationally expensive transformer-based solutions.
Our proposed EMF becomes the fastest progression-based architecture in the domain by outperforming the most efficient event-based object detectors.
arXiv Detail & Related papers (2025-04-05T09:48:40Z) - Graph-Enhanced EEG Foundation Model [16.335330142000657]
We propose a novel foundation model for EEG that integrates both temporal and inter-channel information. Our architecture combines Graph Neural Networks (GNNs), which effectively capture relational structures, with a masked autoencoder to enable efficient pre-training.
arXiv Detail & Related papers (2024-11-29T06:57:50Z) - Evaluating Image-Based Face and Eye Tracking with Event Cameras [9.677797822200965]
Event Cameras, also known as Neuromorphic sensors, capture changes in local light intensity at the pixel level, producing asynchronously generated data termed "events".
This data format mitigates common issues observed in conventional cameras, like under-sampling when capturing fast-moving objects.
We evaluate the viability of integrating conventional algorithms with event-based data, transformed into a frame format.
arXiv Detail & Related papers (2024-08-19T20:27:08Z) - Geometric Features Enhanced Human-Object Interaction Detection [11.513009304308724]
We propose a novel end-to-end Transformer-style HOI detection model, i.e., the geometric features enhanced HOI detector (GeoHOI).
One key part of the model is a new unified self-supervised keypoint learning method named UniPointNet.
GeoHOI effectively upgrades a Transformer-based HOI detector, benefiting from keypoint similarities that measure the likelihood of human-object interactions.
arXiv Detail & Related papers (2024-06-26T18:52:53Z) - Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs).
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z) - ESTformer: Transformer Utilizing Spatiotemporal Dependencies for EEG Super-resolution [14.2426667945505]
ESTformer is an EEG framework utilizing spatiotemporal dependencies based on the Transformer.
The ESTformer applies positional encoding methods and the Multi-head Self-attention mechanism to the space and time dimensions.
arXiv Detail & Related papers (2023-12-03T12:26:32Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - QE-BEV: Query Evolution for Bird's Eye View Object Detection in Varied Contexts [2.949710700293865]
3D object detection plays a pivotal role in autonomous driving and robotics, demanding precise interpretation of Bird's Eye View (BEV) images.
We introduce a framework utilizing a dynamic query evolution strategy that harnesses K-means and Top-K attention mechanisms.
Our evaluation showcases a marked improvement in detection accuracy, setting a new benchmark in the domain of query-based BEV object detection.
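To make the "Top-K attention" mentioned in the summary above concrete, here is a minimal sketch of sparse attention that keeps only the k highest-scoring keys per query; the exact formulation in the paper (and its combination with K-means clustering of queries) may differ.

```python
import numpy as np

def topk_attention(queries, keys, values, k):
    """Sparse attention keeping only the top-k keys per query.

    Illustrative sketch: scores below the k-th largest per row are masked to
    -inf before the softmax, so each query attends to only k keys.
    """
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    # indices of the k largest scores for each query
    idx = np.argpartition(scores, -k, axis=-1)[:, -k:]
    # mask everything else out before the softmax
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, idx, np.take_along_axis(scores, idx, axis=-1), axis=-1)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values
```

Restricting each query to k keys cuts the effective attention cost and suppresses low-relevance context, which is the usual motivation for Top-K variants.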
arXiv Detail & Related papers (2023-10-07T21:55:29Z) - Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR).
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z) - Adaptive Rotated Convolution for Rotated Object Detection [96.94590550217718]
We present the Adaptive Rotated Convolution (ARC) module to handle the rotated object detection problem.
In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images.
The proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
arXiv Detail & Related papers (2023-03-14T11:53:12Z) - DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z) - Unseen Object Instance Segmentation with Fully Test-time RGB-D Embeddings Adaptation [14.258456366985444]
A popular recent solution leverages RGB-D features of large-scale synthetic data and applies the model to unseen real-world scenarios.
We re-emphasize the adaptation process across Sim2Real domains in this paper.
We propose a framework to conduct the Fully Test-time RGB-D Embeddings Adaptation (FTEA) based on parameters of the BatchNorm layer.
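The summary above mentions adaptation "based on parameters of the BatchNorm layer". As a generic illustration of test-time BatchNorm-statistics adaptation (not the paper's full FTEA method, which is more involved), the running mean and variance can be updated from an unlabeled target-domain batch:

```python
import numpy as np

def adapt_batchnorm_stats(running_mean, running_var, batch, momentum=0.1):
    """Test-time update of BatchNorm statistics from an unlabeled batch.

    Generic sketch: moving-average update of per-channel statistics, the
    standard mechanism by which BatchNorm layers can adapt to a new domain
    without labels. batch has shape (N, C).
    """
    mean = batch.mean(axis=0)
    var = batch.var(axis=0)
    new_mean = (1 - momentum) * running_mean + momentum * mean
    new_var = (1 - momentum) * running_var + momentum * var
    return new_mean, new_var
```

Because only statistics (and optionally affine parameters) change, such adaptation is cheap and needs no target-domain labels, which suits Sim2Real settings.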
arXiv Detail & Related papers (2022-04-21T02:35:20Z) - Robust Object Detection via Instance-Level Temporal Cycle Confusion [89.1027433760578]
We study the effectiveness of auxiliary self-supervised tasks to improve the out-of-distribution generalization of object detectors.
Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level temporal cycle confusion (CycConf).
For each object, the task is to find the most different object proposals in the adjacent frame in a video and then cycle back to itself for self-supervision.
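The round trip described above can be sketched as follows; this is a minimal illustration of the cycle idea using cosine similarity over proposal features, not the paper's actual loss or feature extraction.

```python
import numpy as np

def cycle_confusion_pairs(feats_t, feats_t1):
    """Round-trip index for each proposal: t -> most different at t+1 -> t.

    Illustrative sketch: for each proposal at time t, pick its *least*
    similar proposal at t+1, then cycle back to the least similar proposal
    at t. In the self-supervised task, the cycle should return to the
    starting proposal. feats_* have shape (num_proposals, dim).
    """
    # cosine similarity between L2-normalised proposal features
    a = feats_t / np.linalg.norm(feats_t, axis=1, keepdims=True)
    b = feats_t1 / np.linalg.norm(feats_t1, axis=1, keepdims=True)
    sim = a @ b.T
    forward = sim.argmin(axis=1)      # t -> most different proposal at t+1
    backward = sim.T.argmin(axis=1)   # t+1 -> most different proposal at t
    return backward[forward]          # where each proposal lands after the cycle
```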
arXiv Detail & Related papers (2021-04-16T21:35:08Z) - MOGAN: Morphologic-structure-aware Generative Learning from a Single Image [59.59698650663925]
Recently proposed generative models can be trained on only a single image.
We introduce a MOrphologic-structure-aware Generative Adversarial Network named MOGAN that produces random samples with diverse appearances.
Our approach focuses on internal features including the maintenance of rational structures and variation on appearance.
arXiv Detail & Related papers (2021-03-04T12:45:23Z) - End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a Transformer encoder-decoder architecture.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
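The set-based loss at the heart of DETR relies on an optimal bipartite matching between predictions and ground-truth objects. DETR uses the Hungarian algorithm; the sketch below solves the same matching by brute force for small square cost matrices, purely to illustrate the idea.

```python
import itertools
import numpy as np

def optimal_matching(cost):
    """Optimal bipartite matching for a small square cost matrix.

    Illustrative brute-force version of the matching step in DETR's
    set-based loss (the paper uses the Hungarian algorithm, which scales
    polynomially). Returns the permutation assigning prediction i to
    ground-truth perm[i], and the total matching cost.
    """
    n = cost.shape[0]
    best_perm, best_cost = None, np.inf
    for perm in itertools.permutations(range(n)):
        c = sum(cost[i, perm[i]] for i in range(n))
        if c < best_cost:
            best_cost, best_perm = c, perm
    return list(best_perm), best_cost
```

Because each prediction is matched to at most one ground-truth object, duplicate detections are penalized directly, removing the need for non-maximum suppression.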
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.