PEPR: Privileged Event-based Predictive Regularization for Domain Generalization
- URL: http://arxiv.org/abs/2602.04583v1
- Date: Wed, 04 Feb 2026 14:10:36 GMT
- Title: PEPR: Privileged Event-based Predictive Regularization for Domain Generalization
- Authors: Gabriele Magrini, Federico Becattini, Niccolò Biondi, Pietro Pala
- Abstract summary: We propose a cross-modal framework under the learning using privileged information (LUPI) paradigm for training a robust, single-modality RGB model. We leverage event cameras as a source of privileged information, available only during training. We train the RGB encoder with PEPR to predict event-based latent features, distilling robustness without sacrificing semantic richness.
- Score: 19.185122873391517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks for visual perception are highly susceptible to domain shift, which poses a critical challenge for real-world deployment under conditions that differ from the training data. To address this domain generalization challenge, we propose a cross-modal framework under the learning using privileged information (LUPI) paradigm for training a robust, single-modality RGB model. We leverage event cameras as a source of privileged information, available only during training. The two modalities exhibit complementary characteristics: the RGB stream is semantically dense but domain-dependent, whereas the event stream is sparse yet more domain-invariant. Direct feature alignment between them is therefore suboptimal, as it forces the RGB encoder to mimic the sparse event representation, thereby losing semantic detail. To overcome this, we introduce Privileged Event-based Predictive Regularization (PEPR), which reframes LUPI as a predictive problem in a shared latent space. Instead of enforcing direct cross-modal alignment, we train the RGB encoder with PEPR to predict event-based latent features, distilling robustness without sacrificing semantic richness. The resulting standalone RGB model consistently improves robustness to day-to-night and other domain shifts, outperforming alignment-based baselines across object detection and semantic segmentation.
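The predictive regularization described in the abstract can be sketched numerically. The following is a minimal illustration, not the authors' implementation: encoders are stand-in linear maps, the dimensions and weight names are hypothetical, and the task loss is omitted. It shows the core idea of predicting event latents through a separate head instead of aligning the RGB latents to them directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent dimensions (illustrative only)
d_rgb, d_event, d_latent = 512, 256, 128
batch = 8

# Privileged event branch: available only at training time.
# A frozen linear map stands in for the event encoder.
event_feat = rng.standard_normal((batch, d_event))
W_event = rng.standard_normal((d_event, d_latent)) / np.sqrt(d_event)
z_event = event_feat @ W_event  # event latent targets

# RGB branch: encoder output stays semantically rich;
# a lightweight predictor head maps it toward the event latents.
rgb_feat = rng.standard_normal((batch, d_rgb))
W_rgb = rng.standard_normal((d_rgb, d_latent)) / np.sqrt(d_rgb)
z_rgb = rgb_feat @ W_rgb        # RGB latents (used by the task head)
W_pred = rng.standard_normal((d_latent, d_latent)) / np.sqrt(d_latent)
z_pred = z_rgb @ W_pred         # predicted event latents

# PEPR-style regularizer: an MSE prediction loss in the shared
# latent space, rather than forcing z_rgb to equal z_event.
loss_pepr = float(np.mean((z_pred - z_event) ** 2))

# In training this would be combined as: total = task_loss + lam * loss_pepr
print(loss_pepr)
```

Because the gradient of the regularizer reaches the RGB encoder only through the predictor head, the RGB latents are pushed to *contain enough information* to predict the event representation without being forced to *become* that sparse representation, which is the distinction the abstract draws against direct alignment.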
Related papers
- Decoupling Amplitude and Phase Attention in Frequency Domain for RGB-Event based Visual Object Tracking [51.31378940976401]
Existing RGB-Event tracking approaches fail to fully exploit the unique advantages of event cameras. We propose a novel tracking framework that performs early fusion in the frequency domain, enabling effective aggregation of high-frequency information from the event modality. Experiments on three widely used RGB-Event tracking benchmark datasets, including FE108, FELT, and COESOT, demonstrate the high performance and efficiency of our method.
arXiv Detail & Related papers (2026-01-03T01:10:17Z) - Re-coding for Uncertainties: Edge-awareness Semantic Concordance for Resilient Event-RGB Segmentation [18.450662919776757]
We propose a novel Edge-awareness Semantic Concordance framework to unify the multi-modality heterogeneous features with latent edge cues. Our method outperforms the state of the art by 2.55% mIoU on our proposed DERS-XS.
arXiv Detail & Related papers (2025-11-11T14:00:27Z) - Let Synthetic Data Shine: Domain Reassembly and Soft-Fusion for Single Domain Generalization [68.41367635546183]
Single Domain Generalization aims to train models with consistent performance across diverse scenarios using data from a single source. We propose Discriminative Domain Reassembly and Soft-Fusion (DRSF), a training framework leveraging synthetic data to improve model generalization.
arXiv Detail & Related papers (2025-03-17T18:08:03Z) - Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation [50.31351006532924]
Human pose estimation (HPE) has received increasing attention recently due to its wide application in motion analysis, virtual reality, healthcare, etc. It suffers from the lack of labeled diverse real-world datasets due to the time- and labor-intensive annotation. We introduce a novel framework that capitalizes on both representation aggregation and segregation for domain adaptive human pose estimation.
arXiv Detail & Related papers (2024-12-29T17:59:45Z) - VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition [54.27379947727035]
This paper proposes a novel PEFT strategy to adapt pre-trained foundation vision models for RGB-Event-based classification. The frame difference of the dual modalities is also considered to capture motion cues via the frame difference backbone network. The source code and pre-trained models will be released at https://github.com/Event-AHU/VELoRA.
arXiv Detail & Related papers (2024-12-28T07:38:23Z) - Sim-to-Real Grasp Detection with Global-to-Local RGB-D Adaptation [19.384129689848294]
This paper focuses on the sim-to-real issue of RGB-D grasp detection and formulates it as a domain adaptation problem.
We present a global-to-local method to address hybrid domain gaps in RGB and depth data and insufficient multi-modal feature alignment.
arXiv Detail & Related papers (2024-03-18T06:42:38Z) - Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z) - Memory Regulation and Alignment toward Generalizer RGB-Infrared Person Re-identification [24.2142124801929]
RGB-IR ReID always demands discriminative features, which leads to over-reliance on the feature sensitivity of seen classes. We propose a novel multi-granularity memory regulation and alignment module (MG-MRA) to solve this issue. Our method alleviates the over-confidence of the model about discriminative features of seen classes.
arXiv Detail & Related papers (2021-09-18T05:55:06Z) - Multi-domain Collaborative Feature Representation for Robust Visual Object Tracking [32.760681454334765]
This paper focuses on effectively representing and utilizing complementary features from the frame domain and event domain.
For learning the unique features of the two domains, we utilize a Unique Extractor for Event (UEE) based on Spiking Neural Networks.
Experiments on standard RGB benchmark and real event tracking dataset demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2021-08-10T09:01:42Z) - Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient cross-modality guided encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.