ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification
- URL: http://arxiv.org/abs/2410.09875v1
- Date: Sun, 13 Oct 2024 15:34:11 GMT
- Title: ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification
- Authors: Chen Mao, Chong Tan, Jingqi Hu, Min Zheng
- Abstract summary: Person re-identification (ReID) plays a vital role in safety inspections, personnel counting, and more.
Most current ReID approaches primarily extract features from images, which are easily affected by objective conditions.
We leverage widely available routers as sensing devices by capturing gait information from pedestrians through the Channel State Information (CSI) in WiFi signals.
- Score: 3.3743041904085125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Person re-identification (ReID), as a crucial technology in the field of security, plays a vital role in safety inspections, personnel counting, and more. Most current ReID approaches primarily extract features from images, which are easily affected by objective conditions such as clothing changes and occlusions. In addition to cameras, we leverage widely available routers as sensing devices by capturing gait information from pedestrians through the Channel State Information (CSI) in WiFi signals, and we contribute a multimodal dataset. We employ a two-stream network to separately process video understanding and signal analysis tasks, and conduct multimodal fusion and contrastive learning on pedestrian video and WiFi data. Extensive experiments in real-world scenarios demonstrate that our method effectively uncovers the correlations between heterogeneous data, bridges the gap between visual and signal modalities, significantly expands the sensing range, and improves ReID accuracy across multiple sensors.
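The abstract's core recipe, two modality-specific streams aligned with contrastive learning, can be sketched as a toy symmetric InfoNCE objective. This is a minimal illustration, not the paper's actual networks: the feature arrays, dimensionality, and temperature below are stand-ins for the outputs of the vision and WiFi encoders.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Project embeddings onto the unit sphere so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def infonce_loss(vision_emb, wifi_emb, temperature=0.07):
    """Symmetric InfoNCE: matching (video, CSI) pairs of the same pedestrian
    are pulled together, mismatched pairs pushed apart."""
    v = l2_normalize(vision_emb)
    w = l2_normalize(wifi_emb)
    logits = v @ w.T / temperature          # (N, N) cross-modal similarity matrix
    labels = np.arange(len(v))              # i-th video matches i-th CSI clip

    def xent(lg):
        # numerically stable cross-entropy with the diagonal as targets
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the video->wifi and wifi->video directions
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
video_features = rng.normal(size=(4, 128))                        # stand-in vision stream
wifi_features = video_features + 0.1 * rng.normal(size=(4, 128))  # well-aligned CSI stream
loss_aligned = infonce_loss(video_features, wifi_features)
loss_random = infonce_loss(video_features, rng.normal(size=(4, 128)))
```

Aligned pairs yield a much smaller loss than random pairings, which is the signal the fusion training exploits to bridge the two modalities.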
Related papers
- Time-Frequency Analysis of Variable-Length WiFi CSI Signals for Person Re-Identification [3.3743041904085125]
Person re-identification (ReID) plays an important role in security detection and people counting.
This letter introduces a method using WiFi Channel State Information (CSI), leveraging the multipath propagation characteristics of WiFi signals as a basis for distinguishing different pedestrian features.
We propose a two-stream network structure capable of processing variable-length data, which analyzes the amplitude in the time domain and the phase in the frequency domain of WiFi signals.
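As a rough illustration of the two cues this letter analyzes (not the authors' actual pipeline), the split of a complex CSI clip into time-domain amplitude and frequency-domain phase can be sketched with NumPy; the clip shape and the zero-mean step are assumptions made for the toy example.

```python
import numpy as np

def csi_features(csi):
    """Split a complex CSI clip into the two analyzed cues:
    amplitude over time, and phase taken in the frequency domain."""
    amplitude = np.abs(csi)                 # time-domain amplitude per subcarrier
    spectrum = np.fft.fft(csi, axis=0)      # FFT along the time axis
    phase = np.angle(spectrum)              # frequency-domain phase in (-pi, pi]
    # zero-mean each subcarrier so clips of different lengths are comparable
    amplitude = amplitude - amplitude.mean(axis=0, keepdims=True)
    return amplitude, phase

rng = np.random.default_rng(1)
# toy CSI clip: 100 time samples x 30 subcarriers, complex-valued
clip = rng.normal(size=(100, 30)) + 1j * rng.normal(size=(100, 30))
amp, ph = csi_features(clip)
```

Because the FFT is taken along the time axis, clips of different lengths simply produce feature maps of different heights, which a variable-length two-stream network can consume directly.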
arXiv Detail & Related papers (2024-07-12T07:10:47Z)
- LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for Place Recognition [11.206532393178385]
We present a novel neural network named LCPR for robust multimodal place recognition.
Our method can effectively utilize multi-view camera and LiDAR data to improve the place recognition performance.
arXiv Detail & Related papers (2023-11-06T15:39:48Z)
- GaitFi: Robust Device-Free Human Identification via WiFi and Vision Multimodal Learning [33.89340087471202]
We propose a novel multimodal gait recognition method, namely GaitFi, which leverages WiFi signals and videos for human identification.
In GaitFi, Channel State Information (CSI) that reflects the multi-path propagation of WiFi is collected to capture human gaits, while videos are captured by cameras.
To learn robust gait information, we propose a Lightweight Residual Convolution Network (LRCN) as the backbone network, and further propose the two-stream GaitFi.
Experiments are conducted in the real world, which demonstrate that GaitFi outperforms state-of-the-art gait recognition methods.
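The LRCN backbone's details are not given in this summary, but the residual pattern it builds on can be sketched generically; the 1-D convolution, kernel sizes, and ReLU placement below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def conv1d(x, kernel):
    """'Same'-padded 1-D cross-correlation of a single-channel signal."""
    pad = len(kernel) // 2
    xp = np.pad(x, pad)
    return np.array([xp[i:i + len(kernel)] @ kernel for i in range(len(x))])

def residual_block(x, k1, k2):
    """Two stacked convolutions with an identity shortcut, the pattern a
    lightweight residual backbone stacks to learn gait features: out = relu(x + f(x))."""
    h = np.maximum(conv1d(x, k1), 0.0)   # first conv + ReLU
    h = conv1d(h, k2)                    # second conv
    return np.maximum(x + h, 0.0)        # skip connection, then ReLU
```

The shortcut lets each block learn only a correction on top of its input, which keeps deep stacks trainable while the parameter count stays small.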
arXiv Detail & Related papers (2022-08-30T15:07:43Z)
- Self-Supervised Multimodal Fusion Transformer for Passive Activity Recognition [2.35066982314539]
Wi-Fi signals provide significant opportunities for human sensing and activity recognition in fields such as healthcare.
Current systems do not effectively exploit the information acquired through multiple sensors to recognise the different activities.
We propose the Fusion Transformer, an attention-based model for multimodal and multi-sensor fusion.
arXiv Detail & Related papers (2022-08-15T15:38:10Z)
- A Wireless-Vision Dataset for Privacy Preserving Human Activity Recognition [53.41825941088989]
A new WiFi-based and video-based neural network (WiNN) is proposed to improve the robustness of activity recognition.
Our results show that the WiVi dataset satisfies the primary demand, and all three branches in the proposed pipeline keep more than 80% activity recognition accuracy.
arXiv Detail & Related papers (2022-05-24T10:49:11Z)
- Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse them in the common space, either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
- Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition.
There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules.
We propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule.
arXiv Detail & Related papers (2022-01-03T16:49:33Z)
- Unsupervised Person Re-Identification with Wireless Positioning under Weak Scene Labeling [131.18390399368997]
We propose to explore unsupervised person re-identification with both visual data and wireless positioning trajectories under weak scene labeling.
Specifically, we propose a novel unsupervised multimodal training framework (UMTF), which models the complementarity of visual data and wireless information.
Our UMTF contains a multimodal data association strategy (MMDA) and a multimodal graph neural network (MMGN).
arXiv Detail & Related papers (2021-10-29T08:25:44Z)
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos).
The SAKDN uses multiple wearable-sensors as teacher modalities and uses RGB videos as student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
- Vision Meets Wireless Positioning: Effective Person Re-identification with Recurrent Context Propagation [120.18969251405485]
Existing person re-identification methods rely on visual sensors to capture pedestrians.
Mobile phones can be sensed by WiFi and cellular networks in the form of wireless positioning signals.
We propose a novel recurrent context propagation module that enables information to propagate between visual data and wireless positioning data.
arXiv Detail & Related papers (2020-08-10T14:19:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.