ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification
- URL: http://arxiv.org/abs/2410.09875v1
- Date: Sun, 13 Oct 2024 15:34:11 GMT
- Title: ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification
- Authors: Chen Mao, Chong Tan, Jingqi Hu, Min Zheng
- Abstract summary: Person re-identification (ReID) plays a vital role in safety inspections, personnel counting, and more.
Most current ReID approaches primarily extract features from images, which are easily affected by objective conditions.
We leverage widely available routers as sensing devices by capturing gait information from pedestrians through the Channel State Information (CSI) in WiFi signals.
- Score: 3.3743041904085125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Person re-identification (ReID), as a crucial technology in the field of security, plays a vital role in safety inspections, personnel counting, and more. Most current ReID approaches primarily extract features from images, which are easily affected by objective conditions such as clothing changes and occlusions. In addition to cameras, we leverage widely available routers as sensing devices by capturing gait information from pedestrians through the Channel State Information (CSI) in WiFi signals, and we contribute a multimodal dataset. We employ a two-stream network to separately process video understanding and signal analysis tasks, and conduct multimodal fusion and contrastive learning on pedestrian video and WiFi data. Extensive experiments in real-world scenarios demonstrate that our method effectively uncovers the correlations between heterogeneous data, bridges the gap between visual and signal modalities, significantly expands the sensing range, and improves ReID accuracy across multiple sensors.
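The abstract's core recipe, two modality-specific streams aligned with contrastive learning, can be sketched as a toy symmetric InfoNCE objective. This is a minimal illustration, not the paper's actual networks: the feature arrays, dimensionality, and temperature below are stand-ins for the outputs of the vision and WiFi encoders.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Project embeddings onto the unit sphere so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def infonce_loss(vision_emb, wifi_emb, temperature=0.07):
    """Symmetric InfoNCE: matching (video, CSI) pairs of the same pedestrian
    are pulled together, mismatched pairs pushed apart."""
    v = l2_normalize(vision_emb)
    w = l2_normalize(wifi_emb)
    logits = v @ w.T / temperature          # (N, N) cross-modal similarity matrix
    labels = np.arange(len(v))              # i-th video matches i-th CSI clip

    def xent(lg):
        # numerically stable cross-entropy with the diagonal as targets
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the video->wifi and wifi->video directions
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
video_features = rng.normal(size=(4, 128))                        # stand-in vision stream
wifi_features = video_features + 0.1 * rng.normal(size=(4, 128))  # well-aligned CSI stream
loss_aligned = infonce_loss(video_features, wifi_features)
loss_random = infonce_loss(video_features, rng.normal(size=(4, 128)))
```

Aligned pairs yield a much smaller loss than random pairings, which is the signal the fusion training exploits to bridge the two modalities.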
Related papers
- Time-Frequency Analysis of Variable-Length WiFi CSI Signals for Person Re-Identification [3.3743041904085125]
Person re-identification (ReID) plays an important role in security detection and people counting.
This letter introduces a method using WiFi Channel State Information (CSI), leveraging the multipath propagation characteristics of WiFi signals as a basis for distinguishing different pedestrian features.
We propose a two-stream network structure capable of processing variable-length data, which analyzes the amplitude in the time domain and the phase in the frequency domain of WiFi signals.
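As a rough illustration of the two cues this letter analyzes (not the authors' actual pipeline), the split of a complex CSI clip into time-domain amplitude and frequency-domain phase can be sketched with NumPy; the clip shape and the zero-mean step are assumptions made for the toy example.

```python
import numpy as np

def csi_features(csi):
    """Split a complex CSI clip into the two analyzed cues:
    amplitude over time, and phase taken in the frequency domain."""
    amplitude = np.abs(csi)                 # time-domain amplitude per subcarrier
    spectrum = np.fft.fft(csi, axis=0)      # FFT along the time axis
    phase = np.angle(spectrum)              # frequency-domain phase in (-pi, pi]
    # zero-mean each subcarrier so clips of different lengths are comparable
    amplitude = amplitude - amplitude.mean(axis=0, keepdims=True)
    return amplitude, phase

rng = np.random.default_rng(1)
# toy CSI clip: 100 time samples x 30 subcarriers, complex-valued
clip = rng.normal(size=(100, 30)) + 1j * rng.normal(size=(100, 30))
amp, ph = csi_features(clip)
```

Because the FFT is taken along the time axis, clips of different lengths simply produce feature maps of different heights, which a variable-length two-stream network can consume directly.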
arXiv Detail & Related papers (2024-07-12T07:10:47Z)
- LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for Place Recognition [11.206532393178385]
We present a novel neural network named LCPR for robust multimodal place recognition.
Our method can effectively utilize multi-view camera and LiDAR data to improve the place recognition performance.
arXiv Detail & Related papers (2023-11-06T15:39:48Z)
- GaitFi: Robust Device-Free Human Identification via WiFi and Vision Multimodal Learning [33.89340087471202]
We propose a novel multimodal gait recognition method, namely GaitFi, which leverages WiFi signals and videos for human identification.
In GaitFi, Channel State Information (CSI) that reflects the multi-path propagation of WiFi is collected to capture human gaits, while videos are captured by cameras.
To learn robust gait information, we propose a Lightweight Residual Convolution Network (LRCN) as the backbone network, and further propose the two-stream GaitFi.
Experiments are conducted in the real world, which demonstrate that GaitFi outperforms state-of-the-art gait recognition methods.
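The LRCN backbone's details are not given in this summary, but the residual pattern it builds on can be sketched generically; the 1-D convolution, kernel sizes, and ReLU placement below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def conv1d(x, kernel):
    """'Same'-padded 1-D cross-correlation of a single-channel signal."""
    pad = len(kernel) // 2
    xp = np.pad(x, pad)
    return np.array([xp[i:i + len(kernel)] @ kernel for i in range(len(x))])

def residual_block(x, k1, k2):
    """Two stacked convolutions with an identity shortcut, the pattern a
    lightweight residual backbone stacks to learn gait features: out = relu(x + f(x))."""
    h = np.maximum(conv1d(x, k1), 0.0)   # first conv + ReLU
    h = conv1d(h, k2)                    # second conv
    return np.maximum(x + h, 0.0)        # skip connection, then ReLU
```

The shortcut lets each block learn only a correction on top of its input, which keeps deep stacks trainable while the parameter count stays small.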
arXiv Detail & Related papers (2022-08-30T15:07:43Z)
- Self-Supervised Multimodal Fusion Transformer for Passive Activity Recognition [2.35066982314539]
Wi-Fi signals provide significant opportunities for human sensing and activity recognition in fields such as healthcare.
Current systems do not effectively exploit the information acquired through multiple sensors to recognise the different activities.
We propose the Fusion Transformer, an attention-based model for multimodal and multi-sensor fusion.
arXiv Detail & Related papers (2022-08-15T15:38:10Z)
- A Wireless-Vision Dataset for Privacy Preserving Human Activity Recognition [53.41825941088989]
A new WiFi-based and video-based neural network (WiNN) is proposed to improve the robustness of activity recognition.
Our results show that the WiVi dataset satisfies the primary demand, and all three branches in the proposed pipeline keep more than 80% activity recognition accuracy.
arXiv Detail & Related papers (2022-05-24T10:49:11Z)
- Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse them in the common space, either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
- Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition.
There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules.
We propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule.
arXiv Detail & Related papers (2022-01-03T16:49:33Z)
- Unsupervised Person Re-Identification with Wireless Positioning under Weak Scene Labeling [131.18390399368997]
We propose to explore unsupervised person re-identification with both visual data and wireless positioning trajectories under weak scene labeling.
Specifically, we propose a novel unsupervised multimodal training framework (UMTF), which models the complementarity of visual data and wireless information.
Our UMTF contains a multimodal data association strategy (MMDA) and a multimodal graph neural network (MMGN).
arXiv Detail & Related papers (2021-10-29T08:25:44Z)
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos).
The SAKDN uses multiple wearable-sensors as teacher modalities and uses RGB videos as student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
- Vision Meets Wireless Positioning: Effective Person Re-identification with Recurrent Context Propagation [120.18969251405485]
Existing person re-identification methods rely on visual sensors to capture pedestrians.
Mobile phones can be sensed by WiFi and cellular networks in the form of wireless positioning signals.
We propose a novel recurrent context propagation module that enables information to propagate between visual data and wireless positioning data.
arXiv Detail & Related papers (2020-08-10T14:19:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.