ViFiCon: Vision and Wireless Association Via Self-Supervised Contrastive
Learning
- URL: http://arxiv.org/abs/2210.05513v1
- Date: Tue, 11 Oct 2022 15:04:05 GMT
- Title: ViFiCon: Vision and Wireless Association Via Self-Supervised Contrastive
Learning
- Authors: Nicholas Meegan, Hansi Liu, Bryan Cao, Abrar Alali, Kristin Dana,
Marco Gruteser, Shubham Jain and Ashwin Ashok
- Abstract summary: ViFiCon is a self-supervised contrastive learning scheme which uses synchronized information across vision and wireless modalities to perform cross-modal association.
We show that ViFiCon achieves high-performance vision-to-wireless association, finding which bounding box corresponds to which smartphone device.
- Score: 5.5232283752707785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce ViFiCon, a self-supervised contrastive learning scheme which
uses synchronized information across vision and wireless modalities to perform
cross-modal association. Specifically, the system uses pedestrian data
collected from RGB-D camera footage as well as WiFi Fine Time Measurements
(FTM) from a user's smartphone device. We represent the temporal sequence by
stacking multi-person depth data spatially within a banded image. Depth data
from RGB-D (vision domain) is inherently linked with an observable pedestrian,
but FTM data (wireless domain) is associated only to a smartphone on the
network. To formulate the cross-modal association problem as self-supervised,
the network learns a scene-wide synchronization of the two modalities as a
pretext task, and then uses that learned representation for the downstream task
of associating individual bounding boxes to specific smartphones, i.e.
associating vision and wireless information. We use a pre-trained region
proposal model on the camera footage and then feed the extrapolated bounding
box information into a dual-branch convolutional neural network along with the
FTM data. We show that, compared to fully supervised SoTA models, ViFiCon
achieves high-performance vision-to-wireless association, finding which
bounding box corresponds to which smartphone device, without hand-labeled
association examples as training data.
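As a rough illustration of the dual-branch contrastive scheme the abstract describes, the sketch below contrasts synchronized vision/FTM windows with an NT-Xent-style loss: matching time windows are positives, all other pairs in the batch are negatives. The encoder layers, input shapes, and temperature are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of a ViFiCon-style dual-branch contrastive setup.
# Layer sizes, input shapes, and the temperature are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Branch(nn.Module):
    """Small CNN encoder; one instance per modality."""
    def __init__(self, in_ch, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def nt_xent(z_vis, z_wifi, tau=0.1):
    """Synchronized windows sit on the diagonal and act as positives."""
    logits = z_vis @ z_wifi.t() / tau            # (B, B) similarities
    targets = torch.arange(z_vis.size(0))        # diagonal indices
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

vision_enc = Branch(in_ch=1)   # banded multi-person depth image
wifi_enc   = Branch(in_ch=1)   # FTM sequence rendered as a 2-D band
depth_band = torch.randn(8, 1, 64, 64)   # batch of synchronized windows
ftm_band   = torch.randn(8, 1, 64, 64)
loss = nt_xent(vision_enc(depth_band), wifi_enc(ftm_band))
loss.backward()
```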
Related papers
- ViFiT: Reconstructing Vision Trajectories from IMU and Wi-Fi Fine Time Measurements [6.632056181867312]
We propose ViFiT, a transformer-based model that reconstructs vision bounding box trajectories from phone data (IMU and Fine Time Measurements).
ViFiT achieves an MRFR of 0.65, outperforming the state-of-the-art LSTM-Decoder approach for cross-modal reconstruction.
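A minimal sketch of the idea, assuming a plain transformer encoder over per-timestep IMU+FTM features with a linear box-regression head; the feature sizes and layer counts here are hypothetical, and ViFiT's actual architecture may differ:

```python
# Hypothetical phone-to-bounding-box trajectory regressor (PyTorch).
import torch
import torch.nn as nn

class PhoneToBBox(nn.Module):
    def __init__(self, in_dim=9 + 2, d_model=64, T=30):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)    # assumed IMU(9) + FTM(2)
        self.pos = nn.Parameter(torch.zeros(1, T, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 4)         # (x, y, w, h) per frame

    def forward(self, phone_seq):                 # (B, T, 11)
        h = self.encoder(self.proj(phone_seq) + self.pos)
        return self.head(h)                       # (B, T, 4) trajectory

model = PhoneToBBox()
boxes = model(torch.randn(2, 30, 11))             # reconstructed boxes
```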
arXiv Detail & Related papers (2023-10-04T20:05:40Z)
- SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects.
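For readers unfamiliar with spiking networks, a minimal leaky integrate-and-fire (LIF) layer of the kind such trackers build on looks as follows; the decay and threshold constants are illustrative assumptions:

```python
# Illustrative LIF dynamics over event-driven inputs.
import torch

def lif_forward(inputs, decay=0.9, v_th=1.0):
    """inputs: (T, B, N) input currents -> (T, B, N) binary spikes."""
    v = torch.zeros_like(inputs[0])
    spikes = []
    for x_t in inputs:                  # iterate over time steps
        v = decay * v + x_t             # leaky membrane integration
        s = (v >= v_th).float()         # fire when threshold is crossed
        v = v * (1.0 - s)               # hard reset after a spike
        spikes.append(s)
    return torch.stack(spikes)

out = lif_forward(torch.rand(16, 2, 8))  # sparse binary spike trains
```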
arXiv Detail & Related papers (2023-09-29T05:13:43Z)
- HiNoVa: A Novel Open-Set Detection Method for Automating RF Device Authentication [9.571774189070531]
We introduce a novel open-set detection approach based on the patterns of the hidden state values within a Convolutional Neural Network (CNN) Long Short-Term Memory (LSTM) model.
Our approach greatly improves the Area Under the Precision-Recall Curve on LoRa, Wireless-WiFi, and Wired-WiFi datasets.
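One way to turn hidden-state patterns into an open-set decision is to summarize the LSTM's per-step hidden states and threshold the distance to centroids of known devices. The sketch below shows that generic idea; the model, summary statistic, and threshold are assumptions, not HiNoVa's exact procedure:

```python
# Hedged sketch: open-set scoring from CNN-LSTM hidden states.
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(2, 16, 5, padding=2), nn.ReLU())
        self.lstm = nn.LSTM(16, hidden, batch_first=True)

    def forward(self, iq):                     # (B, 2, T) I/Q samples
        feats = self.cnn(iq).transpose(1, 2)   # (B, T, 16)
        h_seq, _ = self.lstm(feats)            # per-step hidden states
        return h_seq.mean(dim=1)               # (B, hidden) summary

def open_set_flags(model, iq, centroids, threshold=2.0):
    with torch.no_grad():
        h = model(iq)                          # (B, hidden)
    d = torch.cdist(h, centroids)              # distance to known devices
    return d.min(dim=1).values > threshold     # True => unseen device

model = CNNLSTM()
known = torch.randn(5, 32)                     # centroids of 5 known devices
flags = open_set_flags(model, torch.randn(3, 2, 128), known)
```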
arXiv Detail & Related papers (2023-05-16T16:47:02Z)
- WiFi-based Spatiotemporal Human Action Perception [53.41825941088989]
An end-to-end WiFi signal neural network (SNN) is proposed to enable WiFi-only sensing in both line-of-sight and non-line-of-sight scenarios.
In particular, the 3D convolution module explores the spatiotemporal continuity of WiFi signals, and the feature self-attention module explicitly maintains dominant features.
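A schematic of those two ingredients: a 3D convolution over a video-like WiFi tensor followed by self-attention over the resulting features. All shapes (time x antennas x subcarriers) are assumed for illustration:

```python
# Assumed shapes; a sketch of 3D conv + self-attention over WiFi features.
import torch
import torch.nn as nn

conv3d = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=(3, 3, 3), padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d((16, 1, 1)),          # keep 16 temporal slots
)
attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)

csi = torch.randn(2, 1, 32, 3, 30)             # (B, 1, time, ant, subcarrier)
feats = conv3d(csi).flatten(2).transpose(1, 2) # (B, 16, 8) token sequence
out, _ = attn(feats, feats, feats)             # self-attention over features
```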
arXiv Detail & Related papers (2022-06-20T16:03:45Z)
- Federated Deep Learning Meets Autonomous Vehicle Perception: Design and Verification [168.67190934250868]
Federated learning-empowered connected autonomous vehicles (FLCAV) have been proposed.
FLCAV preserves privacy while reducing communication and annotation costs.
It is challenging to determine the network resources and road sensor poses for multi-stage training.
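Federated training in such systems builds on weight averaging across clients; below is a minimal FedAvg-style sketch, a generic illustration rather than the paper's multi-stage procedure:

```python
# Generic FedAvg-style aggregation sketch.
import torch

def fed_avg(client_states, client_sizes):
    """Average client model weights, weighted by local dataset size."""
    total = float(sum(client_sizes))
    avg = {}
    for key in client_states[0]:
        avg[key] = sum(s[key] * (n / total)
                       for s, n in zip(client_states, client_sizes))
    return avg

# Example: three vehicles with differently sized local datasets.
states = [{"w": torch.ones(2) * i} for i in (1.0, 2.0, 3.0)]
global_state = fed_avg(states, client_sizes=[100, 300, 600])
```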
arXiv Detail & Related papers (2022-06-03T23:55:45Z)
- Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition.
There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules.
We propose an aggregation rule based on the frequency of each client's participation in training, namely the FedFreq aggregation rule.
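A hedged sketch of the participation-frequency idea: weight each client's update by how often it has joined training. The paper's exact weighting may differ:

```python
# Illustrative FedFreq-style aggregation by participation frequency.
import torch

def fed_freq(client_states, participation_counts):
    weights = torch.tensor(participation_counts, dtype=torch.float)
    weights = weights / weights.sum()          # normalize frequencies
    agg = {}
    for key in client_states[0]:
        agg[key] = sum(w * s[key] for w, s in zip(weights, client_states))
    return agg

states = [{"w": torch.full((2,), v)} for v in (1.0, 2.0)]
agg = fed_freq(states, participation_counts=[3, 9])  # 25% / 75% weights
```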
arXiv Detail & Related papers (2022-01-03T16:49:33Z)
- Multi-Band Wi-Fi Sensing with Matched Feature Granularity [37.40429912751046]
We propose a multi-band Wi-Fi fusion method for Wi-Fi sensing that hierarchically fuses the features from both the fine-grained CSI at sub-6 GHz and the mid-grained beam SNR at 60 GHz.
To address the issue of limited labeled training data, we propose an autoencoder-based multi-band Wi-Fi fusion network that can be pre-trained in an unsupervised fashion.
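A minimal sketch of such unsupervised pre-training: encode each band separately, fuse, and train to reconstruct both inputs with no labels. All dimensions are illustrative assumptions:

```python
# Hypothetical two-band fusion autoencoder, pre-trained by reconstruction.
import torch
import torch.nn as nn

class FusionAE(nn.Module):
    def __init__(self, csi_dim=128, snr_dim=32, z=16):
        super().__init__()
        self.enc_csi = nn.Sequential(nn.Linear(csi_dim, 64), nn.ReLU())
        self.enc_snr = nn.Sequential(nn.Linear(snr_dim, 64), nn.ReLU())
        self.fuse = nn.Linear(128, z)              # fused latent code
        self.dec = nn.Linear(z, csi_dim + snr_dim) # reconstruct both bands

    def forward(self, csi, snr):
        z = self.fuse(torch.cat([self.enc_csi(csi), self.enc_snr(snr)], -1))
        return self.dec(z)

model = FusionAE()
csi, snr = torch.randn(4, 128), torch.randn(4, 32)
recon = model(csi, snr)
loss = nn.functional.mse_loss(recon, torch.cat([csi, snr], -1))  # no labels
```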
arXiv Detail & Related papers (2021-12-28T05:50:58Z)
- Unsupervised Person Re-Identification with Wireless Positioning under Weak Scene Labeling [131.18390399368997]
We propose to explore unsupervised person re-identification with both visual data and wireless positioning trajectories under weak scene labeling.
Specifically, we propose a novel unsupervised multimodal training framework (UMTF), which models the complementarity of visual data and wireless information.
Our UMTF contains a multimodal data association strategy (MMDA) and a multimodal graph neural network (MMGN).
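To make the association problem concrete, here is a generic sketch that matches visual trajectories to wireless positioning trajectories by mean distance and solves the assignment with the Hungarian algorithm; this illustrates the problem MMDA addresses, not UMTF's actual strategy:

```python
# Generic trajectory association via linear assignment (not UMTF itself).
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(visual_tracks, wireless_tracks):
    """Each track: (T, 2) array of positions on a common ground plane."""
    cost = np.array([[np.linalg.norm(v - w, axis=1).mean()
                      for w in wireless_tracks] for v in visual_tracks])
    rows, cols = linear_sum_assignment(cost)   # Hungarian matching
    return list(zip(rows, cols))

vis = [np.random.rand(50, 2) for _ in range(3)]
wifi = [np.random.rand(50, 2) for _ in range(3)]
pairs = associate(vis, wifi)                   # [(vis_idx, wifi_idx), ...]
```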
arXiv Detail & Related papers (2021-10-29T08:25:44Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks. The visible and thermal filters will be used to conduct a dynamic convolutional operation on their corresponding input feature maps respectively.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
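The core mechanism, generating convolution kernels on the fly from input features and applying them as a dynamic convolution, can be sketched as below; channels and kernel size are assumptions:

```python
# Illustrative dynamic filter generation with per-sample kernels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    def __init__(self, ch=8, k=3):
        super().__init__()
        self.ch, self.k = ch, k
        # predict ch*ch*k*k kernel weights from a global feature vector
        self.gen = nn.Linear(ch, ch * ch * k * k)

    def forward(self, x):                       # x: (B, ch, H, W)
        ctx = x.mean(dim=(2, 3))                # (B, ch) global context
        outs = []
        for xi, ci in zip(x, ctx):              # per-sample dynamic kernels
            w = self.gen(ci).view(self.ch, self.ch, self.k, self.k)
            outs.append(F.conv2d(xi.unsqueeze(0), w, padding=self.k // 2))
        return torch.cat(outs, dim=0)

feat = torch.randn(2, 8, 16, 16)                # e.g., a thermal feature map
out = DynamicConv()(feat)                       # (2, 8, 16, 16)
```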
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
- Federated Self-Supervised Learning of Multi-Sensor Representations for Embedded Intelligence [8.110949636804772]
Smartphones, wearables, and Internet of Things (IoT) devices produce a wealth of data that cannot be accumulated in a centralized repository for learning supervised models.
We propose a self-supervised approach termed scalogram-signal correspondence learning, based on the wavelet transform, to learn useful representations from unlabeled sensor inputs.
We extensively assess the quality of learned features with our multi-view strategy on diverse public datasets, achieving strong performance in all domains.
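A minimal sketch of the scalogram-signal correspondence pretext, assuming a Morlet continuous wavelet transform (via pywt) and a binary head that separates matching from mismatched signal/scalogram pairs; the encoders and sizes are hypothetical:

```python
# Hypothetical scalogram-signal correspondence pretext.
import numpy as np
import pywt
import torch
import torch.nn as nn

def scalogram(signal, scales=np.arange(1, 33)):
    coeffs, _ = pywt.cwt(signal, scales, "morl")  # (32, T) time-frequency map
    return torch.tensor(np.abs(coeffs), dtype=torch.float32)

sig_enc = nn.Sequential(nn.Flatten(), nn.Linear(128, 32))       # raw window
img_enc = nn.Sequential(nn.Flatten(), nn.Linear(32 * 128, 32))  # scalogram
head = nn.Linear(64, 1)                          # corresponds: yes/no

x = np.random.randn(128).astype(np.float32)      # unlabeled sensor window
s = scalogram(x)                                 # (32, 128)
z = torch.cat([sig_enc(torch.tensor(x).unsqueeze(0)),
               img_enc(s.unsqueeze(0))], dim=-1)
logit = head(z)  # trained with BCE on matching vs. shuffled pairs
```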
arXiv Detail & Related papers (2020-07-25T21:59:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.