Virtual Fusion with Contrastive Learning for Single Sensor-based
Activity Recognition
- URL: http://arxiv.org/abs/2312.02185v1
- Date: Fri, 1 Dec 2023 17:03:27 GMT
- Title: Virtual Fusion with Contrastive Learning for Single Sensor-based
Activity Recognition
- Authors: Duc-Anh Nguyen, Cuong Pham, Nhien-An Le-Khac
- Abstract summary: Various types of sensors can be used for Human Activity Recognition (HAR).
Sometimes a single sensor cannot fully observe the user's motions from its perspective, which leads to incorrect predictions.
We propose Virtual Fusion - a new method that takes advantage of unlabeled data from multiple time-synchronized sensors during training, but only needs one sensor for inference.
- Score: 5.225544155289783
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Various types of sensors can be used for Human Activity Recognition (HAR),
and each of them has different strengths and weaknesses. Sometimes a single
sensor cannot fully observe the user's motions from its perspective, which
leads to incorrect predictions. While sensor fusion provides more information
for HAR, it comes with inherent drawbacks such as user privacy and acceptance
concerns, as well as costly set-up, operation, and maintenance. To address this problem, we
propose Virtual Fusion - a new method that takes advantage of unlabeled data
from multiple time-synchronized sensors during training, but only needs one
sensor for inference. Contrastive learning is adopted to exploit the
correlation among sensors. Virtual Fusion gives significantly better accuracy
than training with the same single sensor, and in some cases, it even surpasses
actual fusion using multiple sensors at test time. We also extend this method
to a more general version called Actual Fusion within Virtual Fusion (AFVF),
which uses a subset of training sensors during inference. Our method achieves
state-of-the-art accuracy and F1-score on UCI-HAR and PAMAP2 benchmark
datasets. Implementation is available upon request.
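The listing includes no code (the implementation is available upon request), so the snippet below is only a minimal PyTorch sketch of the idea stated in the abstract: two sensor-specific encoders are trained jointly, an InfoNCE-style contrastive loss aligns embeddings of time-synchronized windows from the two sensors, and only one encoder plus its classifier is needed at inference. All module names, architectures, dimensions, and the loss weighting are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SensorEncoder(nn.Module):
    """Hypothetical 1D-CNN encoder for one sensor stream."""
    def __init__(self, in_channels, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(64, embed_dim, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # (B, embed_dim, 1)
            nn.Flatten(),              # (B, embed_dim)
        )

    def forward(self, x):              # x: (B, in_channels, T)
        return self.net(x)

def info_nce(z_a, z_b, temperature=0.1):
    """Contrastive loss: time-synchronized windows from two sensors are positives."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature                 # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

# Two branches during training; only `enc_a` + `clf` are kept for inference.
enc_a = SensorEncoder(in_channels=3)                     # e.g. the target wearable sensor
enc_b = SensorEncoder(in_channels=6)                     # e.g. a second, time-synchronized sensor
clf = nn.Linear(128, 6)                                  # 6 activity classes (assumed)

def training_step(x_a, x_b, labels, lam=0.5):
    """x_a and x_b are time-synchronized windows; the contrastive term needs no labels."""
    z_a, z_b = enc_a(x_a), enc_b(x_b)
    loss_con = info_nce(z_a, z_b)                        # exploits correlation among sensors
    loss_cls = F.cross_entropy(clf(z_a), labels)
    return loss_cls + lam * loss_con

@torch.no_grad()
def predict(x_a):
    """Inference needs only the single target sensor."""
    return clf(enc_a(x_a)).argmax(dim=1)
```

At test time only the single-sensor branch runs, which is the point of virtual fusion; the AFVF variant described in the abstract would instead keep a subset of the training sensors' encoders.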
Related papers
- Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes [56.52618054240197]
We propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes.
Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token that guides the fusion of multiple sensor modalities.
We set the new state of the art with CAFuser on the MUSES dataset with 59.7 PQ for multimodal panoptic segmentation and 78.2 mIoU for semantic segmentation, ranking first on the public benchmarks.
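The CAFuser architecture is not detailed in this summary, so the following is only a loose, hypothetical sketch of condition-aware fusion: a condition predicted from RGB features is mapped to a learned token that weights per-modality features before they are merged. Every name and dimension below is an assumption.

```python
import torch
import torch.nn as nn

class ConditionAwareFusion(nn.Module):
    """Toy condition-guided fusion over pre-extracted per-modality features."""
    def __init__(self, feat_dim=256, num_modalities=3, num_conditions=4):
        super().__init__()
        self.condition_head = nn.Linear(feat_dim, num_conditions)  # predicted from RGB features
        self.condition_tokens = nn.Embedding(num_conditions, feat_dim)
        self.gate = nn.Linear(feat_dim, num_modalities)            # per-modality weights

    def forward(self, rgb_feat, modality_feats):
        # rgb_feat: (B, feat_dim); modality_feats: (B, M, feat_dim)
        cond_logits = self.condition_head(rgb_feat)                # classify the environmental condition
        cond_probs = torch.softmax(cond_logits, dim=1)
        token = cond_probs @ self.condition_tokens.weight          # soft "condition token" (B, feat_dim)
        weights = torch.softmax(self.gate(token), dim=1)           # (B, M) fusion weights
        fused = (weights.unsqueeze(-1) * modality_feats).sum(dim=1)
        return fused, cond_logits                                  # fused features + condition prediction
```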
arXiv Detail & Related papers (2024-10-14T17:56:20Z)
- DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion [10.439802168557513]
Motion capture from a limited number of body-worn sensors has important applications in health, human performance, and entertainment.
Recent work has focused on accurately reconstructing whole-body motion from a specific sensor configuration using six IMUs.
We propose a single diffusion model, DiffusionPoser, which reconstructs human motion in real-time from an arbitrary combination of sensors.
arXiv Detail & Related papers (2023-08-31T12:36:50Z)
- Unsupervised Statistical Feature-Guided Diffusion Model for Sensor-based Human Activity Recognition [3.2319909486685354]
A key problem holding up progress in wearable sensor-based human activity recognition is the unavailability of diverse and labeled training data.
We propose an unsupervised statistical feature-guided diffusion model specifically optimized for wearable sensor-based human activity recognition.
By conditioning the diffusion model on statistical information such as mean, standard deviation, Z-score, and skewness, we generate diverse and representative synthetic sensor data.
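As a minimal sketch of such a conditioning vector, the snippet below computes per-channel mean, standard deviation, Z-score, and skewness for one sensor window. Which sample the Z-score refers to and how the diffusion model consumes the vector are not specified in the summary, so those choices are assumptions.

```python
import numpy as np

def conditioning_stats(window: np.ndarray) -> np.ndarray:
    """window: (T, C) raw sensor window -> per-channel statistics vector of length 4*C."""
    mean = window.mean(axis=0)
    std = window.std(axis=0) + 1e-8                        # avoid division by zero
    z_last = (window[-1] - mean) / std                     # Z-score of the most recent sample (one possible reading)
    skew = ((window - mean) ** 3).mean(axis=0) / std ** 3  # per-channel skewness
    return np.concatenate([mean, std, z_last, skew])

# Example: a 100-sample window of 3-axis accelerometer data -> a 12-dimensional conditioning vector
cond = conditioning_stats(np.random.randn(100, 3))
print(cond.shape)  # (12,)
```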
arXiv Detail & Related papers (2023-05-30T15:12:59Z)
- AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation [39.351088248776435]
We propose AFT-VO, a novel transformer-based sensor fusion architecture to estimate VO from multiple sensors.
Our framework combines predictions from asynchronous multi-view cameras and accounts for the time discrepancies of measurements coming from different sources.
Our experiments demonstrate that multi-view fusion for VO estimation provides robust and accurate trajectories, outperforming the state of the art in both challenging weather and lighting conditions.
arXiv Detail & Related papers (2022-06-26T19:29:08Z)
- Learning Online Multi-Sensor Depth Fusion [100.84519175539378]
SenFuNet is a depth fusion approach that learns sensor-specific noise and outlier statistics.
We conduct experiments with various sensor combinations on the real-world CoRBS and Scene3D datasets.
arXiv Detail & Related papers (2022-04-07T10:45:32Z)
- Mobile Behavioral Biometrics for Passive Authentication [65.94403066225384]
This work carries out a comparative analysis of unimodal and multimodal behavioral biometric traits.
Experiments are performed over HuMIdb, one of the largest and most comprehensive freely available mobile user interaction databases.
In our experiments, the most discriminative background sensor is the magnetometer, whereas among touch tasks the best results are achieved with keystroke.
arXiv Detail & Related papers (2022-03-14T17:05:59Z)
- More to Less (M2L): Enhanced Health Recognition in the Wild with Reduced Modality of Wearable Sensors [18.947172818861773]
Fusing multiple sensors is common in many applications, but may not always be feasible in real-world deployments.
We propose an effective more to less (M2L) learning framework to improve testing performance with reduced sensors.
arXiv Detail & Related papers (2022-02-16T18:23:29Z)
- Bayesian Imitation Learning for End-to-End Mobile Manipulation [80.47771322489422]
Augmenting policies with additional sensor inputs, such as RGB + depth cameras, is a straightforward approach to improving robot perception capabilities.
We show that using the Variational Information Bottleneck to regularize convolutional neural networks improves generalization to held-out domains.
We demonstrate that our method is able to help close the sim-to-real gap and successfully fuse RGB and depth modalities.
arXiv Detail & Related papers (2022-02-15T17:38:30Z)
- WaveGlove: Transformer-based hand gesture recognition using multiple inertial sensors [0.0]
Hand Gesture Recognition (HGR) based on inertial data has grown considerably in recent years.
In this work we explore the benefits of using multiple inertial sensors.
arXiv Detail & Related papers (2021-05-04T20:50:53Z)
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos).
The SAKDN uses multiple wearable-sensors as teacher modalities and uses RGB videos as student modality.
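As a rough, hypothetical sketch of such a teacher-student setup (not the SAKDN losses, which are not described here), wearable-sensor teacher networks could supply soft targets that the video student matches with a KL-divergence term alongside the usual classification loss; every name and hyperparameter below is assumed.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, labels, T=2.0, alpha=0.5):
    """Cross-entropy on labels plus KL to the averaged soft targets of the sensor teachers."""
    # Average the temperature-softened predictions of the wearable-sensor teacher networks.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=1) for t in teacher_logits_list]
    ).mean(dim=0)
    log_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_student, teacher_probs, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * ce + (1 - alpha) * kd
```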
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
- Learning Selective Sensor Fusion for States Estimation [47.76590539558037]
We propose SelectFusion, an end-to-end selective sensor fusion module.
During prediction, the network is able to assess the reliability of the latent features from different sensor modalities.
We extensively evaluate all fusion strategies in both public datasets and on progressively degraded datasets.
arXiv Detail & Related papers (2019-12-30T20:25:16Z)