Inertial Hallucinations -- When Wearable Inertial Devices Start Seeing Things
- URL: http://arxiv.org/abs/2207.06789v1
- Date: Thu, 14 Jul 2022 10:04:18 GMT
- Title: Inertial Hallucinations -- When Wearable Inertial Devices Start Seeing Things
- Authors: Alessandro Masullo and Toby Perrett and Tilo Burghardt and Ian Craddock and Dima Damen and Majid Mirmehdi
- Abstract summary: We propose a novel approach to multimodal sensor fusion for Ambient Assisted Living (AAL).
We address two major shortcomings of standard multimodal approaches: limited area coverage and reduced reliability.
Our new framework fuses the concept of modality hallucination with triplet learning to train a model with different modalities to handle missing sensors at inference time.
- Score: 82.15959827765325
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a novel approach to multimodal sensor fusion for Ambient Assisted
Living (AAL) which takes advantage of learning using privileged information
(LUPI). We address two major shortcomings of standard multimodal approaches: limited area coverage and reduced reliability. Our new framework fuses the concept of modality hallucination with triplet learning to train a model with different modalities to handle missing sensors at inference time. We evaluate the proposed model on inertial data from a wearable accelerometer device, using RGB videos and skeletons as privileged modalities, and show an average accuracy improvement of 6.6% on the UTD-MHAD dataset and 5.5% on the Berkeley MHAD dataset, setting a new state of the art for inertial-only classification accuracy on both. We validate our framework through several ablation studies.
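To make the training idea concrete, here is a minimal, hypothetical PyTorch sketch of modality hallucination combined with triplet learning: the inertial embedding (anchor) is pulled toward the privileged RGB embedding of the same clip (positive) and pushed away from the embedding of a different-class clip (negative), so the inertial branch alone can stand in for the missing modalities at inference. All module names, dimensions, the frozen pretrained privileged encoder, and the loss weighting are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch: modality hallucination + triplet learning (PyTorch).
# Names, dimensions, and the frozen-teacher choice are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps one modality into a shared, L2-normalised embedding space."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

inertial_enc = Encoder(in_dim=60)   # e.g. a flattened window of 20 tri-axial samples
rgb_enc = Encoder(in_dim=2048)      # privileged modality, training time only
rgb_enc.requires_grad_(False)       # assume pretrained and frozen for this sketch
classifier = nn.Linear(128, 27)     # e.g. the 27 UTD-MHAD action classes

triplet = nn.TripletMarginLoss(margin=0.3)

def training_step(acc, rgb_pos, rgb_neg, labels):
    """acc: inertial windows; rgb_pos: privileged RGB features of the same
    clips; rgb_neg: privileged RGB features of different-class clips."""
    z_acc = inertial_enc(acc)                # anchor
    z_pos = rgb_enc(rgb_pos)                 # hallucination target
    z_neg = rgb_enc(rgb_neg)
    loss = triplet(z_acc, z_pos, z_neg)      # pull toward privileged embedding
    loss = loss + F.cross_entropy(classifier(z_acc), labels)
    return loss

def predict(acc):
    """Inference uses the inertial branch alone; cameras are not needed."""
    with torch.no_grad():
        return classifier(inertial_enc(acc)).argmax(dim=-1)
```

Freezing a pretrained privileged encoder is one simple instantiation; the paper's full framework trains with two privileged modalities (RGB videos and skeletons) and its own loss design.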
Related papers
- RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-incomplete fusing and Detecting frAmewoRk, abbreviated as RADAR.
Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer.
Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z)
- DeepLight: Reconstructing High-Resolution Observations of Nighttime Light With Multi-Modal Remote Sensing Data [4.89861083551325]
Nighttime light (NTL) remote sensing observation serves as a unique proxy for quantitatively assessing progress toward meeting a series of Sustainable Development Goals.
In this study, we propose a novel approach to reconstruct high-resolution NTL images using multi-modal remote sensing data.
arXiv Detail & Related papers (2024-02-24T00:25:22Z)
- DaFoEs: Mixing Datasets towards the generalization of vision-state deep-learning Force Estimation in Minimally Invasive Robotic Surgery [6.55111164866752]
We present a new vision-haptic dataset (DaFoEs) with variable soft environments for the training of deep neural models.
We also present a variable encoder-decoder architecture to predict the forces exerted by the laparoscopic tool from either a single input or a sequence of inputs.
arXiv Detail & Related papers (2024-01-17T14:39:55Z)
- Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We use 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesize 4.9 million instruction-trajectory pairs.
Thanks to our large-scale dataset, simple imitation learning pushes an existing agent to a new best single-run success rate of 80% on the R2R test split (+11% absolute over the previous SoTA).
arXiv Detail & Related papers (2023-07-28T16:03:28Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data may come from biases in data acquisition rather than from the task itself.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition [2.436681150766912]
We propose a novel trimodal sensor fusion framework using audio, visible-light, and thermal camera modalities.
A novel deep latent embedding framework, termed the AVTNet, is proposed to learn multiple latent embeddings.
A comparative analysis with baseline algorithms shows that the proposed framework significantly increases the person recognition accuracy.
arXiv Detail & Related papers (2022-10-20T02:39:48Z)
- Cross-modal Knowledge Distillation for Vision-to-Sensor Action Recognition [12.682984063354748]
This study introduces an end-to-end Vision-to-Sensor Knowledge Distillation (VSKD) framework.
In this VSKD framework, only time-series data, i.e., accelerometer data, is needed from wearable devices during the testing phase.
This framework not only reduces the computational demands on edge devices, but also produces a learning model that closely matches the performance of the computationally expensive multi-modal approach (a minimal distillation sketch in this spirit appears after this list).
arXiv Detail & Related papers (2021-10-08T15:06:38Z)
- Invariant Feature Learning for Sensor-based Human Activity Recognition [11.334750079923428]
We present an invariant feature learning framework (IFLF) that extracts common information shared across subjects and devices.
Experiments demonstrate that IFLF is effective in handling both subject and device diversity across popular open datasets and an in-house dataset.
arXiv Detail & Related papers (2020-12-14T21:56:17Z)
- SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to speed up the training and prediction of large numbers of heterogeneous unsupervised OD models.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)
- Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)
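For the cross-modal knowledge distillation entry above (VSKD), here is a minimal, hypothetical sketch of the generic vision-to-sensor distillation objective such frameworks build on: soft targets from a video teacher guide an accelerometer student, and only the student runs on the wearable at test time. The temperature, loss weighting, and all names are illustrative assumptions, not VSKD's actual loss.

```python
# Hypothetical sketch of vision-to-sensor knowledge distillation (PyTorch).
# Hyperparameters and names are illustrative, not taken from the VSKD paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 4.0, alpha: float = 0.7):
    """Blend soft-target KL distillation (video teacher -> sensor student)
    with the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # standard T^2 rescaling
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Example: a batch of 8 clips and 27 action classes (sizes are illustrative).
student_logits = torch.randn(8, 27, requires_grad=True)  # accelerometer branch
teacher_logits = torch.randn(8, 27)                      # video branch (frozen)
labels = torch.randint(0, 27, (8,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```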
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.