Temporally Multi-Scale Sparse Self-Attention for Physical Activity Data Imputation
- URL: http://arxiv.org/abs/2406.18848v1
- Date: Thu, 27 Jun 2024 02:38:25 GMT
- Title: Temporally Multi-Scale Sparse Self-Attention for Physical Activity Data Imputation
- Authors: Hui Wei, Maxwell A. Xu, Colin Samplawski, James M. Rehg, Santosh Kumar, Benjamin M. Marlin
- Abstract summary: We study the problem of imputation of missing step count data, one of the most ubiquitous forms of wearable sensor data.
We construct a novel and large-scale data set consisting of a training set with over 3 million hourly step count observations and a test set with over 2.5 million hourly step count observations.
We propose a domain knowledge-informed sparse self-attention model for this task that captures the temporal multi-scale nature of step-count data.
- Score: 25.76458454501612
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Wearable sensors enable health researchers to continuously collect data pertaining to the physiological state of individuals in real-world settings. However, such data can be subject to extensive missingness due to a complex combination of factors. In this work, we study the problem of imputation of missing step count data, one of the most ubiquitous forms of wearable sensor data. We construct a novel and large scale data set consisting of a training set with over 3 million hourly step count observations and a test set with over 2.5 million hourly step count observations. We propose a domain knowledge-informed sparse self-attention model for this task that captures the temporal multi-scale nature of step-count data. We assess the performance of the model relative to baselines and conduct ablation studies to verify our specific model designs.
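A minimal sketch of the kind of temporally multi-scale sparse attention the abstract describes, assuming each hourly position attends to a local window of neighboring hours and to the same hour on other days; the window size, period, and single-head formulation are illustrative choices, not the authors' released model:

```python
# Hedged sketch (not the authors' code): a sparse attention mask over hourly
# step counts combining a short-range window with a daily-period pattern.
import torch

def multiscale_sparse_mask(seq_len: int, local_window: int = 6, period: int = 24) -> torch.Tensor:
    """Boolean mask [seq_len, seq_len]; True = attention allowed."""
    idx = torch.arange(seq_len)
    rel = idx[None, :] - idx[:, None]       # signed hour offsets
    local = rel.abs() <= local_window       # short-range context
    daily = (rel % period) == 0             # same hour on other days
    return local | daily

def sparse_self_attention(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Single-head scaled dot-product attention restricted by `mask`.
    x: [seq_len, d_model]."""
    d = x.shape[-1]
    scores = x @ x.T / d ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ x

# Toy usage: one week of hourly step-count embeddings.
seq_len, d_model = 7 * 24, 32
x = torch.randn(seq_len, d_model)
out = sparse_self_attention(x, multiscale_sparse_mask(seq_len))
print(out.shape)  # torch.Size([168, 32])
```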
Related papers
- Scaling Wearable Foundation Models [54.93979158708164]
We investigate the scaling properties of sensor foundation models across compute, data, and model size.
Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM.
Our results establish the scaling laws of LSM for tasks such as imputation and extrapolation, both across time and across sensor modalities.
arXiv Detail & Related papers (2024-10-17T15:08:21Z)
- Self-supervised Activity Representation Learning with Incremental Data: An Empirical Study [7.782045150068569]
This research examines the impact of using a self-supervised representation learning model for time series classification tasks.
We analyzed the effect of varying the size, distribution, and source of the unlabeled data on the final classification performance across four public datasets.
arXiv Detail & Related papers (2023-05-01T01:39:55Z)
- DynImp: Dynamic Imputation for Wearable Sensing Data Through Sensory and Temporal Relatedness [78.98998551326812]
We argue that traditional methods have rarely made use of both the time-series dynamics of the data and the relatedness of features from different sensors.
We propose a model, termed DynImp, that handles missingness at different time points using nearest neighbors along the feature axis.
We show that the method can exploit multi-modal features from related sensors and also learn from historical time-series dynamics to reconstruct the data under extreme missingness.
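The nearest-neighbor idea can be illustrated with a minimal sketch (this is not the DynImp architecture, which also learns from time-series dynamics per the abstract): missing channels at a time step are filled from the k time steps that are closest on the channels observed at that step. The value of k and the toy data are arbitrary choices for the sketch.

```python
# Hedged illustration of nearest-neighbor imputation along the feature axis.
import numpy as np

def knn_feature_impute(x, k=5):
    """x: [time, channels] with np.nan marking missing entries."""
    out = x.copy()
    observed = ~np.isnan(x)
    complete = np.where(observed.all(axis=1))[0]    # fully observed time steps
    for t in np.where(~observed.all(axis=1))[0]:    # time steps with any gap
        obs_cols = observed[t]
        if not obs_cols.any() or complete.size == 0:
            continue                                # nothing to anchor on
        # distance to fully observed steps, using only the observed channels
        dist = np.linalg.norm(x[complete][:, obs_cols] - x[t, obs_cols], axis=1)
        nearest = complete[np.argsort(dist)[:k]]
        out[t, ~obs_cols] = x[nearest][:, ~obs_cols].mean(axis=0)
    return out

# Toy usage: 3 correlated sensor channels with ~10% missing entries.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3)).cumsum(axis=0)
data[rng.random(data.shape) < 0.1] = np.nan
print(np.isnan(knn_feature_impute(data)).sum(), "entries left missing")
```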
arXiv Detail & Related papers (2022-09-26T21:59:14Z)
- BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection [63.447493500066045]
This work proposes a data-driven learning model for the synthesis of keystroke biometric data.
The proposed method is compared with two statistical approaches based on Universal and User-dependent models.
Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects.
arXiv Detail & Related papers (2022-07-27T09:26:15Z)
- Pattern Discovery in Time Series with Byte Pair Encoding [12.338599136651261]
We propose an unsupervised method for learning representations of time series based on common patterns identified within them.
In this way the method can capture both long-term and short-term dependencies present in the data.
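A minimal sketch of the general recipe, assuming a quantile discretization of the series into symbols followed by standard BPE merges; the paper's exact discretization and downstream representation may differ.

```python
# Hedged sketch: discretize a series into symbols, then merge frequent
# adjacent pairs (Byte Pair Encoding) so merged tokens act as motifs.
from collections import Counter
import numpy as np

def discretize(series, n_bins=4):
    """Map values to symbols 'a', 'b', ... by quantile bin."""
    edges = np.quantile(series, np.linspace(0, 1, n_bins + 1)[1:-1])
    return [chr(ord("a") + int(np.searchsorted(edges, v))) for v in series]

def bpe_merges(tokens, n_merges=10):
    """Repeatedly merge the most frequent adjacent token pair."""
    merges = []
    for _ in range(n_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break
        merges.append(a + b)
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

# Toy usage: a noisy sine wave; merged tokens are discovered patterns.
t = np.linspace(0, 8 * np.pi, 400)
tokens, motifs = bpe_merges(discretize(np.sin(t) + 0.1 * np.random.randn(400)))
print(motifs[:5])
```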
arXiv Detail & Related papers (2021-05-30T00:47:19Z)
- On Disentanglement in Gaussian Process Variational Autoencoders [3.403279506246879]
We build on a recently introduced class of models that have been successful in different tasks on time series data.
Our model exploits the temporal structure of the data by modeling each latent channel with a GP prior and employing a structured variational distribution.
We provide evidence that we can learn meaningful disentangled representations on real-world medical time series data.
arXiv Detail & Related papers (2021-02-10T15:49:27Z)
- Personalized Step Counting Using Wearable Sensors: A Domain Adapted LSTM Network Approach [0.0]
The tri-axial accelerometer inside physical activity (PA) monitors can be exploited to improve step count accuracy across devices and individuals.
Open-source raw sensor data was used to construct a long short-term memory (LSTM) deep neural network to model step count.
A small amount of subject-specific data was used for domain adaptation to produce personalized models with high individualized step count accuracy.
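A hedged sketch of the general recipe described here, not the paper's exact network or training setup: a small LSTM maps windows of tri-axial accelerometer samples to a step count and is then briefly fine-tuned on a small subject-specific set; all sizes, data, and hyperparameters are illustrative.

```python
# Hedged sketch: LSTM step-count regressor with subject-specific fine-tuning.
import torch
import torch.nn as nn

class StepCountLSTM(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                     # x: [batch, time, 3] accelerometer
        _, (h, _) = self.lstm(x)
        return self.head(h[-1]).squeeze(-1)   # predicted steps per window

def fit(model, x, y, steps, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()

model = StepCountLSTM()
# Pretrain on pooled open-source data (random tensors stand in here).
fit(model, torch.randn(256, 100, 3), torch.rand(256) * 20, steps=50, lr=1e-3)
# Personalize on a small subject-specific set at a lower learning rate.
fit(model, torch.randn(16, 100, 3), torch.rand(16) * 20, steps=20, lr=1e-4)
```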
arXiv Detail & Related papers (2020-12-11T19:52:43Z)
- Predicting Parkinson's Disease with Multimodal Irregularly Collected Longitudinal Smartphone Data [75.23250968928578]
Parkinson's Disease is a neurological disorder that is prevalent in elderly people.
Traditional ways to diagnose the disease rely on in-person, subjective clinical evaluations of the quality of a set of activity tests.
We propose a novel time-series based approach to predicting Parkinson's Disease with raw activity test data collected by smartphones in the wild.
arXiv Detail & Related papers (2020-09-25T01:50:15Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on the application of elastic principal graphs, which can simultaneously address dimensionality reduction, data visualization, clustering, feature selection, and quantification of geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that using 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.