Quantified Sleep: Machine learning techniques for observational n-of-1 studies
- URL: http://arxiv.org/abs/2105.06811v1
- Date: Fri, 14 May 2021 13:13:17 GMT
- Authors: Gianluca Truda
- Abstract summary: This paper applies statistical learning techniques to an observational Quantified-Self study to build a descriptive model of sleep quality.
Sleep quality is one of the most difficult modelling targets in QS research, due to high noise and a large number of weakly-contributing factors.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper applies statistical learning techniques to an observational
Quantified-Self (QS) study to build a descriptive model of sleep quality. A
total of 472 days of my sleep data was collected with an Oura ring and combined
with lifestyle, environmental, and psychological data. Such n-of-1 QS projects
pose a number of challenges: heterogeneous data sources; missing values; high
dimensionality; dynamic feedback loops; human biases. This paper directly
addresses these challenges with an end-to-end QS pipeline that produces robust
descriptive models. Sleep quality is one of the most difficult modelling
targets in QS research, due to high noise and a large number of
weakly-contributing factors. Sleep quality was selected so that approaches from
this paper would generalise to most other n-of-1 QS projects. Techniques are
presented for combining and engineering features for the different classes of
data types, sample frequencies, and schemas, including event logs, weather, and
geo-spatial data. Statistical analyses for outliers, normality,
(auto)correlation, stationarity, and missing data are detailed, along with a
proposed method for hierarchical clustering to identify correlated groups of
features. The missing data was overcome using a combination of knowledge-based
and statistical techniques, including several multivariate imputation
algorithms. "Markov unfolding" is presented for collapsing the time series into
a collection of independent observations, whilst incorporating historical
information. The final model was interpreted in two ways: by inspecting the
internal $\beta$-parameters, and using the SHAP framework. These two
interpretation techniques were combined to produce a list of the 16
most-predictive features, demonstrating that an observational study can greatly
narrow down the number of features that need to be considered when designing
interventional QS studies.
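Event logs are irregular, but the sleep target has one value per night. The logs therefore have to be resampled to a daily grid before they can be joined with the other sources. Below is a minimal pandas sketch of that step. It assumes a log with a `timestamp` column and calendar-day boundaries; a real sleep study might prefer a boundary such as noon-to-noon. The function name `daily_event_features` and the two derived features are hypothetical, not the paper's feature set.

```python
import pandas as pd

# Hypothetical event log: one row per logged event (e.g. a caffeinated drink).
# The column names "timestamp" and "event" are illustrative assumptions.
log = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-01-01 08:15", "2021-01-01 14:30", "2021-01-02 09:00",
        "2021-01-04 07:45", "2021-01-04 16:10",
    ]),
    "event": ["coffee"] * 5,
})


def daily_event_features(log: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Collapse an irregular event log into one row per calendar day."""
    days = pd.date_range(start, end, freq="D")
    day_key = log["timestamp"].dt.normalize().rename("day")
    by_day = log.groupby(day_key)["timestamp"]
    out = pd.DataFrame(index=days)
    # A day with no logged events gets a count of 0, not a missing value:
    # absence from an event log is usually read as "did not happen".
    out["count"] = by_day.size().reindex(days, fill_value=0)
    # Clock time of the day's last event in fractional hours; NaN if none.
    last = by_day.max()
    out["last_hour"] = (last.dt.hour + last.dt.minute / 60).reindex(days)
    return out


features = daily_event_features(log, "2021-01-01", "2021-01-04")
print(features)
```

The choice between filling with 0 and leaving a value missing depends on the data source. For an event log, silence usually means "no event". For a sensor, silence usually means "no data".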
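The abstract lists analyses for outliers, (auto)correlation, and stationarity without specifying them. Two common building blocks are sketched below in NumPy, as an assumption rather than the paper's exact procedure: the sample autocorrelation at a given lag, and Tukey's interquartile-range fences for flagging outliers. The example inputs are made up.

```python
import numpy as np


def autocorrelation(x, lag: int) -> float:
    """Sample autocorrelation at `lag` (the biased estimator used in most ACF plots)."""
    if lag < 1:
        raise ValueError("lag must be >= 1")
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.dot(d[:-lag], d[lag:]) / np.dot(d, d))


def tukey_outliers(x, k: float = 1.5) -> np.ndarray:
    """Boolean mask of values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


# A steadily trending series is strongly autocorrelated, a warning sign of non-stationarity.
trend = np.arange(30)
print(autocorrelation(trend, 1))
print(tukey_outliers([1, 2, 3, 4, 5, 100]))
```

A high lag-1 autocorrelation only hints at non-stationarity. A formal unit-root test, such as the augmented Dickey-Fuller test (`adfuller` in `statsmodels.tsa.stattools`), is the usual follow-up.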
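The abstract proposes hierarchical clustering to find groups of correlated features but does not give the linkage rule or cut-off. One simple instance is assumed here for illustration: single-linkage clustering on the distance 1 - |r|, cut at height 1 - threshold. At that cut, single-linkage clusters are exactly the connected components of the graph that joins features with |r| >= threshold, so a small union-find structure is enough. The 0.7 threshold and the feature names are hypothetical. `np.corrcoef` requires complete data, so either impute first or use pairwise-complete correlations such as pandas `DataFrame.corr`.

```python
from __future__ import annotations

import numpy as np


def correlated_groups(X: np.ndarray, names: list[str], threshold: float = 0.7) -> list[list[str]]:
    """Single-linkage clusters of features under the distance 1 - |r|, cut at 1 - threshold."""
    r = np.corrcoef(X, rowvar=False)
    n = r.shape[0]
    parent = list(range(n))

    def find(i: int) -> int:
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair whose distance 1 - |r| is within the cut height.
    for i in range(n):
        for j in range(i + 1, n):
            if abs(r[i, j]) >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(names[i])
    return list(groups.values())


# Hypothetical data: two near-duplicate activity measures and one unrelated feature.
rng = np.random.default_rng(0)
steps = rng.normal(size=200)
active_minutes = steps + 0.1 * rng.normal(size=200)
temperature = rng.normal(size=200)
X = np.column_stack([steps, active_minutes, temperature])
print(correlated_groups(X, ["steps", "active_minutes", "temperature"]))
```

Average or complete linkage (for example via SciPy's `scipy.cluster.hierarchy`) would produce tighter groups than single linkage, which can chain weakly related features together.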
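The abstract mentions several multivariate imputation algorithms without naming them. As one representative, below is a minimal NumPy sketch of chained-equations style iterative imputation. It starts from column means, then repeatedly regresses each incomplete column on all the others and overwrites that column's missing entries with the predictions. This is a deterministic simplification (linear models only, no posterior sampling). scikit-learn's `IterativeImputer`, enabled via `from sklearn.experimental import enable_iterative_imputer`, is a fuller implementation of the same idea.

```python
import numpy as np


def iterative_impute(X, n_iter: int = 10) -> np.ndarray:
    """Chained-equations style imputation using ordinary least squares."""
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X)
    # Initial fill: mean of the observed values in each column.
    # (Assumes every column has at least one observed value.)
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.nonzero(missing)[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            # Design matrix: intercept plus every other (currently filled) column.
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            # Fit only on rows where column j was genuinely observed.
            coef, *_ = np.linalg.lstsq(A[~miss_j], X[~miss_j, j], rcond=None)
            X[miss_j, j] = A[miss_j] @ coef
    return X


# Toy example: the second column is an exact linear function of the first.
x0 = np.arange(10, dtype=float)
data = np.column_stack([x0, 2 * x0 + 1])
data[[3, 7], 1] = np.nan
print(iterative_impute(data)[[3, 7], 1])
```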
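The abstract describes "Markov unfolding" only as collapsing the time series into independent observations while keeping historical information; the exact construction is in the full paper. One plausible reading is a k-th order Markov window. Under that assumption, a sketch appends lagged copies of every column to each day's row, then drops the first k days, which lack a full history. The function name `markov_unfold`, the `_lagN` suffix, `order=2`, and the example values are all hypothetical.

```python
import pandas as pd


def markov_unfold(df: pd.DataFrame, order: int = 2) -> pd.DataFrame:
    """Append columns for days t-1 ... t-order to each row t.

    Assumes `df` has one row per day with no gaps (e.g. after `asfreq("D")`
    and imputation), because `shift` moves values by position, not by date.
    """
    lagged = [df.shift(k).add_suffix(f"_lag{k}") for k in range(1, order + 1)]
    # The first `order` rows have no complete history, so they are dropped.
    return pd.concat([df, *lagged], axis=1).iloc[order:]


daily = pd.DataFrame(
    {"sleep_score": [70, 75, 80, 72], "steps": [5000, 8000, 12000, 3000]},
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
)
unfolded = markov_unfold(daily, order=2)
print(unfolded)
```

A standard regressor can then treat each remaining row as one observation. This sketch does not settle whether lagged values of the target itself belong in the feature set.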
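The final model was interpreted through its β-parameters and through SHAP, and the two views were combined into a ranked feature list. The abstract does not say how they were combined. The sketch below assumes a linear model on standardised features and merges the two rankings by average rank. For a linear model with independent features, SHAP values have the closed form phi_j(x) = beta_j * (x_j - E[x_j]), so no SHAP library is needed here. The feature names, coefficients, and `top_k` are hypothetical.

```python
import numpy as np


def linear_shap(X: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """SHAP values of f(x) = b0 + x @ beta, assuming independent features."""
    return (X - X.mean(axis=0)) * beta


def combined_ranking(X: np.ndarray, beta: np.ndarray, names: list, top_k: int) -> list:
    # View 1: coefficient magnitude (comparable because X is standardised).
    coef_importance = np.abs(beta)
    # View 2: mean absolute SHAP value per feature.
    shap_importance = np.abs(linear_shap(X, beta)).mean(axis=0)

    def ranks(v: np.ndarray) -> np.ndarray:
        # Rank 0 = most important.
        return np.argsort(np.argsort(-v, kind="stable"), kind="stable")

    avg_rank = (ranks(coef_importance) + ranks(shap_importance)) / 2
    order = np.argsort(avg_rank, kind="stable")
    return [names[i] for i in order[:top_k]]


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each feature
beta = np.array([0.5, -2.0, 0.0, 1.0])     # hypothetical fitted coefficients
names = ["x0", "x1", "x2", "x3"]
print(combined_ranking(X, beta, names, top_k=2))
```

For a linear model on standardised data the two views rank features almost identically. With non-linear models or unstandardised inputs they can disagree, and then the choice of combination rule matters.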