Generation of synthetic gait data: application to multiple sclerosis patients' gait patterns
- URL: http://arxiv.org/abs/2411.10377v2
- Date: Wed, 20 Nov 2024 13:05:40 GMT
- Title: Generation of synthetic gait data: application to multiple sclerosis patients' gait patterns
- Authors: Klervi Le Gall, Lise Bellanger, David Laplaud, Aymeric Stamm
- Abstract summary: Multiple sclerosis (MS) is the leading cause of severe non-traumatic disability in young adults and its incidence is increasing worldwide.
The variability of gait impairment in MS necessitates the development of a non-invasive, sensitive, and cost-effective tool for quantitative gait evaluation.
The eGait movement sensor, designed to characterize human gait through unit quaternion time series (QTS) representing hip rotations, is a promising approach.
However, the small sample sizes typical of clinical studies pose challenges for the stability of gait data analysis tools.
- Abstract: Multiple sclerosis (MS) is the leading cause of severe non-traumatic disability in young adults and its incidence is increasing worldwide. The variability of gait impairment in MS necessitates the development of a non-invasive, sensitive, and cost-effective tool for quantitative gait evaluation. The eGait movement sensor, designed to characterize human gait through unit quaternion time series (QTS) representing hip rotations, is a promising approach. However, the small sample sizes typical of clinical studies pose challenges for the stability of gait data analysis tools. To address these challenges, this article presents two key scientific contributions. First, a comprehensive framework is proposed for transforming QTS data into a form that preserves the essential geometric properties of gait while enabling the use of any tabular synthetic data generation method. Second, a synthetic data generation method is introduced, based on nearest neighbors weighting, which produces high-fidelity synthetic QTS data suitable for small datasets and private data environments. The effectiveness of the proposed method is demonstrated through its application to MS gait data, showing very good fidelity and preservation of the initial geometry of the data. Thanks to this work, we are able to produce synthetic data sets and study the stability of clustering methods.
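The two contributions named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the transformation or the weighting scheme, so the quaternion logarithm map (as the geometry-preserving transform), the Dirichlet-distributed neighbor weights, and all function names and parameters below (quat_log, quat_exp, generate_synthetic_qts, k, seed) are assumptions chosen for illustration.

```python
import numpy as np

EPS = 1e-12


def quat_log(q):
    """Map unit quaternions (w, x, y, z) to 3-vectors in the tangent space at identity."""
    q = np.asarray(q, dtype=float)
    # q and -q encode the same rotation; keep the representative with w >= 0.
    q = np.where(q[..., :1] < 0, -q, q)
    w, v = q[..., 0], q[..., 1:]
    norm_v = np.linalg.norm(v, axis=-1)
    angle = np.arctan2(norm_v, w)
    safe = np.where(norm_v > EPS, norm_v, 1.0)
    scale = np.where(norm_v > EPS, angle / safe, 1.0)
    return v * scale[..., None]


def quat_exp(u):
    """Inverse of quat_log: map tangent 3-vectors back to unit quaternions."""
    u = np.asarray(u, dtype=float)
    angle = np.linalg.norm(u, axis=-1)
    safe = np.where(angle > EPS, angle, 1.0)
    scale = np.where(angle > EPS, np.sin(angle) / safe, 1.0)
    return np.concatenate([np.cos(angle)[..., None], u * scale[..., None]], axis=-1)


def generate_synthetic_qts(qts, n_new, k=3, seed=0):
    """Create n_new synthetic QTS from an array of shape (n, T, 4).

    Assumed scheme: flatten each log-mapped series into one table row, pick a
    random observation, and average it with its k nearest neighbors using
    random convex (Dirichlet) weights. Requires k + 1 <= n.
    """
    rng = np.random.default_rng(seed)
    n, t, _ = qts.shape
    table = quat_log(qts).reshape(n, t * 3)  # tabular form, one row per series
    dists = np.linalg.norm(table[:, None, :] - table[None, :, :], axis=-1)
    # Each row lists the observation itself followed by its k nearest neighbors.
    neighbors = np.argsort(dists, axis=1)[:, : k + 1]
    out = np.empty((n_new, t * 3))
    for j in range(n_new):
        i = rng.integers(n)
        weights = rng.dirichlet(np.ones(k + 1))
        out[j] = weights @ table[neighbors[i]]
    # Mapping back through quat_exp guarantees unit-norm quaternions.
    return quat_exp(out.reshape(n_new, t, 3))
```

Because the averaging happens in the tangent space and the result is mapped back with quat_exp, every synthetic sample is again a valid unit quaternion time series, which is the sense in which such a transform keeps the geometry of the data while letting an ordinary tabular generator do the work.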
Related papers
- Synthetic Data Generation with LLM for Improved Depression Prediction [5.508617844957542]
We propose a pipeline for Large Language Models to generate synthetic data to improve the performance of depression prediction models.
Not only was the synthetic data satisfactory in terms of fidelity and privacy-preserving metrics, it also balanced the distribution of severity in the training dataset.
arXiv Detail & Related papers (2024-11-26T18:31:14Z) - Weighted Diversified Sampling for Efficient Data-Driven Single-Cell Gene-Gene Interaction Discovery [56.622854875204645]
We present an innovative approach utilizing data-driven computational tools, leveraging an advanced Transformer model, to unearth gene-gene interactions.
A novel weighted diversified sampling algorithm computes the diversity score of each data sample in just two passes of the dataset.
arXiv Detail & Related papers (2024-10-21T03:35:23Z) - Synthetic Image Learning: Preserving Performance and Preventing Membership Inference Attacks [5.0243930429558885]
This paper introduces Knowledge Recycling (KR), a pipeline designed to optimise the generation and use of synthetic data for training downstream classifiers.
At the heart of this pipeline is Generative Knowledge Distillation (GKD), the proposed technique that significantly improves the quality and usefulness of the information.
The results show a significant reduction in the performance gap between models trained on real and synthetic data, with models based on synthetic data outperforming those trained on real data in some cases.
arXiv Detail & Related papers (2024-07-22T10:31:07Z) - Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs).
Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z) - GANsemble for Small and Imbalanced Data Sets: A Baseline for Synthetic Microplastics Data [2.307414552248669]
This paper proposes GANsemble: a framework connecting data augmentation with conditional generative adversarial networks (cGANs) to generate class-conditioned synthetic data.
To our knowledge, this study is the first application of generative AI to synthetically create microplastics data.
arXiv Detail & Related papers (2024-04-10T21:23:13Z) - Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z) - Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z) - Feature Matching Data Synthesis for Non-IID Federated Learning [7.740333805796447]
Federated learning (FL) trains neural networks on edge devices without collecting data at a central server.
This paper proposes a hard feature matching data synthesis (HFMDS) method to share auxiliary data besides local models.
For better privacy preservation, we propose a hard feature augmentation method to transfer real features towards the decision boundary.
arXiv Detail & Related papers (2023-08-09T07:49:39Z) - Exploring the Potential of AI-Generated Synthetic Datasets: A Case Study on Telematics Data with ChatGPT [0.0]
This research delves into the construction and utilization of synthetic datasets, specifically within the telematics sphere, leveraging OpenAI's powerful language model, ChatGPT.
To illustrate this data creation process, a hands-on case study is conducted, focusing on the generation of a synthetic telematics dataset.
arXiv Detail & Related papers (2023-06-23T15:15:13Z) - Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z) - Hide-and-Seek Privacy Challenge [88.49671206936259]
The NeurIPS 2020 Hide-and-Seek Privacy Challenge is a novel two-tracked competition to accelerate progress in tackling both problems.
In our head-to-head format, participants in the synthetic data generation track (i.e. "hiders") and the patient re-identification track (i.e. "seekers") are directly pitted against each other by way of a new, high-quality intensive care time-series dataset.
arXiv Detail & Related papers (2020-07-23T15:50:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.