Using Synthetic Data to Enhance the Accuracy of Fingerprint-Based
Localization: A Deep Learning Approach
- URL: http://arxiv.org/abs/2105.01903v2
- Date: Thu, 6 May 2021 07:57:37 GMT
- Title: Using Synthetic Data to Enhance the Accuracy of Fingerprint-Based
Localization: A Deep Learning Approach
- Authors: Mohammad Nabati, Hojjat Navidan, Reza Shahbazian, Seyed Ali Ghorashi
and David Windridge
- Abstract summary: We introduce a novel approach to reduce training data collection costs in fingerprint-based localization by using synthetic data.
Generative adversarial networks (GANs) are used to learn the distribution of a limited sample of collected data.
We can obtain essentially similar positioning accuracy to that which would be obtained by using the full set of collected data.
- Score: 1.6379393441314491
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-centered data collection is typically costly and raises
privacy concerns. Various solutions have been proposed in the literature to
reduce this cost, such as crowdsourced data collection or the use of
semi-supervised algorithms. However, semi-supervised algorithms require a
source of unlabeled data, and crowdsourcing methods require large numbers of
active participants. An
alternative passive data collection modality is fingerprint-based localization.
Such methods use received signal strength (RSS) or channel state information
(CSI) in wireless sensor networks to localize users in indoor/outdoor
environments. In this paper, we introduce a novel approach to reduce training
data collection costs in fingerprint-based localization by using synthetic
data. Generative adversarial networks (GANs) are used to learn the distribution
of a limited sample of collected data and, following this, to produce synthetic
data that can be used to augment the real collected data in order to increase
overall positioning accuracy. Experimental results on a benchmark dataset show
that by applying the proposed method and using a combination of 10% collected
data and 90% synthetic data, we can obtain essentially similar positioning
accuracy to that which would be obtained by using the full set of collected
data. This means that by employing GAN-generated synthetic data, we can use 90%
less real data, thereby reducing data-collection costs while maintaining
acceptable accuracy.
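The augmentation pipeline in the abstract can be sketched end to end. Everything below is illustrative, not the paper's actual setup: the access-point layout, the log-distance path-loss simulator, and a per-reference-point Gaussian standing in for the trained GAN generator; the paper's method trains a GAN on the limited sample instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative indoor layout: 5 access points (APs) and 4 reference points.
ACCESS_POINTS = np.array([[0, 0], [0, 6], [6, 0], [6, 6], [3, 3]], dtype=float)
REF_POINTS = np.array([[1.0, 1.0], [1.0, 5.0], [5.0, 1.0], [5.0, 5.0]])

def measure_rss(pos, n):
    """Simulate n RSS fingerprints (dBm) at a position via a log-distance
    path-loss model with Gaussian shadowing noise."""
    d = np.linalg.norm(ACCESS_POINTS - pos, axis=1) + 1.0
    base = -40.0 - 20.0 * np.log10(d)
    return base + rng.normal(0.0, 2.0, size=(n, len(ACCESS_POINTS)))

# The "10% collected data": a small real sample per reference point.
limited = {i: measure_rss(p, 4) for i, p in enumerate(REF_POINTS)}

def generate_synthetic(i, n):
    """Stand-in for the trained GAN generator: sample from a Gaussian
    fitted to the limited real sample at reference point i."""
    x = limited[i]
    return rng.normal(x.mean(axis=0), x.std(axis=0) + 1e-6, size=(n, x.shape[1]))

# Augmented radio map: 10% real + 90% synthetic fingerprints per point.
X, y = [], []
for i, p in enumerate(REF_POINTS):
    X.append(np.vstack([limited[i], generate_synthetic(i, 36)]))
    y += [p] * 40
X, y = np.vstack(X), np.array(y)

def localize(rss):
    """1-NN fingerprint matching against the augmented radio map."""
    return y[np.argmin(np.linalg.norm(X - rss, axis=1))]

estimate = localize(measure_rss(REF_POINTS[2], 1)[0])
```

The key point of the sketch is that the localizer trains on a radio map that is 90% generated, matching the 10%/90% split reported in the abstract; in practice the GAN replaces the Gaussian and a deep localization network replaces the 1-NN matcher.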
Related papers
- Approaching Metaheuristic Deep Learning Combos for Automated Data Mining [0.5419570023862531]
This work proposes a means of combining meta-heuristic methods with conventional classifiers and neural networks in order to perform automated data mining.
Experiments on the MNIST dataset for handwritten digit recognition were performed.
It was empirically observed that using a ground truth labeled dataset's validation accuracy is inadequate for correcting labels of other previously unseen data instances.
arXiv Detail & Related papers (2024-10-16T10:28:22Z)
- Federated Impression for Learning with Distributed Heterogeneous Data [19.50235109938016]
Federated learning (FL) provides a paradigm that can learn from distributed datasets across clients without requiring them to share data.
In FL, sub-optimal convergence is common among data from different health centers due to the variety in data collection protocols and patient demographics across centers.
We propose FedImpres, which alleviates catastrophic forgetting by restoring synthetic data that represents the global information as a federated impression.
arXiv Detail & Related papers (2024-09-11T15:37:52Z)
- Personalized Federated Learning via Active Sampling [50.456464838807115]
This paper proposes a novel method for sequentially identifying similar (or relevant) data generators.
Our method evaluates the relevance of a data generator by evaluating the effect of a gradient step using its local dataset.
We extend this method to non-parametric models by a suitable generalization of the gradient step to update a hypothesis using the local dataset provided by a data generator.
arXiv Detail & Related papers (2024-09-03T17:12:21Z)
- How to Train Data-Efficient LLMs [56.41105687693619]
We study data-efficient approaches for pre-training large language models (LLMs).
In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density sampling are the best methods in their respective categories.
arXiv Detail & Related papers (2024-02-15T02:27:57Z)
- Group Distributionally Robust Dataset Distillation with Risk Minimization [18.07189444450016]
We introduce an algorithm that combines clustering with the minimization of a risk measure on the loss to conduct DD.
We demonstrate its effective generalization and robustness across subgroups through numerical experiments.
arXiv Detail & Related papers (2024-02-07T09:03:04Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative for training machine learning models.
However, ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- Post-training Model Quantization Using GANs for Synthetic Data Generation [57.40733249681334]
We investigate the use of synthetic data as a substitute for the calibration with real data for the quantization method.
We compare the performance of models quantized using data generated by StyleGAN2-ADA and our pre-trained DiStyleGAN, with quantization using real data and an alternative data generation method based on fractal images.
arXiv Detail & Related papers (2023-05-10T11:10:09Z)
- Distributed sequential federated learning [0.0]
We develop a data-driven method for efficiently and effectively aggregating valuable information by analyzing local data.
We use numerical studies of simulated data and an application to COVID-19 data collected from 32 hospitals in Mexico.
arXiv Detail & Related papers (2023-01-31T21:20:45Z)
- GenSyn: A Multi-stage Framework for Generating Synthetic Microdata using Macro Data Sources [21.32471030724983]
Individual-level data (microdata) that characterizes a population is essential for studying many real-world problems.
In this study, we examine synthetic data generation as a tool to extrapolate difficult-to-obtain high-resolution data.
arXiv Detail & Related papers (2022-12-08T01:22:12Z)
- Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimizes our distilled data to guide networks to a similar state as those trained on real data.
Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data.
Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
arXiv Detail & Related papers (2022-03-22T17:58:59Z)
- Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing annotation efforts by learning to count in the crowd from a limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv Detail & Related papers (2020-07-07T04:17:01Z)
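The Gaussian-Process pseudo-labeling idea in the last entry can be illustrated on a toy 1-D regression problem standing in for crowd counts. The kernel, length scale, confidence threshold, and data below are all hypothetical, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy target function and data: a few labeled samples, many unlabeled ones.
f = np.sin
X_lab = rng.uniform(0, 6, 5)[:, None]
y_lab = f(X_lab.ravel())
X_unl = rng.uniform(0, 6, 40)[:, None]

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between column vectors a (n,1) and b (m,1)."""
    return np.exp(-0.5 * ((a - b.T) / length_scale) ** 2)

def gp_predict(X_tr, y_tr, X_te, noise=1e-3):
    """GP regression posterior mean and variance at test points."""
    K = rbf(X_tr, X_tr) + noise * np.eye(len(X_tr))
    Ks = rbf(X_te, X_tr)
    mean = Ks @ np.linalg.solve(K, y_tr)
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.sum(Ks * v.T, axis=1)   # prior k(x,x) = 1 for this kernel
    return mean, np.maximum(var, 0.0)

# Iteratively pseudo-label the most confident unlabeled points and refit.
for _ in range(3):
    mu, var = gp_predict(X_lab, y_lab, X_unl)
    keep = var < 0.05                       # low predictive variance = confident
    if not keep.any():
        break
    X_lab = np.vstack([X_lab, X_unl[keep]])
    y_lab = np.concatenate([y_lab, mu[keep]])   # pseudo-ground truth labels
    X_unl = X_unl[~keep]

mu_test, _ = gp_predict(X_lab, y_lab, np.array([[3.0]]))
```

The predictive variance acts as the confidence gate: only unlabeled points the GP is already sure about receive pseudo-labels, which limits error accumulation across iterations.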
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.