Simulation-Enhanced Data Augmentation for Machine Learning Pathloss
Prediction
- URL: http://arxiv.org/abs/2402.01969v2
- Date: Tue, 6 Feb 2024 03:22:32 GMT
- Title: Simulation-Enhanced Data Augmentation for Machine Learning Pathloss
Prediction
- Authors: Ahmed P. Mohamed, Byunghyun Lee, Yaguang Zhang, Max Hollingsworth, C.
Robert Anderson, James V. Krogmeier, David J. Love
- Abstract summary: This paper introduces a novel simulation-enhanced data augmentation method for machine learning pathloss prediction.
Our method integrates synthetic data generated from a cellular coverage simulator and independently collected real-world datasets.
The integration of synthetic data significantly improves the generalizability of the model in different environments.
- Score: 9.664420734674088
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) offers a promising solution to pathloss prediction.
However, its effectiveness can be degraded by the limited availability of data.
To alleviate this challenge, this paper introduces a novel
simulation-enhanced data augmentation method for ML pathloss prediction. Our
method integrates synthetic data generated from a cellular coverage simulator
and independently collected real-world datasets. These datasets were collected
through an extensive measurement campaign in different environments, including
farms, hilly terrains, and residential areas. This comprehensive data
collection provides vital ground truth for model training. A set of channel
features was engineered, including geographical attributes derived from LiDAR
datasets. These features were then used to train our prediction model,
incorporating the highly efficient and robust gradient boosting ML algorithm,
CatBoost. The integration of synthetic data, as demonstrated in our study,
significantly improves the generalizability of the model in different
environments, achieving a remarkable improvement of approximately 12 dB in terms
of mean absolute error for the best-case scenario. Moreover, our analysis
reveals that even a small fraction of measurements added to the simulation
training set, with proper data balance, can significantly enhance the model's
performance.
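The abstract does not say how "proper data balance" between simulated and measured samples is achieved. One plausible reading is per-sample weighting, so that a small measurement set is not swamped by a large simulation set. The sketch below illustrates that idea together with the mean absolute error (MAE) metric quoted above. The helper names (`balance_weights`, `mean_absolute_error`), the `meas_share` parameter, and the toy pathloss values are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def balance_weights(n_sim: int, n_meas: int, meas_share: float = 0.5) -> List[float]:
    """Return per-sample weights: simulated samples first, then measured ones.

    The measured subset receives `meas_share` of the total weight and the
    simulated subset receives the rest, regardless of subset sizes.
    This weighting scheme is an assumption, not the paper's stated method.
    """
    if n_sim <= 0 or n_meas <= 0:
        raise ValueError("both subsets must be non-empty")
    if not 0.0 < meas_share < 1.0:
        raise ValueError("meas_share must lie strictly between 0 and 1")
    w_sim = (1.0 - meas_share) / n_sim
    w_meas = meas_share / n_meas
    return [w_sim] * n_sim + [w_meas] * n_meas


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the same unit as the inputs (dB for pathloss)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: four simulated pathloss values and one measurement (dB).
sim_pathloss_db = [100.0, 110.0, 120.0, 130.0]
meas_pathloss_db = [105.0]

# Each simulated sample gets 0.125 and the single measurement gets 0.5,
# so both subsets contribute equally to a weighted training loss.
weights = balance_weights(len(sim_pathloss_db), len(meas_pathloss_db))
```

Weights of this kind could then be supplied as sample weights when fitting a gradient boosting regressor on the combined training set. Whether the authors balance the data this way, or by resampling instead, is not stated in the abstract.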
Related papers
- Expansive Synthesis: Generating Large-Scale Datasets from Minimal Samples [13.053285552524052]
This paper introduces an innovative Expansive Synthesis model that generates high-fidelity datasets from minimal samples.
We validate our Expansive Synthesis by training classifiers on the generated datasets and comparing their performance to that of classifiers trained on larger, original datasets.
arXiv Detail & Related papers (2024-06-25T02:59:02Z)
- Improvement of Applicability in Student Performance Prediction Based on Transfer Learning [2.3290007848431955]
This study proposes a method to improve prediction accuracy by employing transfer learning techniques on datasets with varying distributions.
The model was trained and evaluated to enhance its generalization ability and prediction accuracy.
Experiments demonstrated that this approach excels in reducing Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
The results demonstrate that freezing more layers improves performance for complex and noisy data, whereas freezing fewer layers is more effective for simpler and larger datasets.
arXiv Detail & Related papers (2024-06-01T13:09:05Z)
- Transfer Learning for Molecular Property Predictions from Small Data Sets [0.0]
We benchmark common machine learning models for the prediction of molecular properties on small data sets.
We present a transfer learning strategy that uses large data sets to pre-train the respective models and yields more accurate models after fine-tuning on the original data sets.
arXiv Detail & Related papers (2024-04-20T14:25:34Z)
- Domain Adaptive Graph Neural Networks for Constraining Cosmological Parameters Across Multiple Data Sets [40.19690479537335]
We show that DA-GNN achieves higher accuracy and robustness on cross-dataset tasks.
This shows that DA-GNNs are a promising method for extracting domain-independent cosmological information.
arXiv Detail & Related papers (2023-11-02T20:40:21Z)
- Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that a gradient of synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z)
- Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards [68.95806641664713]
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection.
We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset.
Results demonstrate that the model trained on generated data slightly underperforms a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data.
Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation.
We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that, with the regularization towards the flat trajectory, the weights trained on synthetic data are robust against perturbations from accumulated errors.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
- Model-based Policy Optimization with Unsupervised Model Adaptation [37.09948645461043]
We investigate how to bridge the gap between real and simulated data due to inaccurate model estimation for better policy optimization.
We propose a novel model-based reinforcement learning framework AMPO, which introduces unsupervised model adaptation.
Our approach achieves state-of-the-art performance in terms of sample efficiency on a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2020-10-19T14:19:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.