Related papers: WAIR-D: Wireless AI Research Dataset

URL: http://arxiv.org/abs/2212.02159v1
Date: Mon, 5 Dec 2022 10:59:05 GMT
Title: WAIR-D: Wireless AI Research Dataset
Authors: Yourui Huangfu and Jian Wang and Shengchen Dai and Rong Li and Jun Wang and Chongwen Huang and Zhaoyang Zhang
Abstract summary: We present the Wireless AI Research Dataset (WAIR-D), which consists of two scenarios. Scenario 1 contains 10,000 environments with sparsely dropped user equipments (UEs), and Scenario 2 contains 100 environments with densely dropped UEs. The large volume of the data guarantees that the trained AI models enjoy good generalization capability, while fine-tuning can be easily carried out on a specific chosen environment.
Score: 20.535443650889825
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: It is common sense that datasets with high-quality data samples play an important role in artificial intelligence (AI), machine learning (ML), and related studies. However, although AI/ML was introduced into wireless research long ago, few datasets are commonly used in the research community. Without a common dataset, AI-based methods proposed for wireless systems are hard to compare with traditional baselines or even with each other. Existing wireless AI research usually relies on datasets generated from statistical models or from ray-tracing simulations with limited environments. Statistical data hinder the trained AI models from further fine-tuning for a specific scenario, and ray-tracing data with limited environments reduce the generalization capability of the trained AI models. In this paper, we present the Wireless AI Research Dataset (WAIR-D), which consists of two scenarios. Scenario 1 contains 10,000 environments with sparsely dropped user equipments (UEs), and Scenario 2 contains 100 environments with densely dropped UEs. The environments are randomly picked from more than 40 cities on the real-world map. The large volume of the data guarantees that the trained AI models enjoy good generalization capability, while fine-tuning can be easily carried out on a specific chosen environment. Moreover, both the wireless channels and the corresponding environmental information are provided in WAIR-D, so that extra-information-aided communication mechanisms can be designed and evaluated. WAIR-D provides researchers with benchmarks to compare their different designs or reproduce the results of others. In this paper, we show the detailed construction of this dataset and examples of using it.

Related papers

A Comparative Study of Open-Source Libraries for Synthetic Tabular Data Generation: SDV vs. SynthCity [0.0]
Synthetic data generators provide a promising solution by replicating the statistical and structural properties of real data. This study evaluates the performance of six synthetic data generators from two widely used open-source libraries.
arXiv Detail & Related papers (2025-06-21T22:45:40Z)
Scaling Human Activity Recognition: A Comparative Evaluation of Synthetic Data Generation and Augmentation Techniques [1.0712226955584796]
Human activity recognition (HAR) is often limited by the scarcity of labeled datasets. Recent work has explored generating virtual inertial measurement unit (IMU) data via cross-modality transfer.
arXiv Detail & Related papers (2025-06-09T10:25:53Z)
Synthetic Data Generation for Minimum-Exposure Navigation in a Time-Varying Environment using Generative AI Models [0.5499796332553707]
We study the problem of synthetic generation of samples of environmental features for autonomous vehicle navigation. The proposed solution is a generative artificial intelligence model that we refer to as a split variational recurrent neural network (S-VRNN). The S-VRNN merges the capabilities of a variational autoencoder, which is a widely used generative model, and a recurrent neural network, which is used to learn temporal dependencies in data.
arXiv Detail & Related papers (2025-03-09T13:45:15Z)
From Gaming to Research: GTA V for Synthetic Data Generation for Robotics and Navigations [2.7383830691749163]
We introduce a synthetic dataset created using the virtual environment of the video game Grand Theft Auto V (GTA V). We demonstrate that synthetic data derived from GTA V are qualitatively comparable to real-world data.
arXiv Detail & Related papers (2025-02-17T20:22:52Z)
Generative AI for Data Augmentation in Wireless Networks: Analysis, Applications, and Case Study [59.780800481241066]
Generative Artificial Intelligence (GenAI) can be an effective alternative to wireless data augmentation. This article explores the potential and effectiveness of GenAI-driven data augmentation in wireless networks. We propose a general generative diffusion model-based data augmentation framework for Wi-Fi gesture recognition.
arXiv Detail & Related papers (2024-11-13T05:15:25Z)
Generative Artificial Intelligence Meets Synthetic Aperture Radar: A Survey [49.29751866761522]
This paper aims to investigate the intersection of GenAI and SAR. First, we illustrate the common data generation-based applications in the SAR field. Then, the latest GenAI models are systematically reviewed. Finally, the corresponding applications in the SAR domain are also included.
arXiv Detail & Related papers (2024-11-05T03:06:00Z)
Automatic AI Model Selection for Wireless Systems: Online Learning via Digital Twinning [50.332027356848094]
AI-based applications are deployed at intelligent controllers to carry out functionalities like scheduling or power control. The mapping between context and AI model parameters is ideally done in a zero-shot fashion. This paper introduces a general methodology for the online optimization of AMS mappings.
arXiv Detail & Related papers (2024-06-22T11:17:50Z)
Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities. RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data. Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation. We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z)
Scalable Modular Synthetic Data Generation for Advancing Aerial Autonomy [2.9005223064604078]
We introduce a scalable Aerial Synthetic Data Augmentation (ASDA) framework tailored to aerial autonomy applications. ASDA extends a central data collection engine with two scriptable pipelines that automatically perform scene and data augmentations. We demonstrate the effectiveness of our method in automatically generating diverse datasets.
arXiv Detail & Related papers (2022-11-10T04:37:41Z)
FairGen: Fair Synthetic Data Generation [0.3149883354098941]
We propose a pipeline to generate fairer synthetic data independent of the GAN architecture. We claim that, while generating synthetic data, most GANs amplify bias present in the training data, but by removing these bias-inducing samples, GANs essentially focus more on real informative samples.
arXiv Detail & Related papers (2022-10-24T08:13:47Z)
Choose, not Hoard: Information-to-Model Matching for Artificial Intelligence in O-RAN [8.52291735627073]
Open Radio Access Network (O-RAN) is an emerging paradigm, whereby network infrastructure elements communicate via open, standardized interfaces. A key element therein is the RAN Intelligent Controller (RIC), an Artificial Intelligence (AI)-based controller. In this paper we introduce, discuss, and evaluate the creation of multiple AI model instances at different RICs, leveraging information from some (or all) locations for their training.
arXiv Detail & Related papers (2022-08-01T15:24:27Z)
Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance. Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models. In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
Federated Visual Classification with Real-World Data Distribution [9.564468846277366]
We characterize the effect real-world data distributions have on distributed learning, using as a benchmark the standard Federated Averaging (FedAvg) algorithm. We introduce two new large-scale datasets for species and landmark classification, with realistic per-user data splits. We also develop two new algorithms (FedVC, FedIR) that intelligently resample and reweight over the client pool, bringing large improvements in accuracy and stability in training.
arXiv Detail & Related papers (2020-03-18T07:55:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.