WAIR-D: Wireless AI Research Dataset
- URL: http://arxiv.org/abs/2212.02159v1
- Date: Mon, 5 Dec 2022 10:59:05 GMT
- Title: WAIR-D: Wireless AI Research Dataset
- Authors: Yourui Huangfu and Jian Wang and Shengchen Dai and Rong Li and Jun Wang and Chongwen Huang and Zhaoyang Zhang
- Abstract summary: We present the Wireless AI Research Dataset (WAIR-D), which consists of two scenarios.
Scenario 1 contains 10,000 environments with sparsely dropped user equipments (UEs), and Scenario 2 contains 100 environments with densely dropped UEs.
The large volume of the data guarantees that the trained AI models enjoy good generalization capability, while fine-tuning can be easily carried out on a specific chosen environment.
- Score: 20.535443650889825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is common sense that datasets with high-quality data samples play an
important role in artificial intelligence (AI), machine learning (ML) and
related studies. However, although AI/ML was introduced into wireless
research long ago, few datasets are commonly used in the research
community. Without a common dataset, AI-based methods proposed for wireless
systems are hard to compare with traditional baselines or even with each
other. Existing wireless AI research usually relies on datasets generated
from statistical models or from ray-tracing simulations with limited
environments. Statistical data hinder further fine-tuning of the trained AI
models for a specific scenario, and ray-tracing data with limited
environments reduce the generalization capability of the trained AI models.
In this paper, we present the Wireless AI Research Dataset (WAIR-D), which
consists of two scenarios. Scenario 1 contains 10,000 environments with
sparsely dropped user equipments (UEs), and Scenario 2 contains 100
environments with densely dropped UEs. The environments are randomly picked
from more than 40 cities on the real-world map. The large volume of the data
guarantees that the trained AI models enjoy good generalization capability,
while fine-tuning can be easily carried out on a specific chosen environment.
Moreover, both the wireless channels and the corresponding environmental
information are provided in WAIR-D, so that extra-information-aided
communication mechanisms can be designed and evaluated. WAIR-D provides
researchers with benchmarks to compare their different designs or to reproduce
the results of others. In this paper, we show the detailed construction of this
dataset and examples of using it.
Related papers
- Automatic AI Model Selection for Wireless Systems: Online Learning via Digital Twinning [50.332027356848094]
AI-based applications are deployed at intelligent controllers to carry out functionalities like scheduling or power control.
The mapping between context and AI model parameters is ideally done in a zero-shot fashion.
This paper introduces a general methodology for the online optimization of AMS mappings.
arXiv Detail & Related papers (2024-06-22T11:17:50Z)
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data.
Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation.
We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z)
- Scalable Modular Synthetic Data Generation for Advancing Aerial Autonomy [2.9005223064604078]
We introduce a scalable Aerial Synthetic Data Augmentation (ASDA) framework tailored to aerial autonomy applications.
ASDA extends a central data collection engine with two scriptable pipelines that automatically perform scene and data augmentations.
We demonstrate the effectiveness of our method in automatically generating diverse datasets.
arXiv Detail & Related papers (2022-11-10T04:37:41Z)
- FairGen: Fair Synthetic Data Generation [0.3149883354098941]
We propose a pipeline to generate fairer synthetic data independent of the GAN architecture.
We claim that, while generating synthetic data, most GANs amplify bias present in the training data, but that by removing these bias-inducing samples, GANs essentially focus more on real informative samples.
arXiv Detail & Related papers (2022-10-24T08:13:47Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Choose, not Hoard: Information-to-Model Matching for Artificial Intelligence in O-RAN [8.52291735627073]
Open Radio Access Network (O-RAN) is an emerging paradigm, whereby network infrastructure elements communicate via open, standardized interfaces.
A key element therein is the RAN Intelligent Controller (RIC), an Artificial Intelligence (AI)-based controller.
In this paper we introduce, discuss, and evaluate the creation of multiple AI model instances at different RICs, leveraging information from some (or all) locations for their training.
arXiv Detail & Related papers (2022-08-01T15:24:27Z)
- Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition.
There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules.
We propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule.
arXiv Detail & Related papers (2022-01-03T16:49:33Z)
- Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
- Federated Visual Classification with Real-World Data Distribution [9.564468846277366]
We characterize the effect real-world data distributions have on distributed learning, using as a benchmark the standard Federated Averaging (FedAvg) algorithm.
We introduce two new large-scale datasets for species and landmark classification, with realistic per-user data splits.
We also develop two new algorithms (FedVC, FedIR) that intelligently resample and reweight over the client pool, bringing large improvements in accuracy and stability in training.
arXiv Detail & Related papers (2020-03-18T07:55:49Z)