WxC-Bench: A Novel Dataset for Weather and Climate Downstream Tasks
- URL: http://arxiv.org/abs/2412.02780v1
- Date: Tue, 03 Dec 2024 19:20:27 GMT
- Title: WxC-Bench: A Novel Dataset for Weather and Climate Downstream Tasks
- Authors: Rajat Shinde, Christopher E. Phillips, Kumar Ankur, Aman Gupta, Simon Pfreundschuh, Sujit Roy, Sheyenne Kirkland, Vishal Gaur, Amy Lin, Aditi Sheshadri, Udaysankar Nair, Manil Maskey, Rahul Ramachandran
- Abstract summary: High-quality machine learning (ML)-ready datasets play a foundational role in developing new artificial intelligence (AI) models. Here we introduce WxC-Bench, a multi-modal dataset designed to support the development of generalizable AI models. We provide a comprehensive description of the dataset and also present a technical validation for baseline analysis.
- Score: 1.0369983700531806
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: High-quality machine learning (ML)-ready datasets play a foundational role in developing new artificial intelligence (AI) models or fine-tuning existing models for scientific applications such as weather and climate analysis. Unfortunately, despite the growing development of new deep learning models for weather and climate, there is a scarcity of curated, pre-processed ML-ready datasets. Curating such high-quality datasets for developing new models is challenging, particularly because the modality of the input data varies significantly across downstream tasks that address different atmospheric scales (spatial and temporal). Here we introduce WxC-Bench (Weather and Climate Bench), a multi-modal dataset designed to support the development of generalizable AI models for downstream use cases in weather and climate research. WxC-Bench is designed as a dataset of datasets for developing ML models for a complex weather and climate system, addressing selected downstream tasks as machine learning phenomena. WxC-Bench encompasses several atmospheric processes from the meso-$\beta$ scale (20-200 km) to the synoptic scale (2500 km), such as aviation turbulence, hurricane intensity and track monitoring, weather analog search, gravity wave parameterization, and natural language report generation. We provide a comprehensive description of the dataset and also present a technical validation for baseline analysis. The dataset and the code to prepare the ML-ready data have been made publicly available on Hugging Face: https://huggingface.co/datasets/nasa-impact/WxC-Bench
Related papers
- UNet with Axial Transformer : A Neural Weather Model for Precipitation Nowcasting [0.06906005491572399]
We develop a novel method that employs Transformer-based machine learning models to forecast precipitation.
This paper presents initial research on the dataset used in the domain of next-frame prediction.
arXiv Detail & Related papers (2025-04-28T01:20:30Z)
- A Physics-guided Multimodal Transformer Path to Weather and Climate Sciences [59.05404971880922]
Many problems in meteorology can now be addressed using AI models.
Data-driven algorithms have significantly improved accuracy compared to traditional methods.
We propose a new paradigm where observational data from different perspectives are treated as multimodal data and integrated via transformers.
arXiv Detail & Related papers (2025-04-19T04:31:35Z)
- Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling [55.13352174687475]
This paper proposes a physics-AI hybrid model (i.e., WeatherGFT) which Generalizes weather forecasts to Finer-grained Temporal scales.
Specifically, we employ a carefully designed PDE kernel to simulate physical evolution on a small time scale.
We introduce a lead time-aware training framework to promote the generalization of the model at different lead times.
arXiv Detail & Related papers (2024-05-22T16:21:02Z)
- ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning [26.151056828513962]
Climate models have been key for assessing the impact of climate change and simulating future climate scenarios.
The machine learning (ML) community has taken an increased interest in supporting climate scientists' efforts on various tasks such as climate model emulation, downscaling, and prediction tasks.
Here, we introduce ClimateSet, a dataset containing the inputs and outputs of 36 climate models from the Input4MIPs and CMIP6 archives.
arXiv Detail & Related papers (2023-11-07T04:55:36Z)
- Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show that our pre-trained method is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size.
Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z)
- Unleashing Realistic Air Quality Forecasting: Introducing the Ready-to-Use PurpleAirSF Dataset [4.190243190157989]
This paper introduces PurpleAirSF, a comprehensive and easily accessible dataset from the PurpleAir network.
We present a detailed account of the data collection and processing methods employed to build PurpleAirSF.
We conduct preliminary experiments using both classic and modern spatio-temporal forecasting models, thereby establishing a benchmark for future air quality forecasting tasks.
arXiv Detail & Related papers (2023-06-24T12:10:16Z)
- ClimaX: A foundation model for weather and climate [51.208269971019504]
ClimaX is a deep learning model for weather and climate science.
It can be pre-trained with a self-supervised learning objective on climate datasets.
It can be fine-tuned to address a breadth of climate and weather tasks.
arXiv Detail & Related papers (2023-01-24T23:19:01Z)
- Learning to Simulate Realistic LiDARs [66.7519667383175]
We introduce a pipeline for data-driven simulation of a realistic LiDAR sensor.
We show that our model can learn to encode realistic effects such as dropped points on transparent surfaces.
We use our technique to learn models of two distinct LiDAR sensors and use them to improve simulated LiDAR data accordingly.
arXiv Detail & Related papers (2022-09-22T13:12:54Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- WeatherBench: A benchmark dataset for data-driven weather forecasting [17.76377510880905]
We present a benchmark dataset for data-driven medium-range weather forecasting.
We provide data derived from the ERA5 archive that has been processed to facilitate the use in machine learning models.
We provide baseline scores from simple linear regression techniques, deep learning models, as well as purely physical forecasting models.
arXiv Detail & Related papers (2020-02-02T19:20:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.