Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks
- URL: http://arxiv.org/abs/2411.16421v1
- Date: Mon, 25 Nov 2024 14:25:39 GMT
- Title: Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks
- Authors: Asanobu Kitamoto, Erwan Dzik, Gaspar Faure
- Abstract summary: This paper presents a new version of the longest typhoon satellite image dataset, spanning 40+ years, aimed at benchmarking machine learning models for long-term spatio-temporal data.
The new addition in dataset V2 is tropical cyclone data from the southern hemisphere.
Having data from two hemispheres allows us to ask new research questions about regional differences across basins and hemispheres.
- Score: 0.30723404270319693
- Abstract: This paper presents the Digital Typhoon Dataset V2, a new version of the longest typhoon satellite image dataset for 40+ years aimed at benchmarking machine learning models for long-term spatio-temporal data. The new addition in Dataset V2 is tropical cyclone data from the southern hemisphere, in addition to the northern hemisphere data in Dataset V1. Having data from two hemispheres allows us to ask new research questions about regional differences across basins and hemispheres. We also discuss new developments in representations and tasks of the dataset. We first introduce a self-supervised learning framework for representation learning. Combined with the LSTM model, we discuss performance on intensity forecasting and extra-tropical transition forecasting tasks. We then propose new tasks, such as the typhoon center estimation task. We show that an object detection-based model performs better for stronger typhoons. Finally, we study how machine learning models can generalize across basins and hemispheres, by training the model on the northern hemisphere data and testing it on the southern hemisphere data. The dataset is publicly available at http://agora.ex.nii.ac.jp/digital-typhoon/dataset/ and https://github.com/kitamoto-lab/digital-typhoon/.
Related papers
- EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision [72.84868704100595]
This paper presents a dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks.
The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic.
Accompanying the dataset is EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data.
arXiv Detail & Related papers (2025-01-14T13:42:22Z)
- WxC-Bench: A Novel Dataset for Weather and Climate Downstream Tasks [1.0369983700531806]
High-quality machine learning (ML)-ready datasets play a foundational role in developing new artificial intelligence (AI) models.
Here we introduce WxC-Bench, a multi-modal dataset designed to support the development of generalizable AI models.
We provide a comprehensive description of the dataset and also present a technical validation for baseline analysis.
arXiv Detail & Related papers (2024-12-03T19:20:27Z)
- SCTc-TE: A Comprehensive Formulation and Benchmark for Temporal Event Forecasting [63.01035584154509]
We develop a fully automated pipeline and construct a large-scale dataset named MidEast-TE from about 0.6 million news articles.
This dataset focuses on the cooperation and conflict events among countries mainly in the MidEast region from 2015 to 2022.
We propose a novel method LoGo that is able to take advantage of both Local and Global contexts for SCTc-TE forecasting.
arXiv Detail & Related papers (2023-12-02T07:40:21Z)
- Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones [0.907599024697789]
This paper presents the official release of the longest typhoon satellite image dataset for 40+ years.
It is aimed at benchmarking machine learning models for long-term spatio-temporal data.
The dataset is publicly available at http://agora.nii.ac.jp/digital-typhoon/.
arXiv Detail & Related papers (2023-11-05T14:22:13Z)
- ClimaX: A foundation model for weather and climate [51.208269971019504]
ClimaX is a deep learning model for weather and climate science.
It can be pre-trained with a self-supervised learning objective on climate datasets.
It can be fine-tuned to address a breadth of climate and weather tasks.
arXiv Detail & Related papers (2023-01-24T23:19:01Z)
- Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting [64.7364925689825]
Argoverse 2 (AV2) is a collection of three datasets for perception and forecasting research in the self-driving domain.
The Lidar dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose.
The Motion Forecasting dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene.
arXiv Detail & Related papers (2023-01-02T00:36:22Z)
- Simulation of Atlantic Hurricane Tracks and Features: A Deep Learning Approach [0.0]
This paper employs machine learning (ML) and deep learning (DL) techniques to build storm simulation models from input data (storm features) available in or derived from the HURDAT2 database.
In pursuit of this objective, a trajectory model providing the storm center in terms of longitude and latitude, and intensity models providing the central pressure and maximum 1-min wind speed at 10 m elevation were created.
The efficacy of the storm simulation models is demonstrated for three examples: New Orleans, Miami and Cape Hatteras.
arXiv Detail & Related papers (2022-08-12T13:14:25Z)
- Physics Informed Shallow Machine Learning for Wind Speed Prediction [66.05661813632568]
We analyze a massive dataset of wind measured from anemometers located at 10 m height in 32 locations in Italy.
We train supervised learning algorithms using the past history of wind to predict its value at a future time.
We find that the optimal design as well as its performance vary with the location.
arXiv Detail & Related papers (2022-04-01T14:55:10Z)
- SubseasonalClimateUSA: A Dataset for Subseasonal Forecasting and Benchmarking [20.442879707675115]
SubseasonalClimateUSA is a curated dataset for training and benchmarking subseasonal forecasting models in the United States.
We use this dataset to benchmark a diverse suite of models, including operational dynamical models, classical meteorological baselines, and ten state-of-the-art machine learning and deep learning-based methods from the literature.
arXiv Detail & Related papers (2021-09-21T18:42:10Z)
- Hurricane Forecasting: A Novel Multimodal Machine Learning Framework [2.829284162137884]
Our framework, called Hurricast, efficiently combines spatial-temporal data with statistical data.
The inclusion of Hurricast into an operational forecast consensus model could improve over the National Hurricane Center's official forecast.
arXiv Detail & Related papers (2020-11-11T23:55:33Z)
- Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics [118.75207687144817]
We introduce Data Maps, a model-based tool to characterize and diagnose datasets.
We leverage a largely ignored source of information: the behavior of the model on individual instances during training.
Our results indicate that a shift in focus from quantity to quality of data could lead to robust models and improved out-of-distribution generalization.
arXiv Detail & Related papers (2020-09-22T20:19:41Z)