EMFormer: Efficient Multi-Scale Transformer for Accumulative Context Weather Forecasting
- URL: http://arxiv.org/abs/2602.01194v1
- Date: Sun, 01 Feb 2026 12:36:01 GMT
- Title: EMFormer: Efficient Multi-Scale Transformer for Accumulative Context Weather Forecasting
- Authors: Hao Chen, Tao Han, Jie Zhang, Song Guo, Fenghua Ling, Lei Bai
- Abstract summary: Long-term weather forecasting is critical for socioeconomic planning and disaster preparedness. We present a novel pipeline across pretraining, finetuning, and forecasting to enhance long-context modeling while reducing computational overhead. We introduce an Efficient Multi-scale Transformer (EMFormer) to extract multi-scale features through a single convolution in both training and inference.
- Score: 34.793750131438934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long-term weather forecasting is critical for socioeconomic planning and disaster preparedness. While recent approaches employ finetuning to extend prediction horizons, they remain constrained by the issues of catastrophic forgetting, error accumulation, and high training overhead. To address these limitations, we present a novel pipeline across pretraining, finetuning and forecasting to enhance long-context modeling while reducing computational overhead. First, we introduce an Efficient Multi-scale Transformer (EMFormer) to extract multi-scale features through a single convolution in both training and inference. Based on the new architecture, we further employ an accumulative context finetuning to improve temporal consistency without degrading short-term accuracy. Additionally, we propose a composite loss that dynamically balances different terms via a sinusoidal weighting, thereby adaptively guiding the optimization trajectory throughout pretraining and finetuning. Experiments show that our approach achieves strong performance in weather forecasting and extreme event prediction, substantially improving long-term forecast accuracy. Moreover, EMFormer demonstrates strong generalization on vision benchmarks (ImageNet-1K and ADE20K) while delivering a 5.69x speedup over conventional multi-scale modules.
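The paper ships no code here, but the sinusoidal weighting idea is concrete enough to sketch. Below is a minimal PyTorch illustration of a two-term composite loss whose balance oscillates sinusoidally over training steps; the class name `CompositeLoss`, the `period` parameter, and the choice of MSE plus a gradient-L1 term are assumptions for illustration, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn

class CompositeLoss(nn.Module):
    """Two-term loss whose balance follows a sinusoidal schedule,
    loosely mirroring the paper's dynamically weighted composite loss.
    All specifics below are illustrative assumptions."""

    def __init__(self, period: int = 10_000):
        super().__init__()
        self.period = period  # assumed schedule length in optimizer steps
        self.step = 0

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Term 1: pointwise MSE on the forecast fields.
        mse = (pred - target).pow(2).mean()
        # Term 2: L1 on first differences along the last axis, a cheap
        # stand-in for a structure-aware term.
        grad_l1 = (pred.diff(dim=-1) - target.diff(dim=-1)).abs().mean()
        # Sinusoidal weight in [0, 1] that drifts between the two terms.
        w = 0.5 * (1.0 + math.sin(2.0 * math.pi * self.step / self.period))
        self.step += 1
        return w * mse + (1.0 - w) * grad_l1
```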
Related papers
- Back to the Future: Look-ahead Augmentation and Parallel Self-Refinement for Time Series Forecasting [10.615433089293228]
Back to the Future is a simple yet effective framework that enhances forecasting stability through look-ahead augmentation and self-corrective refinement. Despite its simplicity, the approach consistently improves long-horizon accuracy and mitigates the instability of linear forecasting models. These results suggest that leveraging model-generated forecasts as augmentation can be a simple yet powerful way to enhance long-term prediction, even without complex architectures.
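As a rough sketch of what "leveraging model-generated forecasts as augmentation" can look like in PyTorch (the two-pass refinement scheme and the shapes below are guesses, not the paper's actual algorithm):

```python
import torch
import torch.nn as nn

def lookahead_refine(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """First pass produces a draft forecast; the draft is appended to the
    input window so a second pass sees 'look-ahead' context built from
    the model's own output. Assumes model maps (B, T, C) -> (B, H, C)."""
    with torch.no_grad():
        draft = model(x)                                        # (B, H, C) draft
    window = torch.cat([x, draft], dim=1)[:, -x.size(1):]       # slide window forward
    return model(window)                                        # refined forecast
```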
arXiv Detail & Related papers (2026-02-02T14:23:31Z) - SynCast: Synergizing Contradictions in Precipitation Nowcasting via Diffusion Sequential Preference Optimization [62.958457694151384]
We introduce preference optimization into precipitation nowcasting for the first time, motivated by the success of reinforcement learning from human feedback in large language models. In the first stage, the framework focuses on reducing the false alarm rate (FAR), training the model to effectively suppress false alarms.
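The summary names the general recipe (preference optimization in the style of RLHF) without detail; a minimal DPO-style preference loss over paired nowcasts might look like the sketch below, where the pair is assumed to be ranked by false-alarm rate and the reference-policy terms of full DPO are omitted for brevity.

```python
import torch
import torch.nn.functional as F

def far_preference_loss(logp_low_far: torch.Tensor,
                        logp_high_far: torch.Tensor,
                        beta: float = 0.1) -> torch.Tensor:
    """DPO-style objective: raise the likelihood of the forecast sample
    with the lower false-alarm rate relative to the one with the higher
    rate. A generic sketch, not SynCast's published objective."""
    return -F.logsigmoid(beta * (logp_low_far - logp_high_far)).mean()
```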
arXiv Detail & Related papers (2025-10-22T16:11:22Z) - Accurate typhoon intensity forecasts using a non-iterative spatiotemporal transformer model [3.468838035344738]
Accurate forecasting of tropical cyclone (TC) intensity remains a challenge for operational meteorology. Recent advances in machine learning have yielded notable progress in TC prediction. Here we introduce a transformer-based forecasting model that generates non-iterative, 5-day intensity trajectories.
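"Non-iterative" here contrasts with autoregressive rollout: the model emits the whole multi-day trajectory in one forward pass. A minimal PyTorch sketch of such a direct multi-horizon head follows; the dimensions, the 20-step horizon (5 days of 6-hourly values), and the class name are assumptions.

```python
import torch
import torch.nn as nn

class DirectIntensityHead(nn.Module):
    """One forward pass -> full intensity trajectory, instead of feeding
    each prediction back in as input. Sizes are illustrative guesses."""

    def __init__(self, d_model: int = 256, steps: int = 20):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, steps)  # all horizons predicted at once

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)           # (B, T, d_model) encoded history
        return self.head(h[:, -1])    # (B, steps): the whole trajectory
```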
arXiv Detail & Related papers (2025-09-18T20:50:17Z) - Does Scaling Law Apply in Time Series Forecasting? [2.127584662240465]
We propose Alinear, an ultra-lightweight forecasting model that achieves competitive performance with a parameter count on the order of thousands. Experiments on seven benchmark datasets demonstrate that Alinear consistently outperforms large-scale models. This work challenges the prevailing belief that larger models are inherently better and suggests a paradigm shift toward more efficient time series modeling.
arXiv Detail & Related papers (2025-05-15T11:04:39Z) - Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters.
Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks.
Forecast-FT further improves prediction performance, showing up to a 9.6% improvement over conventional baselines.
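The freezing recipe the summary describes is easy to sketch in PyTorch; the keyword-matching convention below (parameters whose names contain "prompt" or "adapter" stay trainable) is an illustrative stand-in for however Forecast-PEFT actually tags its new modules.

```python
import torch.nn as nn

def freeze_for_peft(model: nn.Module,
                    trainable_keywords=("prompt", "adapter")) -> None:
    """Freeze the backbone and leave only newly introduced prompt/adapter
    parameters trainable: the generic PEFT recipe, not Forecast-PEFT's
    exact code."""
    for name, param in model.named_parameters():
        param.requires_grad = any(k in name for k in trainable_keywords)
```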
arXiv Detail & Related papers (2024-07-28T19:18:59Z) - Physics-guided Active Sample Reweighting for Urban Flow Prediction [75.24539704456791]
Urban flow prediction is a spatio-temporal modeling task that estimates the throughput of transportation services like buses, taxis, and ride-hailing vehicles.
Some recent prediction solutions bring remedies with the notion of physics-guided machine learning (PGML).
We develop a discretized physics-guided network (PN) and propose a data-aware framework, Physics-guided Active Sample Reweighting (P-GASR).
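One plausible reading of "physics-guided active sample reweighting" is a per-sample loss weight driven by a physics residual; the sketch below (softmax weighting over a squared-error base loss) is that guess, not the published P-GASR algorithm.

```python
import torch

def physics_reweighted_loss(pred: torch.Tensor,
                            target: torch.Tensor,
                            physics_residual: torch.Tensor,
                            tau: float = 1.0) -> torch.Tensor:
    """Weight each sample's error by how strongly it violates a physical
    constraint, so training attends to physics-inconsistent samples.
    pred/target: (B, ...); physics_residual: (B,)."""
    weights = torch.softmax(physics_residual / tau, dim=0)          # (B,)
    per_sample = (pred - target).pow(2).flatten(1).mean(dim=1)      # (B,)
    return (weights * per_sample).sum()
```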
arXiv Detail & Related papers (2024-07-18T15:44:23Z) - ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast [57.6987191099507]
We introduce Exloss, a novel loss function that performs asymmetric optimization and highlights extreme values to obtain accurate extreme weather forecasts.
We also introduce ExBooster, which captures the uncertainty in prediction outcomes by employing multiple random samples.
Our solution can achieve state-of-the-art performance in extreme weather prediction, while maintaining the overall forecast accuracy comparable to the top medium-range forecast models.
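An asymmetric loss that punishes under-prediction of extremes more heavily than over-prediction is the simplest way to convey the idea; the weighting below is illustrative and not the Exloss formulation itself.

```python
import torch

def asymmetric_extreme_loss(pred: torch.Tensor,
                            target: torch.Tensor,
                            under_penalty: float = 2.0) -> torch.Tensor:
    """Squared error with a heavier cost when the model undershoots the
    target, biasing it toward capturing extreme values. Illustrative
    stand-in for Exloss, whose exact form is in the paper."""
    err = pred - target
    weight = torch.where(err < 0,
                         torch.full_like(err, under_penalty),  # undershoot
                         torch.ones_like(err))                 # overshoot
    return (weight * err.pow(2)).mean()
```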
arXiv Detail & Related papers (2024-02-02T10:34:13Z) - Does Vector Quantization Fail in Spatio-Temporal Forecasting? Exploring a Differentiable Sparse Soft-Vector Quantization Approach [22.070533429289334]
We present Sparse Soft-Vector Quantization (SVQ), the first VQ method to enhance spatio-temporal forecasting. SVQ balances detail preservation with sparse noise reduction, offering full differentiability and a solid foundation in sparse regression. Our approach employs a two-layer MLP and an extensive codebook to streamline the sparse regression process.
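The differentiable soft-quantization idea is sketchable: a small MLP predicts sparse mixing weights over a large codebook, and the output is the weighted sum of codes. The dimensions and the ReLU-based sparsification below are assumptions, not SVQ's exact design.

```python
import torch
import torch.nn as nn

class SparseSoftVQ(nn.Module):
    """Soft vector quantization that stays differentiable end to end:
    instead of a hard nearest-code lookup, an MLP emits (sparse) weights
    over the codebook. A sketch of the general idea only."""

    def __init__(self, dim: int = 64, codebook_size: int = 1024):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(codebook_size, dim))
        self.mlp = nn.Sequential(                 # the "two-layer MLP"
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, codebook_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.relu(self.mlp(x))   # ReLU zeroes many weights -> sparsity
        return w @ self.codebook      # (B, dim): weighted code combination
```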
arXiv Detail & Related papers (2023-12-06T10:42:40Z) - Adversarial Self-Attention for Language Understanding [89.265747130584]
This paper proposes an Adversarial Self-Attention mechanism (ASA).
ASA adversarially reconstructs the Transformer attentions and facilitates model training from contaminated model structures.
For fine-tuning, ASA-empowered models consistently outperform naive models by a large margin in both generalization and robustness.
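The mechanism the summary gestures at, training against adversarially perturbed attention structure, can be illustrated with a single FGSM-style step on a learnable attention bias; this generic sketch is not ASA's actual formulation.

```python
import torch

def adversarial_attention_bias(bias: torch.Tensor,
                               loss: torch.Tensor,
                               epsilon: float = 0.1) -> torch.Tensor:
    """One gradient-ascent (FGSM-style) step that perturbs an attention
    bias in the direction that increases the loss; the next forward pass
    then trains against this 'contaminated' attention structure.
    Assumes `bias` requires grad and participates in `loss`."""
    grad, = torch.autograd.grad(loss, bias, retain_graph=True)
    return (bias + epsilon * grad.sign()).detach()
```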
arXiv Detail & Related papers (2022-06-25T09:18:10Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)