Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services
- URL: http://arxiv.org/abs/2507.12908v1
- Date: Thu, 17 Jul 2025 08:51:28 GMT
- Title: Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services
- Authors: Jiadong Chen, Hengyu Ye, Fuxin Jiang, Xiao He, Tieying Zhang, Jianjun Chen, Xiaofeng Gao,
- Abstract summary: We propose Fremer, an efficient and effective deep forecasting model. Fremer fulfills three critical requirements: it demonstrates superior efficiency, outperforming most Transformer-based forecasting models. It achieves exceptional accuracy, surpassing all state-of-the-art (SOTA) models in workload forecasting.
- Score: 9.687789919349523
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Workload forecasting is pivotal in cloud service applications, such as auto-scaling and scheduling, with profound implications for operational efficiency. Although Transformer-based forecasting models have demonstrated remarkable success in general tasks, their computational efficiency often falls short of the stringent requirements in large-scale cloud environments. Given that most workload series exhibit complicated periodic patterns, addressing these challenges in the frequency domain offers substantial advantages. To this end, we propose Fremer, an efficient and effective deep forecasting model. Fremer fulfills three critical requirements: it demonstrates superior efficiency, outperforming most Transformer-based forecasting models; it achieves exceptional accuracy, surpassing all state-of-the-art (SOTA) models in workload forecasting; and it exhibits robust performance for multi-period series. Furthermore, we collect and open-source four high-quality workload datasets derived from ByteDance's cloud services, encompassing workload data from thousands of computing instances. Extensive experiments on both our proprietary datasets and public benchmarks demonstrate that Fremer consistently outperforms baseline models, achieving average improvements of 5.5% in MSE, 4.7% in MAE, and 8.6% in SMAPE over SOTA models, while simultaneously reducing parameter scale and computational costs. Additionally, in a proactive auto-scaling test based on Kubernetes, Fremer reduces average latency by 18.78% and resource consumption by 2.35%, underscoring its practical efficacy in real-world applications.
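The abstract does not spell out Fremer's internals, but the frequency-domain idea it builds on can be sketched in a few lines: transform the lookback window with an FFT, apply learnable complex-valued weights per frequency bin, and inverse-transform to the forecast horizon. Periodic workload series concentrate energy in a few bins, which is why operating in this domain can be both cheap and accurate. The module below is a hypothetical minimal illustration (all names, shapes, and the linear design are ours, not Fremer's actual Transformer architecture):

```python
import torch
import torch.nn as nn

class FrequencyLinearForecaster(nn.Module):
    """Minimal sketch of frequency-domain forecasting (illustrative only):
    map the rFFT of the lookback window to the rFFT of the horizon
    with a learnable complex-valued linear layer."""
    def __init__(self, lookback: int, horizon: int):
        super().__init__()
        in_bins = lookback // 2 + 1   # rFFT bins of the input window
        out_bins = horizon // 2 + 1   # rFFT bins of the forecast window
        # Complex weights, stored as real/imag pairs for the optimizer.
        self.weight = nn.Parameter(torch.randn(in_bins, out_bins, 2) * 0.02)
        self.horizon = horizon

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, lookback) real-valued workload series
        spec = torch.fft.rfft(x, dim=-1)          # (batch, in_bins), complex
        w = torch.view_as_complex(self.weight)    # (in_bins, out_bins)
        out_spec = spec @ w                       # (batch, out_bins)
        return torch.fft.irfft(out_spec, n=self.horizon, dim=-1)

model = FrequencyLinearForecaster(lookback=96, horizon=24)
y_hat = model(torch.randn(8, 96))  # (8, 24) forecast
```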
Related papers
- SORT: A Systematically Optimized Ranking Transformer for Industrial-scale Recommenders [21.80413275965637]
SORT (Systematically Optimized Ranking Transformer) is a scalable model designed to bridge the gap between Transformers and industrial-scale ranking models. We address the high feature sparsity and low label density challenges through a series of optimizations. SORT exhibits excellent scalability across data size, model size, and sequence length, while remaining flexible in integrating diverse features.
arXiv Detail & Related papers (2026-03-04T12:32:43Z)
- Echo State Networks for Time Series Forecasting: Hyperparameter Sweep and Benchmarking [51.56484100374058]
We evaluate whether a fully automatic, purely feedback-driven ESN can serve as a competitive alternative to widely used statistical forecasting methods. Forecast accuracy is measured using MASE and sMAPE and benchmarked against simple baselines such as drift and seasonal naive, as well as statistical models.
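The summary does not give the reservoir configuration; as a rough illustration of what a purely feedback-driven ESN computes, here is a minimal leaky echo state network with a ridge-regression readout (all sizes and scales below are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random reservoir; only the linear readout is trained.
n_res, leak, rho = 200, 0.3, 0.9
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= rho / max(abs(np.linalg.eigvals(W)))   # scale spectral radius to rho

def run_reservoir(u):
    """Collect leaky reservoir states for a 1-D input series u."""
    x, states = np.zeros(n_res), []
    for u_t in u:
        pre = W_in @ np.array([u_t]) + W @ x
        x = (1 - leak) * x + leak * np.tanh(pre)
        states.append(x.copy())
    return np.array(states)

# One-step-ahead forecasting: ridge-regress the next value on the state.
u = np.sin(np.linspace(0, 60, 600))
S, y = run_reservoir(u[:-1]), u[1:]
ridge = 1e-6
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ y)
y_hat = S @ W_out
```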
arXiv Detail & Related papers (2026-02-03T16:01:22Z)
- Automated Energy-Aware Time-Series Model Deployment on Embedded FPGAs for Resilient Combined Sewer Overflow Management [17.903318666906728]
Extreme weather events, intensified by climate change, increasingly challenge aging combined sewer systems. Forecasting of sewer overflow basin filling levels can provide actionable insights for early intervention. We propose an end-to-end forecasting framework that enables energy-efficient inference directly on edge devices.
arXiv Detail & Related papers (2025-08-19T15:06:04Z)
- Output Scaling: YingLong-Delayed Chain of Thought in a Large Pretrained Time Series Forecasting Model [55.25659103706409]
This framework achieves state-of-the-art performance for our designed foundation model, YingLong. YingLong is a non-causal, bidirectional attention encoder-only transformer trained through masked token recovery. We release four foundation models ranging from 6M to 300M parameters, demonstrating superior results in zero-shot tasks.
arXiv Detail & Related papers (2025-05-20T14:31:06Z)
- EfficientLLM: Efficiency in Large Language Models [64.3537131208038]
Large Language Models (LLMs) have driven significant progress, yet their growing parameter counts and context windows incur prohibitive compute, energy, and monetary costs. We introduce EfficientLLM, a novel benchmark and the first comprehensive empirical study evaluating efficiency techniques for LLMs at scale.
arXiv Detail & Related papers (2025-05-20T02:27:08Z)
- Bayesian Optimization of a Lightweight and Accurate Neural Network for Aerodynamic Performance Prediction [0.0]
We propose a new approach to build efficient and accurate predictive models for aerodynamic performance prediction. To clearly describe the interplay between design variables, hierarchical and categorical kernels are used in the BO formulation. For the drag coefficient prediction task, the Mean Absolute Percentage Error (MAPE) of our optimized model drops from 0.1433% to 0.0163%. Our model achieves a MAPE of 0.82% on a benchmark aircraft self-noise prediction problem, significantly outperforming existing models.
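For reference, the MAPE figures quoted above follow the standard Mean Absolute Percentage Error definition (the summary does not state any variant); a minimal sketch:

```python
import numpy as np

def mape(y_true, y_pred) -> float:
    """Mean Absolute Percentage Error, in percent (standard definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

print(mape([100.0, 200.0], [99.0, 202.0]))  # 1.0
```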
arXiv Detail & Related papers (2025-03-25T09:14:36Z)
- Gridded Transformer Neural Processes for Large Unstructured Spatio-Temporal Data [47.14384085714576]
We introduce gridded pseudo-tokens to handle unstructured observations, together with a processor that leverages efficient attention mechanisms over these tokens.
Our method consistently outperforms a range of strong baselines on various synthetic and real-world regression tasks involving large-scale data.
The real-life experiments are performed on weather data, demonstrating the potential of our approach to bring performance and computational benefits when applied at scale in a weather modelling pipeline.
arXiv Detail & Related papers (2024-10-09T10:00:56Z)
- An Augmentation-based Model Re-adaptation Framework for Robust Image Segmentation [0.799543372823325]
We propose an Augmentation-based Model Re-adaptation Framework (AMRF) to enhance the generalisation of segmentation models.
By observing segmentation masks from conventional models (FCN and U-Net) and a pre-trained SAM model, we determine a minimal augmentation set that optimally balances training efficiency and model performance.
Our results demonstrate that the fine-tuned FCN surpasses its baseline by 3.29% and 3.02% in cropping accuracy, and 5.27% and 4.04% in classification accuracy on two temporally continuous datasets.
arXiv Detail & Related papers (2024-09-14T21:01:49Z)
- Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters.
Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks.
Forecast-FT further improves prediction performance, achieving up to a 9.6% improvement over conventional baseline methods.
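The freeze-backbone, train-adapters pattern the summary describes can be sketched as follows (the keyword-matching scheme and names are illustrative, not Forecast-PEFT's actual module identifiers):

```python
import torch.nn as nn

def apply_peft_freeze(model: nn.Module,
                      trainable_keywords=("prompt", "adapter")):
    """Sketch of the PEFT pattern: freeze the pre-trained backbone and
    leave only newly introduced prompt/adapter parameters trainable."""
    for name, param in model.named_parameters():
        param.requires_grad = any(k in name for k in trainable_keywords)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable}/{total} parameters")

# Usage (hypothetical): apply_peft_freeze(pretrained_forecaster)
```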
arXiv Detail & Related papers (2024-07-28T19:18:59Z)
- Crafting Efficient Fine-Tuning Strategies for Large Language Models [2.633490094119608]
Fine-tuning large language models (LLMs) with as few as 200 samples can improve model accuracy from 70% to 88% in a product attribute extraction task.
A Bayesian hyperparameter optimization method, which evaluates models at 20% of total training time, correlates strongly with final model performance.
This approach led to a 2% improvement in accuracy over baseline models when evaluated on an independent test set.
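A sketch of the early-evaluation idea: score each hyperparameter configuration after only 20% of the training budget and let the search optimize that cheap proxy. Everything below is a hypothetical stand-in (the `train_and_eval` stub would be a real fine-tuning run, and a Bayesian optimizer would replace the random sampling):

```python
import random

def train_and_eval(config, epochs):
    # Placeholder: substitute a real fine-tuning run returning val. accuracy.
    return random.random()

def proxy_objective(config, full_epochs=50, fraction=0.2):
    """Score a config after only 20% of the training budget."""
    early_epochs = max(1, int(full_epochs * fraction))
    return train_and_eval(config, epochs=early_epochs)

def search(n_trials=20):
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {"lr": 10 ** random.uniform(-5, -3),
               "rank": random.choice([4, 8, 16])}
        score = proxy_objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```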
arXiv Detail & Related papers (2024-07-18T21:36:00Z)
- Transformer Multivariate Forecasting: Less is More? [42.558736426375056]
The paper focuses on reducing redundant information to elevate forecasting accuracy while optimizing runtime efficiency.
The framework is evaluated by five state-of-the-art (SOTA) models and four diverse real-world datasets.
From the model perspective, one of the PCA-enhanced models, PCA+Crossformer, reduces mean squared error (MSE) by 33.3% and decreases runtime by 49.2% on average.
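The PCA enhancement amounts to compressing correlated input channels before the forecaster sees them; a minimal sketch (the forecaster itself is left abstract, and mapping reconstructions back for channel-level evaluation is our assumption):

```python
import numpy as np
from sklearn.decomposition import PCA

# (time steps, channels) multivariate series; sizes are illustrative.
X = np.random.randn(1000, 21)
pca = PCA(n_components=8)              # keep a reduced channel set
X_red = pca.fit_transform(X)           # (1000, 8) inputs for the forecaster

# ... train any multivariate forecaster (e.g. Crossformer) on X_red ...
# Forecasts in the reduced space can be mapped back to the original channels:
X_back = pca.inverse_transform(X_red)  # (1000, 21) approximate reconstruction
```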
arXiv Detail & Related papers (2023-12-30T13:44:23Z)
- PePNet: A Periodicity-Perceived Workload Prediction Network Supporting Rare Occurrence of Heavy Workload [11.93843096959306]
The workload of cloud servers is highly variable, with occasional heavy workload bursts.
There are two categories of workload prediction methods: statistical methods and neural-network-based ones.
We propose PePNet to improve overall prediction accuracy, especially for heavy workloads.
arXiv Detail & Related papers (2023-07-11T07:56:27Z)
- Quaternion Factorization Machines: A Lightweight Solution to Intricate Feature Interaction Modelling [76.89779231460193]
The factorization machine (FM) is capable of automatically learning high-order interactions among features to make predictions without the need for manual feature engineering.
We propose the quaternion factorization machine (QFM) and quaternion neural factorization machine (QNFM) for sparse predictive analytics.
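For context, the standard real-valued second-order FM that QFM/QNFM extend scores an input as y = w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j, computable in O(nk) time via a well-known identity; a minimal sketch (quaternion components are not shown):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order FM score. Uses the O(nk) identity:
    sum_{i<j} <V_i, V_j> x_i x_j
      = 0.5 * sum_f [ (sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2 ]."""
    linear = w0 + x @ w
    s1 = (x @ V) ** 2            # (k,) squared factor sums
    s2 = (x ** 2) @ (V ** 2)     # (k,) sums of squared terms
    return linear + 0.5 * np.sum(s1 - s2)

rng = np.random.default_rng(0)
n, k = 10, 4
x = rng.random(n)
print(fm_predict(x, 0.1, rng.random(n), rng.random((n, k))))
```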
arXiv Detail & Related papers (2021-04-05T00:02:36Z)
- Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves comparable performance with about 0.2% of the parameters (100k) of large models on popular object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)