HPMixer: Hierarchical Patching for Multivariate Time Series Forecasting
- URL: http://arxiv.org/abs/2602.16468v2
- Date: Thu, 19 Feb 2026 12:57:13 GMT
- Title: HPMixer: Hierarchical Patching for Multivariate Time Series Forecasting
- Authors: Jung Min Choi, Vijaya Krishna Yalavarthi, Lars Schmidt-Thieme
- Abstract summary: We propose the Hierarchical Patching Mixer (HPMixer), which models periodicity and residuals in a decoupled yet complementary manner. By integrating decoupled periodicity modeling with structured, multi-scale residual learning, HPMixer provides an effective framework.
- Score: 10.068780251829606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In long-term multivariate time series forecasting, effectively capturing both periodic patterns and residual dynamics is essential. To address this within standard deep learning benchmark settings, we propose the Hierarchical Patching Mixer (HPMixer), which models periodicity and residuals in a decoupled yet complementary manner. The periodic component utilizes a learnable cycle module [7] enhanced with a nonlinear channel-wise MLP for greater expressiveness. The residual component is processed through a Learnable Stationary Wavelet Transform (LSWT) to extract stable, shift-invariant frequency-domain representations. Subsequently, a channel-mixing encoder models explicit inter-channel dependencies, while a two-level non-overlapping hierarchical patching mechanism captures coarse- and fine-scale residual variations. By integrating decoupled periodicity modeling with structured, multi-scale residual learning, HPMixer provides an effective framework. Extensive experiments on standard multivariate benchmarks demonstrate that HPMixer achieves competitive or state-of-the-art performance compared to recent baselines.
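The abstract describes two ideas that are easy to make concrete: estimating a periodic component per cycle phase and then splitting the residual with two-level non-overlapping patches. The sketch below is an illustration only, not the authors' implementation — it replaces the paper's *learnable* cycle module with a simple per-phase average, and the patch lengths (`coarse_len`, `fine_len`) are hypothetical choices:

```python
import numpy as np

def cycle_residual(x, cycle):
    """Estimate a fixed periodic component by averaging over full cycles
    (a simple stand-in for a learnable cycle module) and return
    (periodic, residual) for the truncated series."""
    n = (len(x) // cycle) * cycle          # drop the tail that does not fit
    x = x[:n]
    template = x.reshape(-1, cycle).mean(axis=0)   # one value per phase
    periodic = np.tile(template, n // cycle)
    return periodic, x - periodic

def hierarchical_patches(r, coarse_len, fine_len):
    """Two-level non-overlapping patching of the residual: coarse patches,
    each of which is made up of an integer number of fine sub-patches."""
    assert coarse_len % fine_len == 0
    n = (len(r) // coarse_len) * coarse_len
    coarse = r[:n].reshape(-1, coarse_len)
    fine = r[:n].reshape(-1, fine_len)
    return coarse, fine

# Toy series: a period-8 sinusoid plus a slow drift.
t = np.arange(32, dtype=float)
x = np.sin(2 * np.pi * t / 8) + 0.01 * t
periodic, residual = cycle_residual(x, cycle=8)
coarse, fine = hierarchical_patches(residual, coarse_len=8, fine_len=4)
print(coarse.shape, fine.shape)  # (4, 8) (8, 4)
```

Because the patches are non-overlapping, the coarse and fine views cover the same residual values at different granularities; in the paper these views feed a channel-mixing encoder, which is omitted here.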
Related papers
- DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters [50.43534351968113]
Existing generative time series models do not address the multi-dimensional properties of time series data well. Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS).
arXiv Detail & Related papers (2026-02-06T10:48:13Z) - MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts [0.8292000624465587]
Real-world time series can exhibit intricate multi-scale structures, including global trends, local periodicities, and non-stationary regimes. MoHETS integrates sparse Mixture-of-Heterogeneous-Experts layers. We replace parameter-heavy linear projection heads with a lightweight convolutional patch decoder.
arXiv Detail & Related papers (2026-01-29T15:35:26Z) - Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment [92.57576987521107]
We propose a novel unified transform framework with dual-domain progressive temporal alignment and a quality-conditioned mixture-of-experts (QCMoE). QCMoE allows continuous and consistent rate control with appealing R-D performance. Experimental results show that the proposed method achieves competitive R-D performance compared with state-of-the-art methods.
arXiv Detail & Related papers (2025-12-11T09:14:51Z) - DPWMixer: Dual-Path Wavelet Mixer for Long-Term Time Series Forecasting [6.01829429039985]
Long-term time series forecasting is a critical task in computational intelligence. This paper proposes DPWMixer, a computationally efficient Dual-Path architecture. Experiments on eight public benchmarks demonstrate that our method achieves a consistent improvement over state-of-the-art baselines.
arXiv Detail & Related papers (2025-11-30T03:12:50Z) - AWEMixer: Adaptive Wavelet-Enhanced Mixer Network for Long-Term Time Series Forecasting [12.450099337354017]
We propose AWEMixer, an Adaptive Wavelet-Enhanced Mixer Network. A Frequency Router is designed to utilize the global periodicity pattern obtained by the Fast Fourier Transform to adaptively weight localized wavelet subbands. A Coherent Gated Fusion Block achieves selective integration of prominent frequency features with multi-scale temporal representations.
arXiv Detail & Related papers (2025-11-06T11:27:12Z) - WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training [64.0932926819307]
We present Warmup-Stable and Merge (WSM), a framework that establishes a formal connection between learning rate decay and model merging. WSM provides a unified theoretical foundation for emulating various decay strategies. Our framework consistently outperforms the widely-adopted Warmup-Stable-Decay (WSD) approach across multiple benchmarks.
arXiv Detail & Related papers (2025-07-23T16:02:06Z) - FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [57.577843653775]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams. A cross-modal expert routing mechanism adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z) - MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies. We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis. Experiments on major open and synthetic datasets show state-of-the-art performance.
arXiv Detail & Related papers (2025-03-11T11:40:14Z) - Content-aware Balanced Spectrum Encoding in Masked Modeling for Time Series Classification [25.27495694566081]
We propose an auxiliary content-aware balanced decoder (CBD) to optimize the encoding quality in the spectrum space within a masked modeling scheme. CBD iterates on a series of fundamental blocks, and thanks to two tailored units, each block progressively refines the masked representation.
arXiv Detail & Related papers (2024-12-17T14:12:20Z) - A Multi-Scale Decomposition MLP-Mixer for Time Series Analysis [14.40202378972828]
We propose MSD-Mixer, a Multi-Scale Decomposition-Mixer, which learns to explicitly decompose and represent the input time series in its different layers.
We demonstrate that MSD-Mixer consistently and significantly outperforms other state-of-the-art algorithms with better efficiency.
arXiv Detail & Related papers (2023-10-18T13:39:07Z) - Multi-scale Attention Flow for Probabilistic Time Series Forecasting [68.20798558048678]
We propose a novel non-autoregressive deep learning model, called Multi-scale Attention Normalizing Flow (MANF).
Our model avoids the influence of cumulative error and does not increase the time complexity.
Our model achieves state-of-the-art performance on many popular multivariate datasets.
arXiv Detail & Related papers (2022-05-16T07:53:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.