TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series
Forecasting
- URL: http://arxiv.org/abs/2306.09364v4
- Date: Mon, 11 Dec 2023 15:46:13 GMT
- Title: TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series
Forecasting
- Authors: Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant
Kalagnanam
- Abstract summary: Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions.
High memory and computing requirements pose a critical bottleneck for long-term forecasting.
We propose TSMixer, a lightweight neural architecture composed of multi-layer perceptron (MLP) modules.
- Score: 13.410217680999459
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Transformers have gained popularity in time series forecasting for their
ability to capture long-sequence interactions. However, their high memory and
computing requirements pose a critical bottleneck for long-term forecasting. To
address this, we propose TSMixer, a lightweight neural architecture exclusively
composed of multi-layer perceptron (MLP) modules for multivariate forecasting
and representation learning on patched time series. Inspired by MLP-Mixer's
success in computer vision, we adapt it for time series, addressing challenges
and introducing validated components for enhanced accuracy. This includes a
novel design paradigm of attaching online reconciliation heads to the MLP-Mixer
backbone for explicitly modeling time-series properties such as hierarchy and
channel correlations. We also propose a novel hybrid channel modeling approach
and infuse a simple gating mechanism to effectively handle noisy channel
interactions and to generalize across diverse datasets. By incorporating these
lightweight components, we significantly enhance the learning capability of
simple MLP structures, outperforming complex Transformer models with minimal
computing usage. Moreover, TSMixer's modular design enables compatibility with
both supervised and masked self-supervised learning methods, making it a
promising building block for time-series Foundation Models. TSMixer outperforms
state-of-the-art MLP and Transformer models in forecasting by a considerable
margin of 8-60%. It also outperforms the latest strong benchmarks of
Patch-Transformer models (by 1-2%) with a significant reduction in memory and
runtime (2-3X). The source code of our model is officially released as
PatchTSMixer in HuggingFace Transformers. Model:
https://huggingface.co/docs/transformers/main/en/model_doc/patchtsmixer
Examples: https://github.com/ibm/tsfm/#notebooks-links
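Below is a minimal, illustrative sketch of the core computation described in the abstract: interleaved MLP mixing along the patch (time) and feature dimensions of a patched multivariate series, with a simple gating step to damp noisy interactions. It is written in PyTorch under assumed layer sizes, normalization placement, and gate form; it is not the official PatchTSMixer implementation (see the HuggingFace model doc linked above for that).

# Sketch of a TSMixer-style mixer block for patched multivariate time series.
# Assumptions (not from the paper): layer widths, LayerNorm placement, and the
# exact form of the gate.
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Mixes across patches (time) and across features, with a simple gate."""

    def __init__(self, num_patches: int, d_model: int, expansion: int = 2):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.patch_mlp = nn.Sequential(              # time mixing over patches
            nn.Linear(num_patches, expansion * num_patches),
            nn.GELU(),
            nn.Linear(expansion * num_patches, num_patches),
        )
        self.norm2 = nn.LayerNorm(d_model)
        self.feature_mlp = nn.Sequential(            # feature mixing per patch
            nn.Linear(d_model, expansion * d_model),
            nn.GELU(),
            nn.Linear(expansion * d_model, d_model),
        )
        self.gate = nn.Linear(d_model, d_model)      # simple gating layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, num_patches, d_model)
        y = self.norm1(x)
        y = self.patch_mlp(y.transpose(-1, -2)).transpose(-1, -2)
        x = x + y                                    # residual over patch mixing
        y = self.feature_mlp(self.norm2(x))
        y = y * torch.softmax(self.gate(y), dim=-1)  # gate noisy interactions
        return x + y                                 # residual over feature mixing

# Toy usage: a 7-channel series cut into 16 patches, each embedded to d_model=64.
block = MixerBlock(num_patches=16, d_model=64)
out = block(torch.randn(8, 7, 16, 64))
print(out.shape)  # torch.Size([8, 7, 16, 64])

In this sketch the mixing weights are shared across channels; per the abstract, TSMixer additionally uses hybrid channel modeling and attaches online reconciliation heads to a backbone of such blocks, which are not shown here.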
Related papers
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models [76.64055251296548]
We show that, with academic GPU resources, it is feasible to distill large Transformers into linear RNNs by reusing the linear projection weights from attention layers.
The resulting hybrid model, which incorporates a quarter of the attention layers, achieves performance comparable to the original Transformer in chat benchmarks.
arXiv Detail & Related papers (2024-08-27T17:56:11Z)
- Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking [6.9366619419210656]
Transformers have established themselves as the leading neural network model in natural language processing.
Recent research has explored replacing attention modules with other mechanisms, including those described by MetaFormers.
This paper integrates Krotov's hierarchical associative memory with MetaFormers, enabling a comprehensive representation of the Transformer block.
arXiv Detail & Related papers (2024-06-18T02:42:19Z)
- SCHEME: Scalable Channel Mixer for Vision Transformers [52.605868919281086]
Vision Transformers have achieved impressive performance in many vision tasks.
Much less research has been devoted to the channel mixer or feature-mixing block (FFN or MLP).
We show that the dense connections can be replaced with a diagonal block structure that supports larger expansion ratios.
arXiv Detail & Related papers (2023-12-01T08:22:34Z)
- MatFormer: Nested Transformer for Elastic Inference [94.1789252941718]
MatFormer is a nested Transformer architecture designed to offer elasticity in a variety of deployment constraints.
We show that a 2.6B decoder-only MatFormer language model (MatLM) allows us to extract smaller models spanning from 1.5B to 2.6B.
We also observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval.
arXiv Detail & Related papers (2023-10-11T17:57:14Z)
- iMixer: hierarchical Hopfield network implies an invertible, implicit and iterative MLP-Mixer [2.5782420501870296]
We generalize studies on Hopfield networks and Transformer-like architectures to iMixer, a generalization of MLP-Mixer whose MLP layers propagate forward from the output side to the input side.
We evaluate the model performance with various datasets on image classification tasks.
The results imply that the correspondence between the Hopfield networks and the Mixer models serves as a principle for understanding a broader class of Transformer-like architecture designs.
arXiv Detail & Related papers (2023-04-25T18:00:08Z)
- HyperMixer: An MLP-based Low Cost Alternative to Transformers [12.785548869229052]
We propose a simple variant, HyperMixer, which forms the token mixing dynamically using hypernetworks.
In contrast to Transformers, HyperMixer achieves these results at substantially lower cost in terms of processing time, training data, and hyperparameter tuning.
arXiv Detail & Related papers (2022-03-07T20:23:46Z)
- A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP [121.35904748477421]
Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision.
Transformer and multi-layer perceptron (MLP)-based models, such as Vision Transformer and MLP-Mixer, have started to lead new trends.
In this paper, we conduct empirical studies on these DNN structures and try to understand their respective pros and cons.
arXiv Detail & Related papers (2021-08-30T06:09:02Z)
- MLP-Mixer: An all-MLP Architecture for Vision [93.16118698071993]
We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs).
Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models.
arXiv Detail & Related papers (2021-05-04T16:17:21Z)
- Wake Word Detection with Streaming Transformers [72.66551640048405]
Our experiments on the Mobvoi wake word dataset demonstrate that the proposed Transformer model outperforms the baseline convolutional network by 25% on average in false rejection rate at the same false alarm rate.
arXiv Detail & Related papers (2021-02-08T19:14:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.