TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series
Forecasting
- URL: http://arxiv.org/abs/2306.09364v4
- Date: Mon, 11 Dec 2023 15:46:13 GMT
- Title: TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series
Forecasting
- Authors: Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant
Kalagnanam
- Abstract summary: Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions.
High memory and computing requirements pose a critical bottleneck for long-term forecasting.
We propose TSMixer, a lightweight neural architecture composed of multi-layer perceptron (MLP) modules.
- Score: 13.410217680999459
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Transformers have gained popularity in time series forecasting for their
ability to capture long-sequence interactions. However, their high memory and
computing requirements pose a critical bottleneck for long-term forecasting. To
address this, we propose TSMixer, a lightweight neural architecture exclusively
composed of multi-layer perceptron (MLP) modules for multivariate forecasting
and representation learning on patched time series. Inspired by MLP-Mixer's
success in computer vision, we adapt it for time series, addressing challenges
and introducing validated components for enhanced accuracy. This includes a
novel design paradigm of attaching online reconciliation heads to the MLP-Mixer
backbone for explicitly modeling time-series properties such as hierarchy and
channel correlations. We also propose a novel hybrid channel modeling approach
and infuse a simple gating mechanism to effectively handle noisy channel
interactions and to generalize across diverse datasets. By incorporating these
lightweight components, we significantly enhance the learning capability of
simple MLP structures, outperforming complex Transformer models with minimal
computing usage. Moreover, TSMixer's modular design enables compatibility with
both supervised and masked self-supervised learning methods, making it a
promising building block for time-series Foundation Models. TSMixer outperforms
state-of-the-art MLP and Transformer models in forecasting by a considerable
margin of 8-60%. It also outperforms the latest strong benchmarks of
Patch-Transformer models (by 1-2%) with a significant reduction in memory and
runtime (2-3X). The source code of our model is officially released as
PatchTSMixer in HuggingFace Transformers. Model:
https://huggingface.co/docs/transformers/main/en/model_doc/patchtsmixer
Examples: https://github.com/ibm/tsfm/#notebooks-links
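Below is a minimal, illustrative sketch of the core computation described in the abstract: interleaved MLP mixing along the patch (time) and feature dimensions of a patched multivariate series, with a simple gating step to damp noisy interactions. It is written in PyTorch under assumed layer sizes, normalization placement, and gate form; it is not the official PatchTSMixer implementation (see the HuggingFace model doc linked above for that).

# Sketch of a TSMixer-style mixer block for patched multivariate time series.
# Assumptions (not from the paper): layer widths, LayerNorm placement, and the
# exact form of the gate.
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Mixes across patches (time) and across features, with a simple gate."""

    def __init__(self, num_patches: int, d_model: int, expansion: int = 2):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.patch_mlp = nn.Sequential(              # time mixing over patches
            nn.Linear(num_patches, expansion * num_patches),
            nn.GELU(),
            nn.Linear(expansion * num_patches, num_patches),
        )
        self.norm2 = nn.LayerNorm(d_model)
        self.feature_mlp = nn.Sequential(            # feature mixing per patch
            nn.Linear(d_model, expansion * d_model),
            nn.GELU(),
            nn.Linear(expansion * d_model, d_model),
        )
        self.gate = nn.Linear(d_model, d_model)      # simple gating layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, num_patches, d_model)
        y = self.norm1(x)
        y = self.patch_mlp(y.transpose(-1, -2)).transpose(-1, -2)
        x = x + y                                    # residual over patch mixing
        y = self.feature_mlp(self.norm2(x))
        y = y * torch.softmax(self.gate(y), dim=-1)  # gate noisy interactions
        return x + y                                 # residual over feature mixing

# Toy usage: a 7-channel series cut into 16 patches, each embedded to d_model=64.
block = MixerBlock(num_patches=16, d_model=64)
out = block(torch.randn(8, 7, 16, 64))
print(out.shape)  # torch.Size([8, 7, 16, 64])

In this sketch the mixing weights are shared across channels; per the abstract, TSMixer additionally uses hybrid channel modeling and attaches online reconciliation heads to a backbone of such blocks, which are not shown here.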
Related papers
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models [76.64055251296548]
We show that, with academic GPU resources, it is feasible to distill large Transformers into linear RNNs by reusing the linear projection weights from attention layers.
The resulting hybrid model, which incorporates a quarter of the attention layers, achieves performance comparable to the original Transformer in chat benchmarks.
arXiv Detail & Related papers (2024-08-27T17:56:11Z)
- Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking [6.9366619419210656]
Transformers have established themselves as the leading neural network model in natural language processing.
Recent research has explored replacing attention modules with other mechanisms, including those described by MetaFormers.
This paper integrates Krotov's hierarchical associative memory with MetaFormers, enabling a comprehensive representation of the Transformer block.
arXiv Detail & Related papers (2024-06-18T02:42:19Z)
- SCHEME: Scalable Channel Mixer for Vision Transformers [52.605868919281086]
Vision Transformers have achieved impressive performance in many vision tasks.
Much less research has been devoted to the channel mixer or feature-mixing block (FFN or MLP).
We show that the dense connections can be replaced with a diagonal block structure that supports larger expansion ratios.
arXiv Detail & Related papers (2023-12-01T08:22:34Z)
- MatFormer: Nested Transformer for Elastic Inference [94.1789252941718]
MatFormer is a nested Transformer architecture designed to offer elasticity in a variety of deployment constraints.
We show that a 2.6B decoder-only MatFormer language model (MatLM) allows us to extract smaller models spanning from 1.5B to 2.6B.
We also observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval.
arXiv Detail & Related papers (2023-10-11T17:57:14Z)
- iMixer: hierarchical Hopfield network implies an invertible, implicit and iterative MLP-Mixer [2.5782420501870296]
We generalize studies on Hopfield networks and Transformer-like architectures to iMixer, a generalization of MLP-Mixer whose MLP layers propagate forward from the output side to the input side.
We evaluate the model performance with various datasets on image classification tasks.
The results imply that the correspondence between the Hopfield networks and the Mixer models serves as a principle for understanding a broader class of Transformer-like architecture designs.
arXiv Detail & Related papers (2023-04-25T18:00:08Z)
- HyperMixer: An MLP-based Low Cost Alternative to Transformers [12.785548869229052]
We propose a simple variant, HyperMixer, which forms the token mixing dynamically using hypernetworks.
In contrast to Transformers, HyperMixer achieves these results at substantially lower cost in terms of processing time, training data, and hyperparameter tuning.
arXiv Detail & Related papers (2022-03-07T20:23:46Z)
- A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP [121.35904748477421]
Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision.
Transformer and multi-layer perceptron (MLP)-based models, such as Vision Transformer and MLP-Mixer, have started to lead new trends.
In this paper, we conduct empirical studies on these DNN structures and try to understand their respective pros and cons.
arXiv Detail & Related papers (2021-08-30T06:09:02Z)
- MLP-Mixer: An all-MLP Architecture for Vision [93.16118698071993]
We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs).
Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models.
arXiv Detail & Related papers (2021-05-04T16:17:21Z)
- Wake Word Detection with Streaming Transformers [72.66551640048405]
Our experiments on the Mobvoi wake word dataset demonstrate that the proposed Transformer model outperforms the baseline convolutional network by 25% on average in false rejection rate at the same false alarm rate.
arXiv Detail & Related papers (2021-02-08T19:14:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.