Parameter Efficient Deep Probabilistic Forecasting
- URL: http://arxiv.org/abs/2112.02905v1
- Date: Mon, 6 Dec 2021 10:09:39 GMT
- Title: Parameter Efficient Deep Probabilistic Forecasting
- Authors: Olivier Sprangers, Sebastian Schelter, Maarten de Rijke
- Abstract summary: We introduce a novel Bidirectional Temporal Convolutional Network (BiTCN), which requires an order of magnitude fewer parameters than a common Transformer-based approach.
Our method performs on par with four state-of-the-art probabilistic forecasting methods, including a Transformer-based approach and WaveNet.
We demonstrate that our method requires significantly fewer parameters than Transformer-based methods, so the model can be trained faster and with significantly lower memory requirements.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Probabilistic time series forecasting is crucial in many application domains
such as retail, e-commerce, finance, or biology. With the increasing
availability of large volumes of data, a number of neural architectures have
been proposed for this problem. In particular, Transformer-based methods
achieve state-of-the-art performance on real-world benchmarks. However, these
methods require a large number of parameters to be learned, which imposes high
memory requirements on the computational resources for training such models.
To address this problem, we introduce a novel Bidirectional Temporal
Convolutional Network (BiTCN), which requires an order of magnitude fewer
parameters than a common Transformer-based approach. Our model combines two
Temporal Convolutional Networks (TCNs): the first network encodes future
covariates of the time series, whereas the second network encodes past
observations and covariates. We jointly estimate the parameters of an output
distribution via these two networks.
Experiments on four real-world datasets show that, first, our method performs
on par with four state-of-the-art probabilistic forecasting methods, including
a Transformer-based approach and WaveNet, on two point metrics (sMAPE, NRMSE)
as well as on a set of range metrics (quantile loss percentiles) in the
majority of cases. Second, our method requires significantly fewer parameters
than Transformer-based methods, so it can be trained faster and with
significantly lower memory requirements, which in turn reduces the
infrastructure cost of deploying these models.
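The abstract describes the core idea: one causal TCN encodes past observations and covariates, a second TCN runs in the opposite direction over known future covariates, and their outputs are combined to parameterize a per-step output distribution. A minimal NumPy sketch of that idea follows; it is not the authors' implementation — the `causal_conv1d`/`tcn` helpers, the two-layer depth, and the Gaussian head with a linear readout are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def causal_conv1d(x, w, dilation=1):
    # Causal dilated 1-D convolution: y[t] = sum_j w[j] * x[t - j*dilation],
    # with zero left-padding so y[t] never depends on values after t.
    pad = dilation * (len(w) - 1)
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(w[j] * xp[pad + t - j * dilation] for j in range(len(w)))
                     for t in range(len(x))])

def tcn(x, weights):
    # Stack of dilated causal convolutions with ReLU, dilation doubling per layer.
    h = x
    for i, w in enumerate(weights):
        h = np.maximum(causal_conv1d(h, w, dilation=2 ** i), 0.0)
    return h

T = 16
past = rng.normal(size=T)        # past observations and covariates
future_cov = rng.normal(size=T)  # covariates known in advance, e.g. calendar features

w_fwd = [rng.normal(scale=0.5, size=3) for _ in range(2)]
w_bwd = [rng.normal(scale=0.5, size=3) for _ in range(2)]

h_fwd = tcn(past, w_fwd)                    # forward (causal) encoding of the past
h_bwd = tcn(future_cov[::-1], w_bwd)[::-1]  # reversed (anti-causal) encoding of the future

# Jointly map both encodings to per-step output-distribution parameters;
# here a Gaussian with a hypothetical fixed linear head (a, b, c).
a, b, c = 0.7, 0.4, 0.1
mu = a * h_fwd + b * h_bwd
sigma = np.log1p(np.exp(c * (h_fwd + h_bwd)))  # softplus keeps the scale positive

print(mu.shape, sigma.shape)  # (16,) (16,)
```

In the sketch, the bidirectionality comes from running the second TCN over the reversed future-covariate sequence, so each step conditions only on the past on one side and only on the future on the other; the paper's Student's t output or quantile heads could replace the Gaussian without changing the structure.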
Related papers
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z) - Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with attention mechanism, we can effectively boost performance without huge computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z) - Probabilistic MIMO U-Net: Efficient and Accurate Uncertainty Estimation
for Pixel-wise Regression [1.4528189330418977]
Uncertainty estimation in machine learning is paramount for enhancing the reliability and interpretability of predictive models.
We present an adaptation of the Multiple-Input Multiple-Output (MIMO) framework for pixel-wise regression tasks.
arXiv Detail & Related papers (2023-08-14T22:08:28Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - MoEfication: Conditional Computation of Transformer Models for Efficient
Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore to accelerate large-model inference by conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z) - Network insensitivity to parameter noise via adversarial regularization [0.0]
We present a new adversarial network optimisation algorithm that attacks network parameters during training.
We show that our approach produces models that are more robust to targeted parameter variation.
Our work provides an approach to deploy neural network architectures to inference devices that suffer from computational non-idealities.
arXiv Detail & Related papers (2021-06-09T12:11:55Z) - Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z) - A Hybrid Objective Function for Robustness of Artificial Neural Networks
-- Estimation of Parameters in a Mechanical System [0.0]
We consider the task of estimating parameters of a mechanical vehicle model based on acceleration profiles.
We introduce a convolutional neural network architecture that is capable to predict the parameters for a family of vehicle models that differ in the unknown parameters.
arXiv Detail & Related papers (2020-04-16T15:06:43Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.