AMLNet: Adversarial Mutual Learning Neural Network for
Non-AutoRegressive Multi-Horizon Time Series Forecasting
- URL: http://arxiv.org/abs/2310.19289v1
- Date: Mon, 30 Oct 2023 06:10:00 GMT
- Title: AMLNet: Adversarial Mutual Learning Neural Network for
Non-AutoRegressive Multi-Horizon Time Series Forecasting
- Authors: Yang Lin
- Abstract summary: We introduce AMLNet, an innovative NAR model that achieves realistic forecasts through an online Knowledge Distillation approach.
AMLNet harnesses the strengths of both AR and NAR models by training a deep AR decoder and a deep NAR decoder in a collaborative manner, so that they serve as ensemble teachers for a shallower NAR decoder.
This knowledge transfer is facilitated through two key mechanisms: 1) outcome-driven KD, which dynamically weights the contribution of KD losses from the teacher models, enabling the shallow NAR decoder to incorporate the ensemble's diversity; and 2) hint-driven KD, which employs adversarial training to extract valuable insights from the model's hidden states for distillation.
- Score: 4.911305944028228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-horizon time series forecasting, crucial across diverse domains,
demands high accuracy and speed. While AutoRegressive (AR) models excel in
short-term predictions, they suffer speed and error issues as the horizon
extends. Non-AutoRegressive (NAR) models suit long-term predictions but
struggle with interdependence, yielding unrealistic results. We introduce
AMLNet, an innovative NAR model that achieves realistic forecasts through an
online Knowledge Distillation (KD) approach. AMLNet harnesses the strengths of
both AR and NAR models by training a deep AR decoder and a deep NAR decoder in
a collaborative manner, serving as ensemble teachers that impart knowledge to a
shallower NAR decoder. This knowledge transfer is facilitated through two key
mechanisms: 1) outcome-driven KD, which dynamically weights the contribution of
KD losses from the teacher models, enabling the shallow NAR decoder to
incorporate the ensemble's diversity; and 2) hint-driven KD, which employs
adversarial training to extract valuable insights from the model's hidden
states for distillation. Extensive experimentation showcases AMLNet's
superiority over conventional AR and NAR models, thereby presenting a promising
avenue for multi-horizon time series forecasting that enhances accuracy and
expedites computation.
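To make the outcome-driven KD mechanism described in the abstract concrete, here is a minimal PyTorch-style sketch of one plausible realisation: each teacher decoder's KD loss is weighted by how well that teacher fits the current batch, so the stronger ensemble member on a given batch contributes more to the student's distillation signal. The function name, the MSE losses, and the softmax weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of outcome-driven KD: weight each teacher's KD loss by
# its forecasting error on the current batch (better teacher -> larger weight).
import torch
import torch.nn.functional as F

def outcome_driven_kd(student_pred, teacher_preds, target):
    """student_pred: (B, H) student forecast; teacher_preds: list of (B, H)
    forecasts from the deep AR and deep NAR teacher decoders; target: (B, H)."""
    errors = torch.stack([F.mse_loss(p, target) for p in teacher_preds])
    weights = F.softmax(-errors, dim=0).detach()   # dynamic per-batch weights
    kd_losses = torch.stack(
        [F.mse_loss(student_pred, p.detach()) for p in teacher_preds])
    return (weights * kd_losses).sum()
```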
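The hint-driven KD mechanism can be sketched in the same spirit: a small discriminator is trained to tell the student's decoder hidden states from the teachers', and the student is trained to fool it, pulling its internal representations toward the teachers'. The discriminator architecture and GAN-style losses below are assumptions made for illustration; the paper's adversarial setup may differ.

```python
# Hypothetical sketch of hint-driven KD via adversarial training on hidden states.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintDiscriminator(nn.Module):
    """Classifies a hidden state as coming from a teacher (1) or the student (0)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1))

    def forward(self, h):          # h: (B, T, hidden_dim)
        return self.net(h)         # logits: (B, T, 1)

def hint_driven_kd(student_h, teacher_h, disc):
    """Returns (student_loss, disc_loss) for one batch of hidden states."""
    fake = disc(student_h)
    # Student: make its hidden states look like the teachers' ("real").
    student_loss = F.binary_cross_entropy_with_logits(fake, torch.ones_like(fake))
    # Discriminator: separate teacher ("real") from student ("fake") states.
    real = disc(teacher_h.detach())
    disc_loss = (F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) +
                 F.binary_cross_entropy_with_logits(disc(student_h.detach()),
                                                    torch.zeros_like(fake)))
    return student_loss, disc_loss
```

In a full training step, the two deep teacher decoders and the shallow NAR student would each minimise their own forecasting loss on the ground truth, with the student additionally minimising the outcome-driven KD loss and the adversarial hint loss (alternating with discriminator updates). This is a reading of the abstract rather than the paper's exact training procedure.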
Related papers
- Leveraging Diverse Modeling Contexts with Collaborating Learning for
Neural Machine Translation [26.823126615724888]
Autoregressive (AR) and Non-autoregressive (NAR) models are two types of generative models for Neural Machine Translation (NMT)
We propose a novel generic collaborative learning method, DCMCL, where AR and NAR models are treated as collaborators instead of teachers and students.
arXiv Detail & Related papers (2024-02-28T15:55:02Z)
- Distilling Autoregressive Models to Obtain High-Performance Non-Autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed [8.184624214651283]
We propose a generic Guided Non-Autoregressive Knowledge Distillation (GNARKD) method to obtain high-performance NAR models with low inference latency.
We evaluate GNARKD by applying it to three widely adopted AR models to obtain NAR VRP solvers for both synthesized and real-world instances.
arXiv Detail & Related papers (2023-12-19T07:13:32Z)
- Progressive Neural Network for Multi-Horizon Time Series Forecasting [4.911305944028228]
ProNet is a novel deep learning approach designed for multi-horizon time series forecasting.
Our method involves dividing the forecasting horizon into segments, predicting the most crucial steps in each segment non-autoregressively, and the remaining steps autoregressively.
In comparison to AR models, ProNet showcases remarkable advantages, requiring fewer AR iterations, resulting in faster prediction speed, and mitigating error accumulation.
arXiv Detail & Related papers (2023-10-30T07:46:40Z)
- Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% of the FLOPs of the state-of-the-art method in both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z)
- Helping the Weak Makes You Strong: Simple Multi-Task Learning Improves Non-Autoregressive Translators [35.939982651768666]
The probabilistic framework of NAR models requires a conditional independence assumption on target sequences.
We propose a simple and model-agnostic multi-task learning framework to provide more informative learning signals.
Our approach can consistently improve accuracy of multiple NAR baselines without adding any additional decoding overhead.
arXiv Detail & Related papers (2022-11-11T09:10:14Z)
- Probabilistic AutoRegressive Neural Networks for Accurate Long-range Forecasting [6.295157260756792]
We introduce the Probabilistic AutoRegressive Neural Networks (PARNN)
PARNN is capable of handling complex time series data exhibiting non-stationarity, nonlinearity, non-seasonality, long-range dependence, and chaotic patterns.
We evaluate the performance of PARNN against standard statistical, machine learning, and deep learning models, including Transformers, NBeats, and DeepAR.
arXiv Detail & Related papers (2022-04-01T17:57:36Z)
- A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation [59.64193903397301]
Non-autoregressive (NAR) models generate multiple outputs of a sequence simultaneously, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive baselines.
We conduct a comparative study of various NAR modeling methods for end-to-end automatic speech recognition (ASR)
The results on various tasks provide interesting findings for developing an understanding of NAR ASR, such as the accuracy-speed trade-off and robustness against long-form utterances.
arXiv Detail & Related papers (2021-10-11T13:05:06Z)
- TSNAT: Two-Step Non-Autoregressive Transformer Models for Speech Recognition [69.68154370877615]
Non-autoregressive (NAR) models remove the temporal dependency between output tokens and can predict all output tokens in one or a few steps.
To address these two problems, we propose a new model named the Two-Step Non-Autoregressive Transformer (TSNAT).
The results show that the TSNAT can achieve a competitive performance with the AR model and outperform many complicated NAR models.
arXiv Detail & Related papers (2021-04-04T02:34:55Z)
- Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles [52.79089414630366]
We develop a novel deep cooperative NOMA scheme, drawing upon the recent advances in deep learning (DL)
We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.
arXiv Detail & Related papers (2020-07-27T12:38:37Z)
- An EM Approach to Non-autoregressive Conditional Sequence Generation [49.11858479436565]
Autoregressive (AR) models have been the dominating approach to conditional sequence generation.
Non-autoregressive (NAR) models have been recently proposed to reduce the latency by generating all output tokens in parallel.
This paper proposes a new approach that jointly optimizes both AR and NAR models in a unified Expectation-Maximization (EM) framework.
arXiv Detail & Related papers (2020-06-29T20:58:57Z)
- A Study of Non-autoregressive Model for Sequence Generation [147.89525760170923]
Non-autoregressive (NAR) models generate all the tokens of a sequence in parallel.
We propose knowledge distillation and source-target alignment to bridge the gap between AR and NAR models.
arXiv Detail & Related papers (2020-04-22T09:16:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.