Distributional Correlation-Aware Knowledge Distillation for Stock Trading Volume Prediction
- URL: http://arxiv.org/abs/2208.07232v1
- Date: Thu, 4 Aug 2022 11:12:23 GMT
- Title: Distributional Correlation-Aware Knowledge Distillation for Stock Trading Volume Prediction
- Authors: Lei Li, Zhiyuan Zhang, Ruihan Bao, Keiko Harimoto, Xu Sun
- Abstract summary: We present a novel framework for training a lightweight student model to perform trading volume prediction.
Specifically, we turn the regression model into a probabilistic forecasting model by training models to predict a Gaussian distribution.
We evaluate the framework on a real-world stock volume dataset with two different time window settings.
- Score: 27.91596305213188
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional knowledge distillation in classification problems transfers the
knowledge via class correlations in the soft label produced by teacher models,
which are not available in regression problems like stock trading volume
prediction. To remedy this, we present a novel distillation framework for
training a lightweight student model to perform trading volume prediction
given historical transaction data. Specifically, we turn the regression model
into a probabilistic forecasting model by training models to predict a
Gaussian distribution to which the trading volume belongs. The student model
can thus learn from the teacher at a more informative distributional level by
matching its predicted distributions to those of the teacher. Two correlational
distillation objectives are further introduced to encourage the student to
produce consistent pairwise relationships with the teacher model. We evaluate
the framework on a real-world stock volume dataset with two different time
window settings. Experiments demonstrate that our framework is superior to
strong baseline models, compressing the model size by $5\times$ while
maintaining $99.6\%$ prediction accuracy. Extensive analysis further
reveals that our framework is more effective than vanilla distillation methods
under low-resource scenarios.
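A minimal sketch (not the authors' released code) of the distributional matching described in the abstract: teacher and student each predict a Gaussian (mean, log-variance) for the trading volume, the student matches the teacher's distribution through a closed-form Gaussian KL term, and a pairwise term encourages consistent sample-to-sample relationships. The direction of the KL, the exact form of the two correlational objectives, and the loss weights alpha/beta are assumptions.

```python
import torch
import torch.nn.functional as F

def gaussian_kl(mu_s, logvar_s, mu_t, logvar_t):
    """Closed-form KL( N(mu_t, var_t) || N(mu_s, var_s) ), averaged over the batch.

    Matching the student's predicted Gaussian to the teacher's provides the
    distributional-level distillation signal described in the abstract."""
    var_s, var_t = logvar_s.exp(), logvar_t.exp()
    kl = 0.5 * (logvar_s - logvar_t + (var_t + (mu_t - mu_s) ** 2) / var_s - 1.0)
    return kl.mean()

def pairwise_correlation_loss(mu_s, mu_t):
    """Hypothetical correlational objective: penalize disagreement between the
    student's and teacher's pairwise differences of predicted means."""
    diff_s = mu_s.unsqueeze(0) - mu_s.unsqueeze(1)   # (B, B) pairwise gaps
    diff_t = mu_t.unsqueeze(0) - mu_t.unsqueeze(1)
    return F.mse_loss(diff_s, diff_t)

def distillation_loss(mu_s, logvar_s, mu_t, logvar_t, y, alpha=0.5, beta=0.1):
    # Gaussian negative log-likelihood on the ground-truth volume y
    nll = F.gaussian_nll_loss(mu_s, y, logvar_s.exp())
    kd = gaussian_kl(mu_s, logvar_s, mu_t.detach(), logvar_t.detach())
    corr = pairwise_correlation_loss(mu_s, mu_t.detach())
    return nll + alpha * kd + beta * corr
```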
Related papers
- Universality in Transfer Learning for Linear Models [18.427215139020625]
We study the problem of transfer learning in linear models for both regression and binary classification.
We provide an exact and rigorous analysis and relate generalization errors (in regression) and classification errors (in binary classification) for the pretrained and fine-tuned models.
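A toy illustration (mine, not the paper's analysis) of the pretrain-then-fine-tune linear-regression setting studied there: fit a ridge model on source data, fine-tune it on a small target set, and compare test errors. All sizes and noise levels are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_src, n_tgt = 20, 2000, 50

# Source and target tasks share structure but differ in their true weights.
w_src = rng.normal(size=d)
w_tgt = w_src + 0.1 * rng.normal(size=d)

def make_data(w, n):
    X = rng.normal(size=(n, d))
    return X, X @ w + 0.1 * rng.normal(size=n)

Xs, ys = make_data(w_src, n_src)
Xt, yt = make_data(w_tgt, n_tgt)
Xte, yte = make_data(w_tgt, 5000)

# "Pretrain": ridge regression on the source task.
lam = 1.0
w_pre = np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ ys)

# "Fine-tune": fit the target residual, regularized toward the pretrained weights.
w_ft = w_pre + np.linalg.solve(Xt.T @ Xt + lam * np.eye(d), Xt.T @ (yt - Xt @ w_pre))

for name, w in [("pretrained", w_pre), ("fine-tuned", w_ft)]:
    print(name, "test MSE:", np.mean((Xte @ w - yte) ** 2))
```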
arXiv Detail & Related papers (2024-10-03T03:09:09Z)
- Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
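A hedged sketch of the online-module idea as I read the summary: the large teacher stays frozen, a small module attached to it is trained together with the student, and the student matches the adapted teacher distribution. The architecture, loss weighting, and update rule are all assumptions, not OKD's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, hidden = 1000, 64
teacher_body = nn.Embedding(vocab, hidden)
teacher_head = nn.Linear(hidden, vocab)
for p in list(teacher_body.parameters()) + list(teacher_head.parameters()):
    p.requires_grad_(False)                                  # teacher is frozen

online_module = nn.Linear(hidden, vocab)                     # small trainable add-on
student = nn.Sequential(nn.Embedding(vocab, hidden // 4),
                        nn.Linear(hidden // 4, vocab))
opt = torch.optim.Adam(list(student.parameters()) + list(online_module.parameters()), lr=1e-3)

tokens = torch.randint(0, vocab, (32,))
targets = torch.randint(0, vocab, (32,))

h = teacher_body(tokens)
t_logits = teacher_head(h) + online_module(h)                # teacher adapted online
s_logits = student(tokens)

loss = (F.cross_entropy(s_logits, targets)                   # task loss for the student
        + F.kl_div(F.log_softmax(s_logits, dim=-1),
                   F.softmax(t_logits, dim=-1), reduction="batchmean")
        + F.cross_entropy(t_logits, targets))                # keeps the online module on-task
loss.backward()
opt.step()
```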
arXiv Detail & Related papers (2024-09-19T07:05:26Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
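A toy sketch of a reject option driven by a density ratio r(x) = q(x) / p(x): inputs where the idealized density q is low relative to the data density p are abstained on. The Gaussian densities and the threshold tau are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=1000)                 # samples from the data distribution p

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

p = gauss_pdf(x, 0.0, 2.0)                          # data density
q = gauss_pdf(x, 0.0, 1.0)                          # "idealized" density (assumed known here)
ratio = q / p

tau = 0.5                                           # rejection threshold (assumption)
accept = ratio >= tau
print(f"accepted {accept.mean():.1%} of inputs; the rest are rejected")
```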
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models [7.062887337934677]
We propose that small models may not need to absorb the cost of pre-training to reap its benefits.
We observe that, when distilled on a task from a pre-trained model, a small model can achieve or surpass the performance it would achieve if it were pre-trained and then fine-tuned on that task.
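A sketch of task-level distillation as a substitute for pre-training the small model: the student never sees a pre-training corpus and only mimics a teacher that stands in for a pre-trained, fine-tuned model. This is standard soft-label distillation; the shapes, temperature, and loss weights are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, dim, T = 5, 128, 2.0
teacher = nn.Linear(dim, num_classes).eval()        # stand-in for a fine-tuned pre-trained model
student = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, num_classes))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(64, dim)
y = torch.randint(0, num_classes, (64,))

with torch.no_grad():
    t_logits = teacher(x)
s_logits = student(x)

soft = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                F.softmax(t_logits / T, dim=-1), reduction="batchmean") * T * T
hard = F.cross_entropy(s_logits, y)
(0.5 * soft + 0.5 * hard).backward()
opt.step()
```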
arXiv Detail & Related papers (2024-04-04T07:38:11Z)
- Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low churn training against a number of recent baselines.
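A sketch of the quantities involved: predictive churn is the fraction of examples on which the new model's prediction disagrees with the base model's, and distilling from the base model as the teacher pulls that fraction down. The architectures and loss weighting are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, classes = 64, 3
base = nn.Linear(dim, classes).eval()               # previously deployed model
new = nn.Linear(dim, classes)                       # retrained model
opt = torch.optim.SGD(new.parameters(), lr=0.1)

x = torch.randn(256, dim)
y = torch.randint(0, classes, (256,))

with torch.no_grad():
    base_logits = base(x)
new_logits = new(x)

churn = (new_logits.argmax(-1) != base_logits.argmax(-1)).float().mean()
print("predictive churn:", churn.item())

# Distillation term that pulls the new model toward the base model's predictions.
loss = F.cross_entropy(new_logits, y) + F.kl_div(
    F.log_softmax(new_logits, dim=-1), F.softmax(base_logits, dim=-1), reduction="batchmean")
loss.backward()
opt.step()
```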
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
- Why do classifier accuracies show linear trends under distribution shift? [58.40438263312526]
Accuracies of models on one data distribution are approximately linear functions of their accuracies on another distribution.
We assume the probability that two models agree in their predictions is higher than what we can infer from their accuracy levels alone.
We show that a linear trend must occur when evaluating models on two distributions unless the size of the distribution shift is large.
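An illustrative check of the claimed linear trend: given accuracies of several models on an in-distribution test set and on a shifted test set, fit a line and inspect the fit. The accuracy values below are made up purely for illustration.

```python
import numpy as np

acc_in = np.array([0.72, 0.78, 0.81, 0.85, 0.88, 0.91])     # in-distribution accuracies
acc_shift = np.array([0.55, 0.62, 0.66, 0.71, 0.75, 0.79])  # shifted-distribution accuracies

slope, intercept = np.polyfit(acc_in, acc_shift, deg=1)
pred = slope * acc_in + intercept
r2 = 1 - np.sum((acc_shift - pred) ** 2) / np.sum((acc_shift - acc_shift.mean()) ** 2)
print(f"shifted_acc ~= {slope:.2f} * in_dist_acc + {intercept:.2f} (R^2 = {r2:.3f})")
```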
arXiv Detail & Related papers (2020-12-31T07:24:30Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
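A sketch of mixup-style distillation: interpolate pairs of inputs and train the student to match the teacher on the interpolated examples. Applying mixup directly to raw feature vectors, and the Beta(0.4, 0.4) coefficient, are simplifying assumptions; MixKD itself operates on language-model inputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, classes = 128, 4
teacher = nn.Linear(dim, classes).eval()
student = nn.Linear(dim, classes)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(32, dim)
lam = torch.distributions.Beta(0.4, 0.4).sample().item()   # mixup coefficient
perm = torch.randperm(x.size(0))
x_mix = lam * x + (1 - lam) * x[perm]                       # interpolated inputs

with torch.no_grad():
    t_logits = teacher(x_mix)
s_logits = student(x_mix)

# Student mimics the teacher on the mixed (augmented) examples.
loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                F.softmax(t_logits, dim=-1), reduction="batchmean")
loss.backward()
opt.step()
```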
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
- Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
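A heavily simplified sketch of the temporal-difference-style target that, as I understand the summary, trains such a model: with probability (1 - gamma) the target is the observed next state, and with probability gamma it is a bootstrapped sample from the model's own prediction at the next state. The placeholder model and the sampling rule are my assumptions, not the paper's generative architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.95

def model_sample(state):
    # placeholder generative model of "a state from the discounted future"
    return state + rng.normal(scale=0.1, size=state.shape)

def td_target(next_state):
    if rng.random() < 1 - gamma:
        return next_state                      # single-step target
    return model_sample(next_state)            # bootstrapped long-horizon target

s_next = np.array([0.3, -1.2])
print("training target:", td_target(s_next))
```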
arXiv Detail & Related papers (2020-10-27T17:54:12Z)
- A Time Series Analysis-Based Stock Price Prediction Using Machine Learning and Deep Learning Models [0.0]
We present a very robust and accurate framework for stock price prediction that consists of an agglomeration of statistical, machine learning, and deep learning models.
We use the stock price data of a very well-known company listed on the National Stock Exchange (NSE) of India, collected at five-minute intervals.
We contend that the agglomerative approach to model building, which combines statistical, machine learning, and deep learning approaches, can very effectively learn from the volatile and random movement patterns in stock price data.
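A toy sketch of the agglomerative idea: combine a statistical forecast, a machine-learning forecast, and a deep-learning stand-in by averaging their next-step predictions. The synthetic price series and the three toy models are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(scale=0.5, size=500))   # synthetic 5-minute price series

window, history = 12, prices[:-1]

# 1) statistical: moving-average forecast
stat_pred = history[-window:].mean()

# 2) machine learning: linear autoregression on the last `window` prices
X = np.lib.stride_tricks.sliding_window_view(history[:-1], window)
y = history[window:]
coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
ml_pred = np.r_[history[-window:], 1.0] @ coef

# 3) deep-learning stand-in: exponentially weighted recent prices
wts = np.exp(np.linspace(-1, 0, window))
wts /= wts.sum()
dl_pred = history[-window:] @ wts

ensemble = np.mean([stat_pred, ml_pred, dl_pred])
print("next-step forecast:", ensemble, "actual:", prices[-1])
```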
arXiv Detail & Related papers (2020-04-17T19:41:22Z)