The future is different: Large pre-trained language models fail in
prediction tasks
- URL: http://arxiv.org/abs/2211.00384v2
- Date: Wed, 2 Nov 2022 21:42:44 GMT
- Title: The future is different: Large pre-trained language models fail in
prediction tasks
- Authors: Kostadin Cvejoski, Rams\'es J. S\'anchez, C\'esar Ojeda
- Abstract summary: We introduce four new REDDIT datasets, namely the WALLSTREETBETS, ASKSCIENCE, THE DONALD, and POLITICS sub-reddits.
First, we empirically demonstrate that LPLM can display average performance drops of about 88% when predicting the popularity of future posts from sub-reddits whose topic distribution changes with time.
We then introduce a simple methodology that leverages neural variational dynamic topic models and attention mechanisms to infer temporal language model representations for regression tasks.
- Score: 2.9005223064604078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large pre-trained language models (LPLM) have shown spectacular success when
fine-tuned on downstream supervised tasks. Yet, it is known that their
performance can drastically drop when there is a distribution shift between the
data used during training and that used at inference time. In this paper we
focus on data distributions that naturally change over time and introduce four
new REDDIT datasets, namely the WALLSTREETBETS, ASKSCIENCE, THE DONALD, and
POLITICS sub-reddits. First, we empirically demonstrate that LPLM can display
average performance drops of about 88% (in the best case!) when predicting the
popularity of future posts from sub-reddits whose topic distribution changes
with time. We then introduce a simple methodology that leverages neural
variational dynamic topic models and attention mechanisms to infer temporal
language model representations for regression tasks. Our models display
performance drops of only about 40% in the worst cases (2% in the best ones)
when predicting the popularity of future posts, while using only about 7% of
the total number of parameters of LPLM and providing interpretable
representations that offer insight into real-world events, like the GameStop
short squeeze of 2021
Related papers
- Predicting Emergent Capabilities by Finetuning [98.9684114851891]
We find that finetuning language models can shift the point in scaling at which emergence occurs towards less capable models.
We validate this approach using four standard NLP benchmarks.
We find that, in some cases, we can accurately predict whether models trained with up to 4x more compute have emerged.
arXiv Detail & Related papers (2024-11-25T01:48:09Z) - Zero-Shot Load Forecasting with Large Language Models [40.604618284659736]
This paper proposes a zero-shot load forecasting approach using an advanced LLM framework denoted as the Chronos model.
By utilizing its extensive pre-trained knowledge, the Chronos model enables accurate load forecasting in data-scarce scenarios without the need for extensive data-specific training.
arXiv Detail & Related papers (2024-11-18T07:39:46Z) - Fairness Hub Technical Briefs: Definition and Detection of Distribution Shift [0.5825410941577593]
Distribution shift is a common situation in machine learning tasks, where the data used for training a model is different from the data the model is applied to in the real world.
This brief focuses on the definition and detection of distribution shifts in educational settings.
arXiv Detail & Related papers (2024-05-23T05:29:36Z) - Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z) - On the Connection between Pre-training Data Diversity and Fine-tuning
Robustness [66.30369048726145]
We find that the primary factor influencing downstream effective robustness is data quantity.
We demonstrate our findings on pre-training distributions drawn from various natural and synthetic data sources.
arXiv Detail & Related papers (2023-07-24T05:36:19Z) - Correlated Time Series Self-Supervised Representation Learning via
Spatiotemporal Bootstrapping [13.988624652592259]
Time series analysis plays an important role in many real-world industries.
In this paper, we propose a time-step-level representation learning framework for individual instances.
A linear regression model trained on top of the learned representations demonstrates our model performs best in most cases.
arXiv Detail & Related papers (2023-06-12T09:42:16Z) - Learn What Is Possible, Then Choose What Is Best: Disentangling
One-To-Many Relations in Language Through Text-based Games [3.615981646205045]
We present an approach to train language models that can emulate the desirable behaviours, but not the undesirable ones.
Using text-based games as a testbed, our approach, PASA, uses discrete latent variables to capture the range of different behaviours.
Results show up to 49% empirical improvement over the previous state-of-the-art model.
arXiv Detail & Related papers (2023-04-14T17:11:26Z) - ASPEST: Bridging the Gap Between Active Learning and Selective
Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z) - Zero-shot meta-learning for small-scale data from human subjects [10.320654885121346]
We develop a framework to rapidly adapt to a new prediction task with limited training data for out-of-sample test data.
Our model learns the latent treatment effects of each intervention and, by design, can naturally handle multi-task predictions.
Our model has implications for improved generalization of small-size human studies to the wider population.
arXiv Detail & Related papers (2022-03-29T17:42:04Z) - FitVid: Overfitting in Pixel-Level Video Prediction [117.59339756506142]
We introduce a new architecture, named FitVid, which is capable of severe overfitting on the common benchmarks.
FitVid outperforms the current state-of-the-art models across four different video prediction benchmarks on four different metrics.
arXiv Detail & Related papers (2021-06-24T17:20:21Z) - Back2Future: Leveraging Backfill Dynamics for Improving Real-time
Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task.
'Backfill' phenomenon and its effect on model performance has been barely studied in the prior literature.
We formulate a novel problem and neural framework Back2Future that aims to refine a given model's predictions in real-time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.