Forecasting Cryptocurrency Returns from Sentiment Signals: An Analysis
of BERT Classifiers and Weak Supervision
- URL: http://arxiv.org/abs/2204.05781v3
- Date: Sun, 19 Mar 2023 21:32:59 GMT
- Title: Forecasting Cryptocurrency Returns from Sentiment Signals: An Analysis
of BERT Classifiers and Weak Supervision
- Authors: Duygu Ider, Stefan Lessmann
- Abstract summary: We introduce weak learning, a recently proposed NLP approach to address the problem that text data is unlabeled.
We confirm that finetuning using weak labels enhances the predictive value of text-based features and raises forecast accuracy in the context of predicting cryptocurrency returns.
More fundamentally, the modeling paradigm we present, weak labeling domain-specific text and finetuning pretrained NLP models, is universally applicable in (financial) forecasting.
- Score: 6.624726878647541
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Anticipating price developments in financial markets is a topic of continued
interest in forecasting. Fueled by advancements in deep learning and natural
language processing (NLP), together with the availability of vast amounts of
textual data in the form of news articles, social media postings, etc., an
increasing number of studies incorporate text-based predictors in forecasting
models. We contribute to this literature by introducing weak learning, a
recently proposed NLP approach to address the problem that text data is
unlabeled. Without a dependent variable, it is not possible to finetune
pretrained NLP models on a custom corpus. We confirm that finetuning using weak
labels enhances the predictive value of text-based features and raises forecast
accuracy in the context of predicting cryptocurrency returns. More
fundamentally, the modeling paradigm we present, weak labeling domain-specific
text and finetuning pretrained NLP models, is universally applicable in
(financial) forecasting and unlocks new ways to leverage text data.
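As a concrete illustration of this paradigm, the sketch below weak-labels a toy crypto corpus with a rule-based sentiment annotator and then finetunes a pretrained BERT classifier on those labels. The annotator (VADER), checkpoint, thresholds, and hyperparameters are illustrative assumptions, not necessarily the paper's exact setup.

```python
# Minimal sketch of weak labeling + finetuning, assuming VADER as the weak
# annotator and bert-base-uncased as the pretrained encoder.
import torch
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

texts = ["BTC breaks resistance, bulls take control",
         "Exchange hacked, customer funds at risk"]

# Step 1: weak labels from a rule-based annotator -- no hand labeling needed.
analyzer = SentimentIntensityAnalyzer()

def weak_label(text):
    score = analyzer.polarity_scores(text)["compound"]
    return 2 if score > 0.05 else 0 if score < -0.05 else 1  # 2=pos, 0=neg, 1=neutral

labels = [weak_label(t) for t in texts]

# Step 2: finetune a pretrained encoder on the weakly labeled corpus.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)

class WeakDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="weak-bert", num_train_epochs=1),
    train_dataset=WeakDataset(texts, labels),
)
trainer.train()
```

The finetuned model's class probabilities can then serve as text-based features in a downstream return-forecasting model.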
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
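A minimal sketch of what such context-aided prompting can look like; the template below is a hypothetical example, not CiK's actual protocol.

```python
# Hypothetical prompt construction for an LLM-based forecaster: the numeric
# history and the textual context are serialized into a single prompt.
history = [101.2, 103.8, 102.9, 107.4]
context = "A scheduled maintenance outage will halve capacity tomorrow."

prompt = (
    "You are a forecaster. Given the context and recent values, "
    "predict the next value as a single number.\n"
    f"Context: {context}\n"
    f"History: {', '.join(f'{v:.1f}' for v in history)}\n"
    "Next value:"
)
# The prompt would then be sent to an LLM and its reply parsed as a float.
```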
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data [65.6499834212641]
We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm.
By considering domain similarities through task-specific metadata, our model improves generalization, with the excess risk decreasing as the number of training tasks increases.
Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
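For reference, here is a hedged sketch of the first-order MAML step that F-FOMAML builds on; the paper's feature-based variant and GNN-derived task metadata are omitted.

```python
# First-order MAML (FOMAML) outer step: adapt a copy per task on its support
# set, then average the adapted models' query gradients into a meta-update.
import copy
import torch

def fomaml_step(model, tasks, inner_lr=1e-2, outer_lr=1e-3,
                loss_fn=torch.nn.MSELoss()):
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for x_support, y_support, x_query, y_query in tasks:
        learner = copy.deepcopy(model)
        # Inner loop: adapt the copy on the task's support set.
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        inner_opt.zero_grad()
        loss_fn(learner(x_support), y_support).backward()
        inner_opt.step()
        # First-order trick: take the adapted model's query gradients as-is,
        # ignoring second-order terms through the inner update.
        learner.zero_grad()
        loss_fn(learner(x_query), y_query).backward()
        for g, p in zip(meta_grads, learner.parameters()):
            g += p.grad / len(tasks)
    # Outer loop: apply the averaged query gradients to the meta-parameters.
    with torch.no_grad():
        for p, g in zip(model.parameters(), meta_grads):
            p -= outer_lr * g
```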
arXiv Detail & Related papers (2024-06-23T21:28:50Z)
- Natural Language Processing and Multimodal Stock Price Prediction [0.8702432681310401]
This paper utilizes stock percentage change as training data, in contrast to the traditional use of raw currency values.
The choice of percentage change aims to provide models with context regarding the significance of price fluctuations.
The study employs specialized BERT natural language processing models to predict stock price trends.
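The percentage-change transformation itself is a one-liner; a minimal sketch:

```python
import pandas as pd

prices = pd.Series([42000.0, 42840.0, 42411.6], name="close")
# Scale-free targets: a $100 move means something very different at $100
# than at $42,000, so the model sees relative rather than raw changes.
returns = prices.pct_change().dropna()  # 0.02, -0.01
```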
arXiv Detail & Related papers (2024-01-03T01:21:30Z)
- Corporate Bankruptcy Prediction with Domain-Adapted BERT [7.931904787652709]
This study performs BERT-based analysis, which is a representative contextualized language model, on corporate disclosure data to predict impending bankruptcies.
We achieve an accuracy of 91.56% and demonstrate that the domain adaptation procedure brings a significant improvement in prediction accuracy.
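Domain adaptation of this kind is commonly done by continuing masked-language-model pretraining on in-domain text before task finetuning; a hedged sketch with illustrative data and hyperparameters, not necessarily the paper's procedure:

```python
# Continued MLM pretraining on in-domain text (corporate disclosures) to
# adapt a general-purpose BERT before any task-specific finetuning.
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

corpus = Dataset.from_dict(
    {"text": ["The company reported a going-concern risk in its 10-K."]})
tokenized = corpus.map(lambda b: tokenizer(b["text"], truncation=True),
                       batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="disclosure-bert", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer,
                                                  mlm_probability=0.15),
)
trainer.train()
```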
arXiv Detail & Related papers (2023-12-06T00:05:25Z)
- Measuring Consistency in Text-based Financial Forecasting Models [10.339586273664725]
FinTrust is an evaluation tool that assesses logical consistency in financial text.
We show that the consistency of state-of-the-art NLP models for financial forecasting is poor.
Our analysis of the performance degradation caused by meaning-preserving alterations suggests that current text-based methods are not suitable for robustly predicting market information.
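The core test is simple to reproduce in spirit: perturb an input in a meaning-preserving way and check whether the prediction moves. A hypothetical probe below; the FinBERT checkpoint and the paraphrase pair are assumptions, not FinTrust's test suite.

```python
from transformers import pipeline

# Any financial text classifier works here; ProsusAI/finbert is one example.
clf = pipeline("text-classification", model="ProsusAI/finbert")

original = "The firm beat earnings expectations this quarter."
paraphrase = "This quarter, the firm exceeded earnings expectations."

pred_a = clf(original)[0]["label"]
pred_b = clf(paraphrase)[0]["label"]
print("consistent" if pred_a == pred_b else "inconsistent", pred_a, pred_b)
```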
arXiv Detail & Related papers (2023-05-15T10:32:26Z)
- Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [51.3422222472898]
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines.
We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
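A hypothetical sketch of the headline-scoring setup: ask an instruction-tuned LLM whether a headline is good or bad news for a ticker and map the answer to a signal. The prompt wording, model name, and score mapping are assumptions, not the paper's exact protocol.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def headline_signal(headline: str, ticker: str) -> int:
    prompt = (f"Is this headline good, bad, or uncertain news for the stock "
              f"price of {ticker}? Answer GOOD, BAD, or UNKNOWN.\n\n{headline}")
    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content.strip().upper()
    # Map the verbal answer to a numeric signal: +1, -1, or 0.
    return {"GOOD": 1, "BAD": -1}.get(answer, 0)
```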
arXiv Detail & Related papers (2023-04-15T19:22:37Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
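In practice, label smoothing is a one-argument change; a minimal PyTorch sketch:

```python
import torch

logits = torch.randn(4, 3)             # batch of 4 examples, 3 classes
targets = torch.tensor([0, 2, 1, 0])

hard = torch.nn.CrossEntropyLoss()(logits, targets)
# label_smoothing=0.1 softens the one-hot targets, penalizing over-confidence.
smooth = torch.nn.CrossEntropyLoss(label_smoothing=0.1)(logits, targets)
```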
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- NumHTML: Numeric-Oriented Hierarchical Transformer Model for Multi-task Financial Forecasting [17.691653056521904]
This paper describes a numeric-oriented hierarchical transformer model that predicts stock returns and financial risk from multimodal aligned earnings-call data.
We present the results of a comprehensive evaluation of NumHTML against several state-of-the-art baselines on a real-world publicly available dataset.
arXiv Detail & Related papers (2022-01-05T10:17:02Z)
- Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
We propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA).
We empirically show that the features derived from the BERT model outperform count- and neural-based baselines by up to 10% on three common datasets.
A probing analysis reveals the features' sensitivity to surface and syntactic properties.
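A hedged sketch of the attention-to-TDA plumbing: treat one attention head as a weighted graph, convert it to a distance matrix, and compute persistence diagrams with ripser. The paper's specific topological features differ; this only illustrates the pipeline.

```python
import numpy as np
import torch
from ripser import ripser
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("This text may or may not be machine generated.",
                   return_tensors="pt")
with torch.no_grad():
    attn = model(**inputs).attentions[0][0, 0]  # layer 0, head 0: (seq, seq)

# Symmetrize the attention matrix and turn weights into distances.
sym = ((attn + attn.T) / 2).numpy()
dist = 1.0 - sym
np.fill_diagonal(dist, 0.0)

diagrams = ripser(dist, distance_matrix=True)["dgms"]  # persistence diagrams
```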
arXiv Detail & Related papers (2021-09-10T12:13:45Z)
- NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER), an efficient learning paradigm that addresses over-confidence on out-of-distribution inputs without auxiliary models or additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
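A hypothetical sketch of the underlying idea for continuous inputs: add a term that pushes predictions on synthetic noise toward maximum entropy, so out-of-distribution inputs look uncertain. NoiER's actual noise construction for text differs from this illustration.

```python
import torch
import torch.nn.functional as F

def noier_loss(model, x, y, noise_weight=0.5):
    # Standard task loss on the real batch.
    task_loss = F.cross_entropy(model(x), y)
    # Synthetic OOD batch: predictions on it should be maximally uncertain.
    noise = torch.rand_like(x)
    log_probs = F.log_softmax(model(noise), dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    # Subtracting entropy maximizes it on noise inputs.
    return task_loss - noise_weight * entropy
```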
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
- A Stochastic Time Series Model for Predicting Financial Trends using NLP [4.081440927534578]
Recent advancements in deep neural network technology allow researchers to develop highly accurate models to predict financial trends.
We propose ST-GAN, a novel time-series Generative Adversarial Network.
We use the GAN framework to learn correlations between textual and numerical data over time.
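For orientation, a minimal sketch of one adversarial update on sequence data; the actual ST-GAN couples textual and numerical streams, which is omitted here.

```python
import torch
import torch.nn as nn

seq_len, latent_dim = 30, 16
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, seq_len))
D = nn.Sequential(nn.Linear(seq_len, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, seq_len)  # stand-in for real return sequences

# Discriminator step: distinguish real from generated sequences.
fake = G(torch.randn(32, latent_dim)).detach()
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: produce sequences the discriminator accepts as real.
fake = G(torch.randn(32, latent_dim))
g_loss = bce(D(fake), torch.ones(32, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```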
arXiv Detail & Related papers (2021-02-02T04:03:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.