Text Mining of Stocktwits Data for Predicting Stock Prices
- URL: http://arxiv.org/abs/2103.16388v1
- Date: Sat, 13 Mar 2021 03:29:14 GMT
- Title: Text Mining of Stocktwits Data for Predicting Stock Prices
- Authors: Mukul Jaggi, Priyanka Mandal, Shreya Narang, Usman Naseem and Matloob Khushi
- Abstract summary: FinALBERT is trained to handle financial domain text classification tasks by labelling Stocktwits text data based on stock price change.
We collected more than ten years of Stocktwits data for 25 different companies, including the five major FAANG companies (Facebook, Amazon, Apple, Netflix, Google).
- Score: 3.3554367023486935
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stock price prediction can be made more efficient by considering the price
fluctuations and understanding the sentiments of people. A limited number of
models understand financial jargon or have labelled datasets concerning stock
price change. To overcome this challenge, we introduced FinALBERT, an ALBERT
based model trained to handle financial domain text classification tasks by
labelling Stocktwits text data based on stock price change. We collected more
than ten years of Stocktwits data for 25 different companies, including the
five major FAANG companies (Facebook, Amazon, Apple, Netflix, Google). These
datasets
were labelled with three labelling techniques based on stock price changes. Our
proposed model FinALBERT is fine-tuned with these labels to achieve optimal
results. We experimented with the labelled dataset by training traditional
machine learning, BERT, and FinBERT models on it, which helped us understand
how these labels behaved with different model architectures. The competitive
advantage of our labelling method is that it can help analyse historical data
effectively, and the mathematical function can be easily customised to predict
stock movement.
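
The abstract does not spell out its three labelling techniques, so the following is only a minimal sketch of the general idea of labelling posts by stock price change. The function names, the three label strings, the 1% threshold, and the day-keyed input layout are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def price_change_label(prev_close: float, close: float,
                       threshold: float = 0.01) -> str:
    """Map a relative price change to a coarse label.

    Hypothetical rule: changes above +threshold are "up", below
    -threshold are "down", and anything in between is "neutral".
    """
    if prev_close <= 0:
        raise ValueError("prev_close must be positive")
    change = (close - prev_close) / prev_close
    if change > threshold:
        return "up"
    if change < -threshold:
        return "down"
    return "neutral"


def label_posts(posts: List[Tuple[str, str]],
                closes: Dict[str, Tuple[float, float]],
                threshold: float = 0.01) -> List[Tuple[str, str]]:
    """Attach a price-change label to each (day, text) post.

    `closes` maps a day string to (previous close, close). Posts on days
    without price data (e.g. weekends) are skipped in this sketch.
    """
    labelled = []
    for day, text in posts:
        if day not in closes:
            continue
        prev_close, close = closes[day]
        labelled.append((text, price_change_label(prev_close, close, threshold)))
    return labelled


# Made-up example data, for illustration only.
example_posts = [
    ("2020-01-02", "Bullish on this one"),
    ("2020-01-03", "Not sure about earnings"),
    ("2020-01-04", "Weekend thoughts"),
]
example_closes = {
    "2020-01-02": (100.0, 102.0),
    "2020-01-03": (102.0, 101.9),
}
labelled_posts = label_posts(example_posts, example_closes)
```

Changing `threshold`, or swapping the rule inside `price_change_label`, is one way such a labelling function could be customised; the resulting (text, label) pairs would then serve as training data for a text classifier.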
Related papers
- Combining Financial Data and News Articles for Stock Price Movement Prediction Using Large Language Models [0.0]
We employ Large Language Models (LLMs) to predict market movements.
Our dataset contains news articles collected from different sources, historic stock price, and financial report data for 20 companies.
By using this model, we predicted the movement of a given stock's price in our dataset with a weighted F1-score of 58.5% and 59.1%.
arXiv Detail & Related papers (2024-11-02T21:53:20Z)
- A Simple Baseline for Predicting Events with Auto-Regressive Tabular Transformers [70.20477771578824]
Existing approaches to event prediction include time-aware positional embeddings, learned row and field encodings, and oversampling methods for addressing class imbalance.
We propose a simple but flexible baseline using standard autoregressive LLM-style transformers with elementary positional embeddings and a causal language modeling objective.
Our baseline outperforms existing approaches across popular datasets and can be employed for various use-cases.
arXiv Detail & Related papers (2024-10-14T15:59:16Z)
- Enhancing TinyBERT for Financial Sentiment Analysis Using GPT-Augmented FinBERT Distillation [0.0]
This study proposes leveraging the generative capabilities of large language models (LLMs) to create synthetic, domain-specific training data.
The research specifically aims to enhance FinBERT, a BERT model fine-tuned for financial sentiment analysis, and develop TinyFinBERT, a compact transformer model.
arXiv Detail & Related papers (2024-09-19T10:22:23Z)
- StockTime: A Time Series Specialized Large Language Model Architecture for Stock Price Prediction [13.52020491768311]
We introduce StockTime, a novel LLM-based architecture designed specifically for stock price time series data.
Unlike recent FinLLMs, StockTime is specifically designed for stock price time series data.
By fusing this multimodal data, StockTime effectively predicts stock prices across arbitrary look-back periods.
arXiv Detail & Related papers (2024-08-25T00:50:33Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data.
We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
- Improving CNN-base Stock Trading By Considering Data Heterogeneity and Burst [1.6637373649145604]
We propose to use a CNN as the core of such a framework, because it is able to learn the spatial dependency (i.e., between rows and columns) of the input data.
We then develop a novel normalization process to prepare the stock data.
Experimental results show that our approach can outperform the other approaches compared.
arXiv Detail & Related papers (2023-03-14T01:05:17Z)
- Augmented Bilinear Network for Incremental Multi-Stock Time-Series Classification [83.23129279407271]
We propose a method to efficiently retain the knowledge available in a neural network pre-trained on a set of securities.
In our method, the prior knowledge encoded in a pre-trained neural network is maintained by keeping existing connections fixed.
This knowledge is adjusted for the new securities by a set of augmented connections, which are optimized using the new data.
arXiv Detail & Related papers (2022-07-23T18:54:10Z)
- Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition [98.25592165484737]
We propose a more effective pseudo-labeling scheme, called Cross-Model Pseudo-Labeling (CMPL).
CMPL achieves 17.6% and 25.1% Top-1 accuracy on Kinetics-400 and UCF-101 using only the RGB modality and 1% labeled data, respectively.
arXiv Detail & Related papers (2021-12-17T18:59:41Z)
- A data-science-driven short-term analysis of Amazon, Apple, Google, and Microsoft stocks [0.43012765978447565]
We implement a combination of technical analysis and machine/deep learning-based analysis to build a trend classification model.
We execute a data-science-driven technique that makes short-term forecasts dependent on the price trends of current stock market data.
arXiv Detail & Related papers (2021-07-30T15:19:52Z)
- Evaluating data augmentation for financial time series classification [85.38479579398525]
We evaluate several augmentation methods applied to stock datasets using two state-of-the-art deep learning models.
For a relatively small dataset, augmentation methods achieve up to 400% improvement in risk-adjusted return performance.
For a larger stock dataset, augmentation methods achieve up to 40% improvement.
arXiv Detail & Related papers (2020-10-28T17:53:57Z)