Related papers: Text Mining of Stocktwits Data for Predicting Stock Prices

URL: http://arxiv.org/abs/2103.16388v1
Date: Sat, 13 Mar 2021 03:29:14 GMT
Title: Text Mining of Stocktwits Data for Predicting Stock Prices
Authors: Mukul Jaggi, Priyanka Mandal, Shreya Narang, Usman Naseem and Matloob Khushi
Abstract summary: FinALBERT is trained to handle financial domain text classification tasks by labelling Stocktwits text data based on stock price change. We collected Stocktwits data for over ten years for 25 different companies, including the major five FAANG (Facebook, Amazon, Apple, Netflix, Google).
Score: 3.3554367023486935
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Stock price prediction can be made more efficient by considering the price fluctuations and understanding the sentiments of people. A limited number of models understand financial jargon or have labelled datasets concerning stock price change. To overcome this challenge, we introduced FinALBERT, an ALBERT-based model trained to handle financial domain text classification tasks by labelling Stocktwits text data based on stock price change. We collected Stocktwits data for over ten years for 25 different companies, including the major five FAANG (Facebook, Amazon, Apple, Netflix, Google). These datasets were labelled with three labelling techniques based on stock price changes. Our proposed model FinALBERT is fine-tuned with these labels to achieve optimal results. We experimented with the labelled dataset by training it on traditional machine learning, BERT, and FinBERT models, which helped us understand how these labels behaved with different model architectures. The competitive advantage of our labelling method is that it can help analyse the historical data effectively, and the mathematical function can be easily customised to predict stock movement.
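
The idea of labelling posts by stock price change can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify its three labelling techniques, so the rule below (compare the close on the post's date with the next trading day's close, using a hypothetical threshold of 0.5%) and all names (`label_by_price_change`, `label_posts`, `threshold`) are assumptions made for illustration only.

```python
from datetime import date


def label_by_price_change(close_today, close_next, threshold=0.005):
    """Return 'up', 'down', or 'neutral' from the relative price change.

    The threshold is a hypothetical choice, not a value from the paper.
    """
    if close_today <= 0:
        raise ValueError("close_today must be positive")
    change = (close_next - close_today) / close_today
    if change > threshold:
        return "up"
    if change < -threshold:
        return "down"
    return "neutral"


def label_posts(posts, closes, threshold=0.005):
    """Attach a price-change label to each post.

    posts:  list of (date, text) pairs
    closes: dict mapping trading date -> closing price
    Posts on non-trading days, or on the last known trading day,
    are skipped because no next-day close is available.
    """
    trading_days = sorted(closes)
    next_day = dict(zip(trading_days, trading_days[1:]))
    labelled = []
    for day, text in posts:
        if day not in next_day:
            continue
        label = label_by_price_change(
            closes[day], closes[next_day[day]], threshold
        )
        labelled.append((text, label))
    return labelled


# Made-up prices and posts, for demonstration only.
closes = {
    date(2021, 3, 8): 100.0,
    date(2021, 3, 9): 102.0,
    date(2021, 3, 10): 101.9,
    date(2021, 3, 11): 99.0,
}
posts = [
    (date(2021, 3, 8), "Loading up before earnings"),
    (date(2021, 3, 9), "Not sure where this goes"),
    (date(2021, 3, 10), "Taking profits here"),
    (date(2021, 3, 11), "No next-day price yet"),
]
labelled = label_posts(posts, closes)
```

Because the labelling rule is an ordinary function, changing the threshold or the comparison horizon only requires editing `label_by_price_change`, which is the kind of customisation the abstract alludes to.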

Related papers

Hey, That's My Data! Label-Only Dataset Inference in Large Language Models [63.35066172530291]
CatShift is a label-only dataset-inference framework. It capitalizes on catastrophic forgetting: the tendency of an LLM to overwrite previously learned knowledge when exposed to new data.
arXiv Detail & Related papers (2025-06-06T13:02:59Z)
Multimodal Stock Price Prediction: A Case Study of the Russian Securities Market [0.0]
This paper addresses the problem of forecasting financial asset prices using a multimodal approach that combines candlestick time series and news flow data. A unique dataset was collected, which includes time series for 176 Russian stocks traded on the Moscow Exchange and 79,555 financial news articles in Russian. Experiments showed that incorporating the textual modality reduced the MAPE value by 55%.
arXiv Detail & Related papers (2025-03-05T21:20:32Z)
FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting [58.70072722290475]
Financial time series (FinTS) record the behavior of human-brain-augmented decision-making. FinTSB is a comprehensive and practical benchmark for financial time series forecasting.
arXiv Detail & Related papers (2025-02-26T05:19:16Z)
TLOB: A Novel Transformer Model with Dual Attention for Price Trend Prediction with Limit Order Book Data [10.684577067675585]
Price Trend Prediction (PTP) based on Limit Order Book (LOB) data is a fundamental challenge in financial markets. We propose TLOB, a transformer-based model that uses a dual attention mechanism to capture spatial and temporal dependencies in LOB data. We empirically show how stock price predictability has declined over time, -6.68 in F1-score, highlighting the growing market efficiency.
arXiv Detail & Related papers (2025-02-12T12:41:10Z)
An Instrumental Value for Data Production and its Application to Data Pricing [107.98697414652479]
This paper develops an approach for capturing the instrumental value of data production processes. We show how they connect to classic notions of information design and signals in information economics.
arXiv Detail & Related papers (2024-12-24T03:53:57Z)
Combining Financial Data and News Articles for Stock Price Movement Prediction Using Large Language Models [0.0]
We employ Large Language Models (LLMs) to predict market movements. Our dataset contains news articles collected from different sources, historic stock price, and financial report data for 20 companies. By using this model, we predicted the movement of a given stock's price in our dataset with a weighted F1-score of 58.5% and 59.1%.
arXiv Detail & Related papers (2024-11-02T21:53:20Z)
A Simple Baseline for Predicting Events with Auto-Regressive Tabular Transformers [70.20477771578824]
Existing approaches to event prediction include time-aware positional embeddings, learned row and field encodings, and oversampling methods for addressing class imbalance. We propose a simple but flexible baseline using standard autoregressive LLM-style transformers with elementary positional embeddings and a causal language modeling objective. Our baseline outperforms existing approaches across popular datasets and can be employed for various use-cases.
arXiv Detail & Related papers (2024-10-14T15:59:16Z)
Enhancing TinyBERT for Financial Sentiment Analysis Using GPT-Augmented FinBERT Distillation [0.0]
This study proposes leveraging the generative capabilities of large language models (LLMs) to create synthetic, domain-specific training data. The research specifically aims to enhance FinBERT, a BERT model fine-tuned for financial sentiment analysis, and develop TinyFinBERT, a compact transformer model.
arXiv Detail & Related papers (2024-09-19T10:22:23Z)
StockTime: A Time Series Specialized Large Language Model Architecture for Stock Price Prediction [13.52020491768311]
We introduce StockTime, a novel LLM-based architecture that, unlike recent FinLLMs, is designed specifically for stock price time series data. By fusing this multimodal data, StockTime effectively predicts stock prices across arbitrary look-back periods.
arXiv Detail & Related papers (2024-08-25T00:50:33Z)
Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings. Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data. We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
Improving CNN-base Stock Trading By Considering Data Heterogeneity and Burst [1.6637373649145604]
We propose to use a CNN as the core functionality of such a framework, because it is able to learn the spatial dependency (i.e., between rows and columns) of the input data. We then develop a novel normalization process to prepare the stock data. Experimental results show that our approach can outperform the other compared approaches.
arXiv Detail & Related papers (2023-03-14T01:05:17Z)
Augmented Bilinear Network for Incremental Multi-Stock Time-Series Classification [83.23129279407271]
We propose a method to efficiently retain the knowledge available in a neural network pre-trained on a set of securities. In our method, the prior knowledge encoded in a pre-trained neural network is maintained by keeping existing connections fixed. This knowledge is adjusted for the new securities by a set of augmented connections, which are optimized using the new data.
arXiv Detail & Related papers (2022-07-23T18:54:10Z)
Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition [98.25592165484737]
We propose a more effective pseudo-labeling scheme, called Cross-Model Pseudo-Labeling (CMPL). CMPL achieves 17.6% and 25.1% Top-1 accuracy on Kinetics-400 and UCF-101 using only the RGB modality and 1% labeled data, respectively.
arXiv Detail & Related papers (2021-12-17T18:59:41Z)
A data-science-driven short-term analysis of Amazon, Apple, Google, and Microsoft stocks [0.43012765978447565]
We implement a combination of technical analysis and machine/deep learning-based analysis to build a trend classification model. We execute a data-science-driven technique that makes short-term forecasts dependent on the price trends of current stock market data.
arXiv Detail & Related papers (2021-07-30T15:19:52Z)
Evaluating data augmentation for financial time series classification [85.38479579398525]
We evaluate several augmentation methods applied to stock datasets using two state-of-the-art deep learning models. For a relatively small dataset, augmentation methods achieve up to 400% improvement in risk-adjusted return performance. For a larger stock dataset, augmentation methods achieve up to 40% improvement.
arXiv Detail & Related papers (2020-10-28T17:53:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.