Astock: A New Dataset and Automated Stock Trading based on
Stock-specific News Analyzing Model
- URL: http://arxiv.org/abs/2206.06606v1
- Date: Tue, 14 Jun 2022 05:55:23 GMT
- Title: Astock: A New Dataset and Automated Stock Trading based on
Stock-specific News Analyzing Model
- Authors: Jinan Zou, Haiyao Cao, Lingqiao Liu, Yuhao Lin, Ehsan Abbasnejad,
Javen Qinfeng Shi
- Abstract summary: We build a platform to study the NLP-aided stock auto-trading algorithms systematically.
We provide financial news for each specific stock.
We provide various stock factors for each stock.
We evaluate performance from more financial-relevant metrics.
- Score: 21.05128751957895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural Language Processing (NLP) demonstrates great potential to support
financial decision-making by analyzing the text from social media or news
outlets. In this work, we build a platform to study the NLP-aided stock
auto-trading algorithms systematically. In contrast to the previous work, our
platform is characterized by three features: (1) We provide financial news for
each specific stock. (2) We provide various stock factors for each stock. (3)
We evaluate performance from more financial-relevant metrics. Such a design
allows us to develop and evaluate NLP-aided stock auto-trading algorithms in a
more realistic setting. In addition to designing an evaluation platform and
dataset collection, we also made a technical contribution by proposing a system
to automatically learn a good feature representation from various input
information. The key to our algorithm is a method called Semantic Role Labeling
Pooling (SRLP), which leverages Semantic Role Labeling (SRL) to create a
compact representation of each news paragraph. Based on SRLP, we further
incorporate other stock factors to make the final prediction. In addition, we
propose a self-supervised learning strategy based on SRLP to enhance the
out-of-distribution generalization performance of our system. Through our
experimental study, we show that the proposed method outperforms all baselines
in annualized rate of return and achieves a lower maximum drawdown than the
CSI300 and XIN9 indices in real trading. Our Astock dataset and code are
available at
https://github.com/JinanZou/Astock.
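The SRLP idea above can be sketched as mean-pooling token embeddings within each semantic role span and concatenating the results into one compact vector per news sentence. This is a hedged illustration of the general technique, not the paper's actual model; the role spans, role inventory, and 4-dimensional embeddings below are toy stand-ins.

```python
# Sketch of Semantic Role Labeling Pooling (SRLP): pool token embeddings
# per role (ARG0, predicate V, ARG1), then concatenate into one vector.
import numpy as np

def srlp(token_embeddings: np.ndarray, role_spans: dict) -> np.ndarray:
    """Mean-pool the token embeddings inside each role span, then concatenate."""
    pooled = []
    for role in ("ARG0", "V", "ARG1"):      # fixed role order keeps output aligned
        start, end = role_spans[role]       # [start, end) token indices for the role
        pooled.append(token_embeddings[start:end].mean(axis=0))
    return np.concatenate(pooled)           # shape: (3 * embedding_dim,)

# Toy sentence "Apple / raised / iPhone prices sharply today": 6 tokens, 4-dim embeddings.
emb = np.random.rand(6, 4)
spans = {"ARG0": (0, 1), "V": (1, 2), "ARG1": (2, 6)}
vec = srlp(emb, spans)
print(vec.shape)  # (12,)
```

Because every sentence is reduced to the same fixed-size role layout, the pooled vectors can be concatenated with numeric stock factors before the final prediction head.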
Related papers
- GraphCNNpred: A stock market indices prediction using a Graph based deep learning system [0.0]
We present a graph neural network based convolutional neural network (CNN) model that can be applied to diverse sources of data, in an attempt to extract features to predict the trends of the S&P 500, NASDAQ, DJI, NYSE, and RUSSELL indices.
Experiments show that the associated models improve prediction performance over the baseline algorithms by about 4% to 15% across all indices, in terms of F-measure.
arXiv Detail & Related papers (2024-07-04T09:14:24Z)
- LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science.
Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z) - Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
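The loop described above pairs search-derived step-level preferences with a DPO update. As a hedged sketch of the standard DPO loss only (the MCTS step and the policy log-probabilities are mocked, and the numbers are toy values, not the paper's):

```python
# DPO for one (chosen, rejected) preference pair: maximize the policy's
# log-probability margin on the chosen step relative to a frozen reference.
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """-log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A pair the policy already ranks correctly yields a smaller loss than a
# mis-ranked pair, so gradient steps push probability toward chosen steps.
good = dpo_loss(logp_w=-1.0, logp_l=-3.0, ref_logp_w=-2.0, ref_logp_l=-2.0)
bad = dpo_loss(logp_w=-3.0, logp_l=-1.0, ref_logp_w=-2.0, ref_logp_l=-2.0)
print(good < bad)  # True
```

In the iterative scheme, each round's MCTS values decide which candidate reasoning step is "chosen" and which is "rejected" before this loss is applied.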
arXiv Detail & Related papers (2024-05-01T11:10:24Z) - AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data.
We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z) - Optimizing Portfolio Management and Risk Assessment in Digital Assets
Using Deep Learning for Predictive Analysis [5.015409508372732]
This paper introduces the DQN algorithm into asset management portfolios in a novel and straightforward way.
Its performance greatly exceeds the benchmark, demonstrating the effectiveness of the DRL algorithm in portfolio management.
Since different assets are trained separately as environments, there may be a phenomenon of Q value drift among different assets.
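The DQN family underlying this approach rests on the Bellman backup, which the mentioned per-asset Q values are estimated with. A hedged tabular sketch of that update (the paper uses a neural approximator; states, actions, and numbers here are toy):

```python
# One tabular Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a').
def q_update(q, state, action, reward, next_q_max, alpha=0.1, gamma=0.99):
    """Apply one Bellman backup to the Q table and return the new value."""
    old = q.get((state, action), 0.0)
    target = reward + gamma * next_q_max       # bootstrapped return estimate
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]

q = {}
v = q_update(q, state="hold_asset", action="buy", reward=1.0, next_q_max=0.0)
print(round(v, 3))  # 0.1
```

When each asset is trained as a separate environment, the tables (or networks) see different reward scales, which is one way the Q-value drift noted above can arise.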
arXiv Detail & Related papers (2024-02-25T05:23:57Z) - FinBen: A Holistic Financial Benchmark for Large Language Models [75.09474986283394]
FinBen is the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks.
FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading.
arXiv Detail & Related papers (2024-02-20T02:16:16Z) - Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN).
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z) - Forecasting Cryptocurrency Prices Using Deep Learning: Integrating
Financial, Blockchain, and Text Data [3.8443430569753025]
We analyse the influence of public sentiment on cryptocurrency valuations using advanced deep learning NLP methods.
We compare the performance of various ML models, both with and without NLP data integration.
We discover that pre-trained models, such as Twitter-RoBERTa and BART MNLI, are highly effective in capturing market sentiment.
arXiv Detail & Related papers (2023-11-23T16:14:44Z) - Integrating Stock Features and Global Information via Large Language
Models for Enhanced Stock Return Prediction [5.762650600435391]
We propose a novel framework consisting of two components to surmount the challenges of integrating Large Language Models with existing quantitative models.
We have demonstrated superior performance in Rank Information Coefficient and returns, particularly compared to models relying only on stock features in the China A-share market.
arXiv Detail & Related papers (2023-10-09T11:34:18Z) - Compatible deep neural network framework with financial time series
data, including data preprocessor, neural network model and trading strategy [2.347843817145202]
This research introduces a new deep neural network architecture and a novel idea of how to prepare financial data before feeding them to the model.
Three different datasets are used to evaluate this method, where results indicate that this framework can provide us with profitable and robust predictions.
arXiv Detail & Related papers (2022-05-11T20:44:08Z) - Towards Open-World Feature Extrapolation: An Inductive Graph Learning
Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
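The upper module's key trick, inferring an embedding for a feature never seen during training by aggregating over the feature-data graph, can be sketched with one round of mean-aggregation message passing. This is an illustrative simplification under assumed shapes, not the paper's GNN:

```python
# A bipartite feature-data graph links each data instance to the features it
# has observed values for; a new feature's embedding is extrapolated as the
# aggregate of the embeddings of the instances it appears in.
import numpy as np

def extrapolate_feature(instance_emb: np.ndarray, neighbor_rows: list) -> np.ndarray:
    """Embed a new feature as the mean of its neighboring instances' embeddings."""
    return instance_emb[neighbor_rows].mean(axis=0)

inst = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # 3 instances, 2-dim embeddings
new_feat = extrapolate_feature(inst, neighbor_rows=[0, 2])  # feature observed in rows 0 and 2
print(new_feat)  # [1.  0.5]
```

The extrapolated feature embedding can then be fed to the lower backbone model exactly like the embeddings of features seen at training time.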
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.