Trillion Dollar Words: A New Financial Dataset, Task & Market Analysis
- URL: http://arxiv.org/abs/2305.07972v1
- Date: Sat, 13 May 2023 17:32:39 GMT
- Title: Trillion Dollar Words: A New Financial Dataset, Task & Market Analysis
- Authors: Agam Shah and Suvan Paturi and Sudheer Chava
- Abstract summary: We construct the largest tokenized and annotated dataset of Federal Open Market Committee (FOMC) speeches, meeting minutes, and press conference transcripts.
Using the best-performing model (RoBERTa-large), we construct a measure of monetary policy stance for the document release days.
Our dataset, models, and code are publicly available on Hugging Face and GitHub under the CC BY-NC 4.0 license.
- Score: 1.933681537640272
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Monetary policy pronouncements by the Federal Open Market Committee (FOMC) are a
major driver of financial market returns. We construct the largest tokenized
and annotated dataset of FOMC speeches, meeting minutes, and press conference
transcripts in order to understand how monetary policy influences financial
markets. In this study, we develop a novel task of hawkish-dovish
classification and benchmark various pre-trained language models on the
proposed dataset. Using the best-performing model (RoBERTa-large), we construct
a measure of monetary policy stance for the FOMC document release days. To
evaluate the constructed measure, we study its impact on the Treasury market,
stock market, and macroeconomic indicators. Our dataset, models, and code are
publicly available on Hugging Face and GitHub under the CC BY-NC 4.0 license.
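The abstract does not spell out how sentence-level hawkish/dovish predictions become a single stance value per release day. The following is a minimal sketch that makes two assumptions. First, each sentence has already been labeled "hawkish", "dovish", or "neutral" by a classifier such as the fine-tuned RoBERTa-large model. Second, the day-level score is a net share: hawkish minus dovish sentences, divided by all labeled sentences. The function name `stance_by_day` and the label strings are hypothetical, and the paper's exact definition may differ.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from datetime import date
from typing import Iterable

# Hypothetical label strings; the real dataset's label encoding may differ.
LABELS = {"hawkish", "dovish", "neutral"}


def stance_by_day(sentences: Iterable[tuple[date, str]]) -> dict[date, float]:
    """Aggregate (release_day, label) pairs into one stance score per day.

    Assumed measure: (hawkish - dovish) / total labeled sentences, so the
    score lies in [-1, 1]; neutral sentences only enlarge the denominator.
    """
    counts: dict[date, Counter] = defaultdict(Counter)
    for day, label in sentences:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        counts[day][label] += 1
    # Sort by day so the result reads as a time series.
    return {
        day: (c["hawkish"] - c["dovish"]) / sum(c.values())
        for day, c in sorted(counts.items())
    }


if __name__ == "__main__":
    # Toy, hand-made labels for illustration only (not taken from the dataset).
    toy = [
        (date(2023, 1, 2), "hawkish"),
        (date(2023, 1, 2), "hawkish"),
        (date(2023, 1, 2), "neutral"),
        (date(2023, 1, 2), "dovish"),
        (date(2023, 1, 1), "dovish"),
        (date(2023, 1, 1), "dovish"),
    ]
    for day, score in stance_by_day(toy).items():
        print(day, score)  # 2023-01-01 -1.0, then 2023-01-02 0.25
```

Dividing by all labeled sentences, rather than only by hawkish plus dovish ones, keeps the score bounded in [-1, 1]. A document dominated by neutral sentences therefore scores close to zero instead of swinging to an extreme on a handful of opinionated sentences.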
Related papers
- Combining Financial Data and News Articles for Stock Price Movement Prediction Using Large Language Models [0.0]
We employ Large Language Models (LLMs) to predict market movements.
Our dataset contains news articles collected from different sources, historic stock price, and financial report data for 20 companies.
Using this model, we predicted the movement of a given stock's price in our dataset with weighted F1-scores of 58.5% and 59.1%.
arXiv Detail & Related papers (2024-11-02T21:53:20Z)
- AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data.
We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
- Effect of Leaders Voice on Financial Market: An Empirical Deep Learning Expedition on NASDAQ, NSE, and Beyond [1.6622844933418388]
Deep-learning-based models are proposed to predict financial market trends from NLP analysis of the Twitter handles of leaders in different fields.
The Indian and US financial markets are explored in the present work, whereas other markets can be explored in future work.
arXiv Detail & Related papers (2024-03-18T18:19:08Z)
- FMPAF: How Do Fed Chairs Affect the Financial Market? A Fine-grained Monetary Policy Analysis Framework on Their Language [3.760301720305374]
We propose the Fine-Grained Monetary Policy Analysis Framework (FMPAF), a novel approach that integrates large language models (LLMs) with regression analysis.
Based on our preferred specification, a one-unit increase in the sentiment score is associated with an increase in the price of the S&P 500 exchange-traded fund.
arXiv Detail & Related papers (2024-03-10T07:21:31Z)
- Numerical Claim Detection in Finance: A New Financial Dataset, Weak-Supervision Model, and Market Analysis [4.575870619860645]
We construct a new financial dataset for the claim detection task in the financial domain.
We propose a novel weak-supervision model that incorporates the knowledge of subject matter experts (SMEs) in the aggregation function.
We also observe that earnings surprises and returns depend on our optimism measure.
arXiv Detail & Related papers (2024-02-18T22:55:26Z)
- PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLM) based on fine-tuning LLaMA with instruction data.
We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks.
We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
- Dynamic Datasets and Market Environments for Financial Reinforcement Learning [68.11692837240756]
FinRL-Meta is a library that processes dynamic datasets from real-world markets into gym-style market environments.
We provide examples and reproduce popular research papers as stepping stones for users to design new trading strategies.
We also deploy the library on cloud platforms so that users can visualize their own results and assess the relative performance.
arXiv Detail & Related papers (2023-04-25T22:17:31Z)
- Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [51.3422222472898]
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines.
We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
arXiv Detail & Related papers (2023-04-15T19:22:37Z)
- BloombergGPT: A Large Language Model for Finance [42.73350054822628]
We present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data.
We construct a 363 billion token dataset based on Bloomberg's extensive data sources, augmented with 345 billion tokens from general purpose datasets.
Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins.
arXiv Detail & Related papers (2023-03-30T17:30:36Z)
- Gaussian process imputation of multiple financial series [71.08576457371433]
Multiple time series such as financial indicators, stock prices and exchange rates are strongly coupled due to their dependence on the latent state of the market.
We focus on learning the relationships among financial time series by modelling them through a multi-output Gaussian process.
arXiv Detail & Related papers (2020-02-11T19:18:18Z)
- Reinforcement-Learning based Portfolio Management with Augmented Asset Movement Prediction States [71.54651874063865]
Portfolio management (PM) aims to achieve investment goals such as maximal profits or minimal risks.
In this paper, we propose SARL, a novel State-Augmented RL framework for PM.
Our framework aims to address two unique challenges in financial PM: (1) data heterogeneity: the collected information for each asset is usually diverse, noisy, and imbalanced (e.g., news articles); and (2) environment uncertainty: the financial market is versatile and non-stationary.
arXiv Detail & Related papers (2020-02-09T08:10:03Z)