Numerical Claim Detection in Finance: A New Financial Dataset,
Weak-Supervision Model, and Market Analysis
- URL: http://arxiv.org/abs/2402.11728v1
- Date: Sun, 18 Feb 2024 22:55:26 GMT
- Title: Numerical Claim Detection in Finance: A New Financial Dataset,
Weak-Supervision Model, and Market Analysis
- Authors: Agam Shah, Arnav Hiray, Pratvi Shah, Arkaprabha Banerjee, Anushka
Singh, Dheeraj Eidnani, Bhaskar Chaudhury, Sudheer Chava
- Abstract summary: We construct a new financial dataset for the claim detection task in the financial domain.
We propose a novel weak-supervision model that incorporates the knowledge of subject matter experts (SMEs) in the aggregation function.
We demonstrate the practical utility of our proposed model by constructing a novel measure, "optimism".
- Score: 4.9524454709622585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we investigate the influence of claims in analyst reports and
earnings calls on financial market returns, considering them as significant
quarterly events for publicly traded companies. To facilitate a comprehensive
analysis, we construct a new financial dataset for the claim detection task in
the financial domain. We benchmark various language models on this dataset and
propose a novel weak-supervision model that incorporates the knowledge of
subject matter experts (SMEs) in the aggregation function, outperforming
existing approaches. Furthermore, we demonstrate the practical utility of our
proposed model by constructing a novel measure "optimism". Furthermore, we
observed the dependence of earnings surprise and return on our optimism
measure. Our dataset, models, and code will be made publicly (under CC BY 4.0
license) available on GitHub and Hugging Face.
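The abstract does not spell out the aggregation function, so the following is only a minimal sketch of the general idea of weak supervision with expert-weighted voting, not the authors' model. The labeling rules, the weights, and every name here (`lf_forward_looking_number`, `lf_past_result`, `aggregate`) are hypothetical and chosen purely for illustration.

```python
import re
from typing import Callable, Sequence

ABSTAIN = -1  # returned by a labeling function that has no opinion

LabelingFn = Callable[[str], int]


def _tokens(sentence: str) -> set:
    """Lowercased alphabetic tokens of the sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def lf_forward_looking_number(sentence: str) -> int:
    """Hypothetical rule: a digit plus a forward-looking word suggests a claim (1)."""
    has_digit = any(ch.isdigit() for ch in sentence)
    forward = bool(_tokens(sentence) & {"expect", "forecast", "project", "will"})
    return 1 if has_digit and forward else ABSTAIN


def lf_past_result(sentence: str) -> int:
    """Hypothetical rule: a past-tense report is treated as not a claim (0)."""
    return 0 if _tokens(sentence) & {"reported", "was", "were"} else ABSTAIN


def aggregate(sentence: str,
              lfs: Sequence[LabelingFn],
              weights: Sequence[float],
              threshold: float = 0.5) -> int:
    """Weighted vote over non-abstaining labeling functions.

    The weights stand in for expert (SME) confidence in each rule;
    this is an assumed scheme, not the one used in the paper.
    """
    total = 0.0
    vote_for_claim = 0.0
    for lf, weight in zip(lfs, weights):
        label = lf(sentence)
        if label == ABSTAIN:
            continue
        total += weight
        vote_for_claim += weight * label
    if total == 0.0:
        return ABSTAIN  # every rule abstained
    return 1 if vote_for_claim / total >= threshold else 0
```

When two rules disagree, the rule with the larger weight decides the label, which is one simple way expert judgment can enter the aggregation step.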
Related papers
- Fine-Tuning Gemma-7B for Enhanced Sentiment Analysis of Financial News Headlines [4.198715347024138]
We use Natural Language Processing (NLP) and Large Language Models (LLMs) to analyze sentiment from the perspective of retail investors.
We fine-tune several models, including distilbert-base-uncased, Llama, and gemma-7b, to evaluate their effectiveness in sentiment classification.
Our experiments demonstrate that the fine-tuned gemma-7b model outperforms others, achieving the highest precision, recall, and F1 score.
arXiv Detail & Related papers (2024-06-19T15:20:19Z)
- A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges [60.546677053091685]
Large language models (LLMs) have unlocked novel opportunities for machine learning applications in the financial domain.
We explore the application of LLMs on various financial tasks, focusing on their potential to transform traditional practices and drive innovation.
This survey categorizes the existing literature into key application areas, including linguistic tasks, sentiment analysis, financial time series, financial reasoning, agent-based modeling, and other applications.
arXiv Detail & Related papers (2024-06-15T16:11:35Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data.
We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
- Large Language Model Adaptation for Financial Sentiment Analysis [2.0499240875882]
Generalist language models tend to fall short in tasks specifically tailored for finance.
Two foundation models with less than 1.5B parameters have been adapted using a wide range of strategies.
We show that small LLMs have comparable performance to larger scale models, while being more efficient in terms of parameters and data.
arXiv Detail & Related papers (2024-01-26T11:04:01Z)
- FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models [26.99936434072108]
FinDABench is a benchmark designed to evaluate the financial data analysis capabilities of Large Language Models.
FinDABench aims to provide a measure for in-depth analysis of LLM abilities.
arXiv Detail & Related papers (2024-01-01T15:26:23Z)
- FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets [9.714447724811842]
This paper introduces a distinctive approach anchored in the Instruction Tuning paradigm for open-source large language models.
We capitalize on the interoperability of open-source models, ensuring a seamless and transparent integration.
The paper presents a benchmarking scheme designed for end-to-end training and testing, employing a cost-effective progression.
arXiv Detail & Related papers (2023-10-07T12:52:58Z)
- PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLM) based on fine-tuning LLaMA with instruction data.
We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks.
We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
- FinEAS: Financial Embedding Analysis of Sentiment [0.0]
We introduce a new language representation model in finance called Financial Embedding Analysis of Sentiment (FinEAS).
In this work, we propose a new model for financial sentiment analysis based on supervised fine-tuned sentence embeddings from a standard BERT model.
arXiv Detail & Related papers (2021-10-31T15:41:56Z)
- FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents.
We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts.
The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)
- Gaussian process imputation of multiple financial series [71.08576457371433]
Multiple time series such as financial indicators, stock prices and exchange rates are strongly coupled due to their dependence on the latent state of the market.
We focus on learning the relationships among financial time series by modelling them through a multi-output Gaussian process.
arXiv Detail & Related papers (2020-02-11T19:18:18Z)