SILC-EFSA: Self-aware In-context Learning Correction for Entity-level Financial Sentiment Analysis
- URL: http://arxiv.org/abs/2412.19140v1
- Date: Thu, 26 Dec 2024 09:53:01 GMT
- Title: SILC-EFSA: Self-aware In-context Learning Correction for Entity-level Financial Sentiment Analysis
- Authors: Senbin Zhu, Chenyuan He, Hongde Liu, Pengcheng Dong, Hanjie Zhao, Yuchen Yan, Yuxiang Jia, Hongying Zan, Min Peng
- Abstract summary: We propose a novel two-stage sentiment analysis approach called Self-aware In-context Learning Correction (SILC).
The first stage involves fine-tuning a base large language model to generate pseudo-labeled data specific to our task.
In the second stage, we train a correction model using a GNN-based example retriever, which is informed by the pseudo-labeled data.
- Score: 9.48177379759274
- Abstract: In recent years, fine-grained sentiment analysis in finance has gained significant attention, but the scarcity of entity-level datasets remains a key challenge. To address this, we have constructed the largest English and Chinese financial entity-level sentiment analysis datasets to date. Building on this foundation, we propose a novel two-stage sentiment analysis approach called Self-aware In-context Learning Correction (SILC). The first stage involves fine-tuning a base large language model to generate pseudo-labeled data specific to our task. In the second stage, we train a correction model using a GNN-based example retriever, which is informed by the pseudo-labeled data. This two-stage strategy has allowed us to achieve state-of-the-art performance on the newly constructed datasets, advancing the field of financial sentiment analysis. In a case study, we demonstrate the enhanced practical utility of our data and methods in monitoring the cryptocurrency market. Our datasets and code are available at https://github.com/NLP-Bin/SILC-EFSA.
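The two-stage pipeline in the abstract can be sketched roughly as follows. This is an illustrative stand-in, not the authors' code: a plain cosine-similarity retriever replaces the paper's GNN-based example retriever, and the fine-tuned LLM's stage-1 outputs are represented by hard-coded pseudo-labels.

```python
from dataclasses import dataclass, field
import math

@dataclass
class Example:
    text: str
    entity: str
    pseudo_label: str          # stage-1 output of the fine-tuned base model
    embedding: list = field(default_factory=list)  # features used for retrieval

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_emb, pool, k=2):
    """Stand-in for the GNN-based example retriever: return the k
    pseudo-labeled examples most similar to the query embedding."""
    return sorted(pool, key=lambda e: cosine(query_emb, e.embedding),
                  reverse=True)[:k]

def build_correction_prompt(query_text, query_entity, demos):
    """Stage 2: assemble an in-context-learning prompt in which the
    retrieved pseudo-labeled examples serve as demonstrations for the
    correction model."""
    lines = []
    for d in demos:
        lines.append(f"Text: {d.text}\nEntity: {d.entity}\n"
                     f"Sentiment: {d.pseudo_label}")
    lines.append(f"Text: {query_text}\nEntity: {query_entity}\nSentiment:")
    return "\n\n".join(lines)

# Toy pseudo-labeled pool produced by "stage 1".
pool = [
    Example("Shares of ACME surged 8%.", "ACME", "positive", [0.9, 0.1]),
    Example("BetaCorp missed earnings badly.", "BetaCorp", "negative", [0.1, 0.9]),
    Example("Gamma Ltd traded flat all day.", "Gamma Ltd", "neutral", [0.5, 0.5]),
]

prompt = build_correction_prompt("ACME rallied on strong guidance.",
                                 "ACME", retrieve([0.85, 0.15], pool))
print(prompt)
```

In the paper, the demonstrations would be scored by a trained graph neural network over the pseudo-labeled data rather than by raw cosine similarity; the prompt-assembly step is the part this sketch preserves.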
Related papers
- Enhancing Financial Time-Series Forecasting with Retrieval-Augmented Large Language Models [29.769616823587594]
We propose the first retrieval-augmented generation (RAG) framework specifically designed for financial time-series forecasting.
Our framework incorporates three key innovations: a fine-tuned 1B large language model (StockLLM) as its backbone, a novel candidate selection method enhanced by LLM feedback, and a training objective that maximizes the similarity between queries and historically significant sequences.
arXiv Detail & Related papers (2025-02-09T12:26:05Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data.
We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
- DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Human annotators judged answers from our DACO-RL algorithm to be more helpful than those of the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z)
- Numerical Claim Detection in Finance: A New Financial Dataset, Weak-Supervision Model, and Market Analysis [4.575870619860645]
We construct a new financial dataset for the claim detection task in the financial domain.
We propose a novel weak-supervision model that incorporates the knowledge of subject matter experts (SMEs) in the aggregation function.
Here, we observe the dependence of earnings surprise and return on our optimism measure.
arXiv Detail & Related papers (2024-02-18T22:55:26Z)
- Chinese Fine-Grained Financial Sentiment Analysis with Large Language Models [4.993565079216378]
We propose a novel and extensive Chinese fine-grained financial sentiment analysis dataset, FinChina SA, for enterprise early warning.
Our dataset will serve as a valuable resource to advance the exploration of real-world financial sentiment analysis tasks.
arXiv Detail & Related papers (2023-06-25T02:24:30Z)
- FinEAS: Financial Embedding Analysis of Sentiment [0.0]
We introduce a new language representation model in finance called Financial Embedding Analysis of Sentiment (FinEAS).
In this work, we propose a new model for financial sentiment analysis based on supervised fine-tuned sentence embeddings from a standard BERT model.
arXiv Detail & Related papers (2021-10-31T15:41:56Z)
- FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents.
We propose a new large-scale dataset, FinQA, with question-answering pairs over financial reports, written by financial experts.
The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
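The DAGA entry above trains language models on "linearized labeled sentences". A minimal sketch of what such linearization might look like, assuming a BIO tagging scheme; the helper names and example sentence are illustrative, not the paper's code:

```python
# DAGA-style linearization: interleave tags with tokens so a plain
# language model can be trained directly on labeled sentences, then
# sample new labeled sentences from it for augmentation.

def linearize(tokens, tags):
    """Flatten a (token, tag) sequence into one string by placing
    each non-O tag immediately before its token."""
    out = []
    for tok, tag in zip(tokens, tags):
        if tag != "O":
            out.append(tag)
        out.append(tok)
    return " ".join(out)

def delinearize(text, tagset):
    """Recover (token, tag) pairs from a linearized string."""
    pairs, pending = [], "O"
    for piece in text.split():
        if piece in tagset:
            pending = piece
        else:
            pairs.append((piece, pending))
            pending = "O"
    return pairs

tokens = ["Apple", "shares", "rose", "in", "New", "York"]
tags = ["B-ORG", "O", "O", "O", "B-LOC", "I-LOC"]
line = linearize(tokens, tags)
print(line)  # B-ORG Apple shares rose in B-LOC New I-LOC York
```

Sentences generated by a model trained on such strings can be delinearized back into token/tag pairs, yielding synthetic training data for the tagging task.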
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.