Related papers: Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach

URL: http://arxiv.org/abs/2407.15788v1
Date: Mon, 22 Jul 2024 16:47:31 GMT
Title: Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach
Authors: Rian Dolphin, Joe Dursun, Jonathan Chow, Jarrett Blankenship, Katie Adams, Quinton Pike
Abstract summary: This paper presents a novel approach to financial news processing that leverages Large Language Models (LLMs). We introduce a system that extracts relevant company tickers from raw news article content, performs sentiment analysis at the company level, and generates summaries. We are the first data provider to offer granular, per-company sentiment analysis from news articles, enhancing the depth of information available to market participants.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Financial news plays a crucial role in decision-making processes across the financial sector, yet the efficient processing of this information into a structured format remains challenging. This paper presents a novel approach to financial news processing that leverages Large Language Models (LLMs) to overcome limitations that previously prevented the extraction of structured data from unstructured financial news. We introduce a system that extracts relevant company tickers from raw news article content, performs sentiment analysis at the company level, and generates summaries, all without relying on pre-structured data feeds. Our methodology combines the generative capabilities of LLMs, and recent prompting techniques, with a robust validation framework that uses a tailored string similarity approach. Evaluation on a dataset of 5530 financial news articles demonstrates the effectiveness of our approach, with 90% of articles not missing any tickers compared with current data providers, and 22% of articles having additional relevant tickers. In addition to this paper, the methodology has been implemented at scale with the resulting processed data made available through a live API endpoint, which is updated in real-time with the latest news. To the best of our knowledge, we are the first data provider to offer granular, per-company sentiment analysis from news articles, enhancing the depth of information available to market participants. We also release the evaluation dataset of 5530 processed articles as a static file, which we hope will facilitate further research leveraging financial news.
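The abstract mentions a validation framework built on a tailored string similarity approach but does not specify it. As a rough illustration only, the sketch below checks LLM-extracted (ticker, company name) pairs against a reference table and keeps a ticker when the extracted name is close enough to the reference name. The reference table, the 0.6 threshold, the function names, and the use of Python's standard `difflib.SequenceMatcher` as the similarity measure are all assumptions made for this example, not details taken from the paper.

```python
from difflib import SequenceMatcher

# Hypothetical reference table of tickers and company names (illustrative only).
KNOWN_COMPANIES = {
    "AAPL": "Apple Inc.",
    "MSFT": "Microsoft Corporation",
    "TSLA": "Tesla, Inc.",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] using difflib's ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def validate_extractions(extracted, threshold=0.6):
    """Return tickers whose extracted company name resembles the reference name.

    `extracted` is a list of (ticker, company_name) pairs, standing in for
    the output of an LLM. The threshold is an assumed value for this sketch.
    """
    validated = []
    for ticker, name in extracted:
        reference = KNOWN_COMPANIES.get(ticker.upper())
        # Drop unknown tickers and tickers paired with a dissimilar name.
        if reference is not None and similarity(name, reference) >= threshold:
            validated.append(ticker.upper())
    return validated


# Stand-in for LLM output: two plausible pairs, one mismatched name,
# and one ticker that is not in the reference table.
llm_output = [
    ("AAPL", "Apple"),
    ("MSFT", "Microsoft Corp"),
    ("TSLA", "Toyota Motor"),
    ("XXXX", "Nonexistent Co"),
]

print(validate_extractions(llm_output))
```

A check of this kind guards against a generative model returning a ticker that does not exist or that belongs to a different company than the one named in the article. A production system would need a full reference list and a threshold tuned on labeled data.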

Related papers

FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing [33.23601503890859]
We propose a semantic-based and multi-level pairing framework to pair text with financial time-series data. We show that applying our method to proprietary yet carefully curated news sources leads to higher-quality paired data and improved stock price forecasting performance.
arXiv Detail & Related papers (2026-03-03T07:45:57Z)
FinSight: Towards Real-World Financial Deep Research [68.31086471310773]
FinSight is a novel framework for producing high-quality, multimodal financial reports. To ensure professional-grade visualization, we propose an Iterative Vision-Enhanced Mechanism. A two-stage Writing Framework expands concise Chain-of-Analysis segments into coherent, citation-aware, and multimodal reports.
arXiv Detail & Related papers (2025-10-19T14:05:35Z)
Structuring the Unstructured: A Multi-Agent System for Extracting and Querying Financial KPIs and Guidance [54.25184684077833]
We propose an efficient and scalable method for extracting quantitative insights from unstructured financial documents. Our proposed system consists of two specialized agents: the Extraction Agent and the Text-to-Agent.
arXiv Detail & Related papers (2025-05-25T15:45:46Z)
A Python Tool for Reconstructing Full News Text from GDELT [0.0]
This paper presents a novel approach to obtaining full-text newspaper articles at near-zero cost. We focus on the GDELT Web News NGrams 3.0 dataset, which provides high-frequency updates of n-grams extracted from global online news sources. We provide Python code to reconstruct full-text articles from these n-grams by identifying overlapping textual fragments and intelligently merging them.
arXiv Detail & Related papers (2025-04-22T17:40:42Z)
Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings. Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data. We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
Numerical Claim Detection in Finance: A New Financial Dataset, Weak-Supervision Model, and Market Analysis [4.575870619860645]
We construct a new financial dataset for the claim detection task in the financial domain. We propose a novel weak-supervision model that incorporates the knowledge of subject matter experts (SMEs) in the aggregation function. Here, we observe the dependence of earnings surprise and return on our optimism measure.
arXiv Detail & Related papers (2024-02-18T22:55:26Z)
PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLM) based on fine-tuning LLaMA with instruction data. We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks. We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
Enabling and Analyzing How to Efficiently Extract Information from Hybrid Long Documents with LLMs [48.87627426640621]
This research focuses on harnessing the potential of Large Language Models to comprehend critical information from financial reports. We propose an Automated Financial Information Extraction framework that enhances LLMs' ability to comprehend and extract information from financial reports. Our framework is effectively validated on GPT-3.5 and GPT-4, yielding average accuracy increases of 53.94% and 33.77%, respectively.
arXiv Detail & Related papers (2023-05-24T10:35:58Z)
Dynamic Datasets and Market Environments for Financial Reinforcement Learning [68.11692837240756]
FinRL-Meta is a library that processes dynamic datasets from real-world markets into gym-style market environments. We provide examples and reproduce popular research papers as stepping stones for users to design new trading strategies. We also deploy the library on cloud platforms so that users can visualize their own results and assess the relative performance.
arXiv Detail & Related papers (2023-04-25T22:17:31Z)
FETILDA: An Effective Framework For Fin-tuned Embeddings For Long Financial Text Documents [14.269860621624394]
We propose and implement a deep learning framework that splits long documents into chunks and utilizes pre-trained LMs to process and aggregate the chunks into vector representations. We evaluate our framework on a collection of 10-K public disclosure reports from US banks, and another dataset of reports submitted by US companies.
arXiv Detail & Related papers (2022-06-14T16:14:14Z)
Financial data analysis application via multi-strategy text processing [0.2741266294612776]
This paper mainly focuses on the stock trading data and news about China A-share companies. We present our efforts and plans in deep learning financial text processing application scenarios using natural language processing (NLP) and knowledge graph (KG) technologies.
arXiv Detail & Related papers (2022-04-25T01:56:36Z)
FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)
Fact Check: Analyzing Financial Events from Multilingual News Sources [22.504723681328507]
We propose FactCheck in finance, a web-based news aggregator with deep learning models. A web interface is provided to examine the credibility of news articles using a transformer-based fact-checker. The performance of the fact checker is evaluated using a dataset related to merger and acquisition (M&A) events.
arXiv Detail & Related papers (2021-06-29T10:05:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.