Related papers: A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models

URL: http://arxiv.org/abs/2409.17581v1
Date: Thu, 26 Sep 2024 06:57:22 GMT
Title: A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models
Authors: Syed Affan Daimi, Asma Iqbal
Abstract summary: We propose a novel data-driven approach to analyze and rate the performance of companies based on their SEC 10-K filings. The proposed scheme is then implemented on an interactive GUI as a no-code solution for running the data pipeline and creating the visualizations. The application showcases the rating results and provides year-on-year comparisons of company performance.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The number of companies listed on the NYSE has been growing exponentially, creating a significant challenge for market analysts, traders, and stockholders who must monitor and assess the performance and strategic shifts of a large number of companies regularly. There is an increasing need for a fast, cost-effective, and comprehensive method to evaluate the performance and detect and compare many companies' strategy changes efficiently. We propose a novel data-driven approach that leverages large language models (LLMs) to systematically analyze and rate the performance of companies based on their SEC 10-K filings. These filings, which provide detailed annual reports on a company's financial performance and strategic direction, serve as a rich source of data for evaluating various aspects of corporate health, including confidence, environmental sustainability, innovation, and workforce management. We also introduce an automated system for extracting and preprocessing 10-K filings. This system accurately identifies and segments the required sections as outlined by the SEC, while also isolating key textual content that contains critical information about the company. This curated data is then fed into Cohere's Command-R+ LLM to generate quantitative ratings across various performance metrics. These ratings are subsequently processed and visualized to provide actionable insights. The proposed scheme is then implemented on an interactive GUI as a no-code solution for running the data pipeline and creating the visualizations. The application showcases the rating results and provides year-on-year comparisons of company performance.
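The section segmentation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression, the `split_items` function name, and the sample text are all assumptions made for illustration, and real filings (HTML markup, tables of contents, inconsistent headings) would need more careful handling. The subsequent LLM rating step is not shown.

```python
import re

# Hypothetical heading pattern (an assumption, not taken from the paper):
# matches line-initial headings such as "Item 1.", "ITEM 1A." or "Item 7 -".
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[a-c]?)\s*[.:\-]",
    re.IGNORECASE | re.MULTILINE,
)


def split_items(text: str) -> dict[str, str]:
    """Split plain-text 10-K content into sections keyed by item number."""
    matches = list(ITEM_HEADING.finditer(text))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        key = match.group(1).upper()
        body = text[match.end():end].strip()
        # A table of contents repeats the headings; keep the longest body per item.
        if len(body) > len(sections.get(key, "")):
            sections[key] = body
    return sections


# Illustrative input only; not real filing text.
sample = (
    "Item 1. Business\nWe make widgets.\n"
    "Item 1A. Risk Factors\nDemand may fall.\n"
    "Item 7. Management's Discussion\nRevenue grew.\n"
)
sections = split_items(sample)
for item, body in sections.items():
    print(item, "->", body.splitlines()[0])
```

Keeping the longest body per item is a simple heuristic for skipping table-of-contents entries; a production pipeline would likely validate the result against the list of items the SEC requires.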

Related papers

FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting [58.70072722290475]
Financial time series (FinTS) record the behavior of human-brain-augmented decision-making. FinTSB is a comprehensive and practical benchmark for financial time series forecasting.
arXiv Detail & Related papers (2025-02-26T05:19:16Z)
Off-policy Evaluation for Payments at Adyen [0.0]
Off-Policy Evaluation (OPE) was applied to accelerate recommender system development and optimization at Adyen. Our analysis, conducted on a billion-scale dataset of transactions, reveals a strong correlation between OPE estimates and online A/B test results. We provide guidance on their effectiveness and integration into the decision-making systems for large-scale industrial payment systems.
arXiv Detail & Related papers (2025-01-15T22:17:01Z)
Multi-modal Retrieval Augmented Multi-modal Generation: Datasets, Evaluation Metrics and Strong Baselines [64.61315565501681]
Multi-modal Retrieval Augmented Multi-modal Generation (M$^2$RAG) is a novel task that enables foundation models to process multi-modal web content. Despite its potential impact, M$^2$RAG remains understudied, lacking comprehensive analysis and high-quality data resources.
arXiv Detail & Related papers (2024-11-25T13:20:19Z)
AI in Investment Analysis: LLMs for Equity Stock Ratings [0.2916558661202724]
This paper explores the application of Large Language Models (LLMs) to generate multi-horizon stock ratings. Our study addresses these issues by leveraging LLMs to improve the accuracy and consistency of stock ratings. Our results show that our benchmark method outperforms traditional stock rating methods when assessed by forward returns.
arXiv Detail & Related papers (2024-10-30T15:06:57Z)
Blockchain-Enabled Accountability in Data Supply Chain: A Data Bill of Materials Approach [16.31469678670097]
We introduce the "Data Bill of Materials" (DataBOM) to capture the dependency relationship between different datasets and stakeholders by storing specific metadata. We demonstrate a platform architecture for providing blockchain-based DataBOM services, present the interaction protocol for stakeholders, and discuss the minimal requirements for DataBOM metadata.
arXiv Detail & Related papers (2024-08-16T05:34:50Z)
Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach [0.0]
This paper presents a novel approach to financial news processing that leverages Large Language Models (LLMs). We introduce a system that extracts relevant company tickers from raw news article content, performs sentiment analysis at the company level, and generates summaries. We are the first data provider to offer granular, per-company sentiment analysis from news articles, enhancing the depth of information available to market participants.
arXiv Detail & Related papers (2024-07-22T16:47:31Z)
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation [79.09622602860703]
We introduce InsightBench, a benchmark dataset with three key features. It consists of 100 datasets representing diverse business use cases such as finance and incident management. Unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics.
arXiv Detail & Related papers (2024-07-08T22:06:09Z)
A Bargaining-based Approach for Feature Trading in Vertical Federated Learning [54.51890573369637]
We propose a bargaining-based feature trading approach in Vertical Federated Learning (VFL) to encourage economically efficient transactions. Our model incorporates performance gain-based pricing, taking into account the revenue-based optimization objectives of both parties.
arXiv Detail & Related papers (2024-02-23T10:21:07Z)
Glitter or Gold? Deriving Structured Insights from Sustainability Reports via Large Language Models [16.231171704561714]
This study uses Information Extraction (IE) methods to extract structured insights related to ESG aspects from companies' sustainability reports. We then leverage graph-based representations to conduct statistical analyses concerning the extracted insights.
arXiv Detail & Related papers (2023-10-09T11:34:41Z)
Harnessing the Web and Knowledge Graphs for Automated Impact Investing Scoring [2.4107880640624706]
We describe a data-driven system that seeks to automate the process of creating a Sustainable Development Goals (SDG) framework. We propose a novel method for collecting and filtering a dataset of texts from different web sources and a knowledge graph relevant to a set of companies. Our results indicate that our best performing model can accurately predict SDG scores with a micro average F1 score of 0.89.
arXiv Detail & Related papers (2023-08-04T15:14:16Z)
Dynamic Datasets and Market Environments for Financial Reinforcement Learning [68.11692837240756]
FinRL-Meta is a library that processes dynamic datasets from real-world markets into gym-style market environments. We provide examples and reproduce popular research papers as stepping stones for users to design new trading strategies. We also deploy the library on cloud platforms so that users can visualize their own results and assess the relative performance.
arXiv Detail & Related papers (2023-04-25T22:17:31Z)
An Informative Tracking Benchmark [133.0931262969931]
We develop a small and informative tracking benchmark (ITB) with 7% of the 1.2M frames of existing and newly collected datasets. We select the most informative sequences from existing benchmarks, taking into account 1) challenge level, 2) discriminative strength, and 3) density of appearance variations. By analyzing the results of 15 state-of-the-art trackers re-trained on the same data, we determine the effective methods for robust tracking under each scenario.
arXiv Detail & Related papers (2021-12-13T07:56:16Z)
TTRS: Tinkoff Transactions Recommender System benchmark [62.997667081978825]
We present the TTRS - Tinkoff Transactions Recommender System benchmark. This financial transaction benchmark contains over 2 million interactions between almost 10,000 users and more than 1,000 merchant brands over 14 months. We also present a comprehensive comparison of the current popular RecSys methods on the next-period recommendation task and conduct a detailed analysis of their performance against various metrics and recommendation goals.
arXiv Detail & Related papers (2021-10-11T20:04:07Z)
MARS-Gym: A Gym framework to model, train, and evaluate Recommender Systems for Marketplaces [51.123916699062384]
MARS-Gym is an open-source framework to build and evaluate Reinforcement Learning agents for recommendations in marketplaces. We provide the implementation of a diverse set of baseline agents, with a metrics-driven analysis of them in the Trivago marketplace dataset. We expect to bridge the gap between academic research and production systems, as well as to facilitate the design of new algorithms and applications.
arXiv Detail & Related papers (2020-09-30T16:39:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.