Enhancing Business Analytics through Hybrid Summarization of Financial Reports
- URL: http://arxiv.org/abs/2601.09729v1
- Date: Sun, 28 Dec 2025 16:25:12 GMT
- Title: Enhancing Business Analytics through Hybrid Summarization of Financial Reports
- Authors: Tohida Rehman
- Abstract summary: Financial reports and earnings communications contain large volumes of structured and semi-structured information. We present a hybrid summarization framework that combines extractive and abstractive techniques to produce concise and factually reliable summaries. These findings support the development of practical summarization systems for distilling lengthy financial texts into usable business insights.
- Score: 0.152292571922932
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Financial reports and earnings communications contain large volumes of structured and semi-structured information, making detailed manual analysis inefficient. Earnings conference calls provide valuable evidence about a firm's performance, outlook, and strategic priorities. The manual analysis of lengthy call transcripts requires substantial effort and is susceptible to interpretive bias and unintentional error. In this work, we present a hybrid summarization framework that combines extractive and abstractive techniques to produce concise and factually reliable Reuters-style summaries from the ECTSum dataset. The proposed two-stage pipeline first applies the LexRank algorithm to identify salient sentences, which are subsequently summarized using fine-tuned variants of BART and PEGASUS designed for resource-constrained settings. In parallel, we fine-tune a Longformer Encoder-Decoder (LED) model to directly capture long-range contextual dependencies in financial documents. Model performance is evaluated using standard automatic metrics, including ROUGE, METEOR, MoverScore, and BERTScore, along with domain-specific variants such as SciBERTScore and FinBERTScore. To assess factual accuracy, we further employ entity-level measures based on source-precision and F1-target. The results highlight complementary trade-offs between approaches: long-context models yield the strongest overall performance, while the hybrid framework achieves competitive results with improved factual consistency under computational constraints. These findings support the development of practical summarization systems for efficiently distilling lengthy financial texts into usable business insights.
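The extractive stage named in the abstract can be illustrated with a minimal LexRank-style sketch. This is not the authors' implementation: the function names, the tf-idf weighting, the `damping` value, and the toy sentences are all assumptions made for illustration. It builds a sentence similarity graph, runs a damped power iteration over it, and keeps the most central sentences in document order; in the pipeline the abstract describes, such sentences would then be passed to an abstractive model.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, top_k=2, damping=0.85, iterations=50):
    """Return the top_k most central sentences, kept in document order."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]

    # Inverse document frequency, treating each sentence as a document.
    df = Counter(t for toks in tokens for t in set(toks))
    idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokens]

    # Similarity graph without self-loops; each row is normalized to sum to 1.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    for row in sim:
        total = sum(row)
        if total > 0:
            row[:] = [v / total for v in row]
        else:
            row[:] = [1.0 / n] * n  # isolated sentence: jump uniformly

    # Damped power iteration (PageRank-style) for sentence centrality.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * sim[j][i] for j in range(n))
            for i in range(n)
        ]

    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(ranked)]


if __name__ == "__main__":
    # Toy transcript sentences (invented for illustration only).
    transcript = [
        "Revenue rose ten percent.",
        "Revenue rose on strong cloud demand.",
        "Cloud demand lifted revenue and margins.",
        "Management thanked the employees.",
    ]
    for sentence in lexrank(transcript, top_k=2):
        print(sentence)
```

Keeping the selected sentences in their original order is a design choice of this sketch; it preserves the narrative flow of the call for whatever model consumes the extract.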
Related papers
- DeepSynth-Eval: Objectively Evaluating Information Consolidation in Deep Survey Writing [53.85037373860246]
We introduce DeepSynth-Eval, a benchmark designed to objectively evaluate information consolidation capabilities. We propose a fine-grained evaluation protocol using General Checklists (for factual coverage) and Constraint Checklists (for structural organization). Our results demonstrate that agentic plan-and-write significantly outperforms single-turn generation.
arXiv Detail & Related papers (2026-01-07T03:07:52Z)
- FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering [57.18367828883773]
FinAgentBench is a benchmark for evaluating agentic retrieval with multi-step reasoning in finance. The benchmark consists of 26K expert-annotated examples on S&P-500 listed firms. We evaluate a suite of state-of-the-art models and demonstrate how targeted fine-tuning can significantly improve agentic retrieval performance.
arXiv Detail & Related papers (2025-08-07T22:15:22Z)
- Harnessing Generative LLMs for Enhanced Financial Event Entity Extraction Performance [0.0]
Financial event entity extraction is a crucial task for building financial knowledge graphs. Traditional approaches often rely on sequence labeling models, which can struggle with long-range dependencies. We propose a novel method that reframes financial event entity extraction as a text-to-structured generation task.
arXiv Detail & Related papers (2025-04-20T14:23:31Z)
- TWSSenti: A Novel Hybrid Framework for Topic-Wise Sentiment Analysis on Social Media Using Transformer Models [0.0]
This study explores a hybrid framework combining transformer-based models to improve sentiment classification accuracy and robustness. The framework addresses challenges such as noisy data, contextual ambiguity, and generalization across diverse datasets. This research highlights its applicability to real-world tasks such as social media monitoring, customer sentiment analysis, and public opinion tracking.
arXiv Detail & Related papers (2025-04-14T05:44:11Z)
- COMM:Concentrated Margin Maximization for Robust Document-Level Relation Extraction [5.291403671224172]
Document-level relation extraction (DocRE) is the process of identifying and extracting relations between entities that span multiple sentences within a document. The complexity inherent in DocRE makes the labeling process prone to errors, compounded by the extreme sparsity of positive relation samples. We have developed a robust framework called COMM to better solve DocRE.
arXiv Detail & Related papers (2025-03-18T04:31:57Z)
- Instruction-Guided Bullet Point Summarization of Long Financial Earnings Call Transcripts [25.4439290862464]
We study the problem of bullet point summarization of Earnings Call Transcripts (ECTs) using the recently released dataset.
We leverage an unsupervised question-based extractive module followed by a parameter-efficient instruction-tuned abstractive module to solve this task.
Our proposed model FLAN-FinBPS achieves new state-of-the-art performance, outperforming the strongest baseline with a 14.88% average ROUGE score gain.
arXiv Detail & Related papers (2024-05-03T16:33:16Z)
- Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z)
- Evaluating and Improving Factuality in Multimodal Abstractive Summarization [91.46015013816083]
We propose CLIPBERTScore to leverage the robustness and strong factuality detection performance between image-summary and document-summary.
We show that this simple combination of two metrics in the zero-shot setting achieves higher correlations than existing factuality metrics for document summarization.
Our analysis demonstrates the robustness and high correlation of CLIPBERTScore and its components on four factuality metric-evaluation benchmarks.
arXiv Detail & Related papers (2022-11-04T16:50:40Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.