KAHAN: Knowledge-Augmented Hierarchical Analysis and Narration for Financial Data Narration
- URL: http://arxiv.org/abs/2509.17037v1
- Date: Sun, 21 Sep 2025 11:15:43 GMT
- Title: KAHAN: Knowledge-Augmented Hierarchical Analysis and Narration for Financial Data Narration
- Authors: Yajing Yang, Tony Deng, Min-Yen Kan
- Abstract summary: KAHAN is a knowledge-augmented hierarchical framework that extracts insights from raw data at entity, pairwise, group, and system levels. On the DataTales financial reporting benchmark, KAHAN outperforms existing approaches by over 20% on narrative quality (GPT-4o). Our results reveal that knowledge quality drives model performance through distillation, hierarchical analysis benefits vary with market complexity, and the framework transfers effectively to healthcare domains.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose KAHAN, a knowledge-augmented hierarchical framework that systematically extracts insights from raw tabular data at entity, pairwise, group, and system levels. KAHAN uniquely leverages LLMs as domain experts to drive the analysis. On the DataTales financial reporting benchmark, KAHAN outperforms existing approaches by over 20% on narrative quality (GPT-4o), maintains 98.2% factuality, and demonstrates practical utility in human evaluation. Our results reveal that knowledge quality drives model performance through distillation, hierarchical analysis benefits vary with market complexity, and the framework transfers effectively to healthcare domains. The data and code are available at https://github.com/yajingyang/kahan.
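The four analysis levels named in the abstract could be organized as a simple pipeline. The sketch below is purely illustrative: it assumes a generic `llm` callable, and every function name and prompt string is hypothetical rather than taken from the KAHAN code.

```python
from itertools import combinations

LEVELS = ["entity", "pairwise", "group", "system"]

def hierarchical_insights(llm, records):
    """Collect insights at the four levels named in the abstract.

    `llm` is any callable mapping a prompt string to a text response;
    the prompts here are placeholders, not KAHAN's actual prompts.
    """
    insights = {}
    insights["entity"] = [llm(f"Describe this entity: {r}") for r in records]
    insights["pairwise"] = [llm(f"Compare {a} and {b}")
                            for a, b in combinations(records, 2)]
    insights["group"] = [llm(f"Summarize the group: {records}")]
    insights["system"] = [llm(f"Draw system-level conclusions from: {insights}")]
    return insights
```

A narration stage would then turn `insights` into prose. Note that with n entities the pairwise level grows as n(n-1)/2, which suggests why the benefit of hierarchical analysis would vary with market complexity.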
Related papers
- Constructing Synthetic Instruction Datasets for Improving Reasoning in Domain-Specific LLMs: A Case Study in the Japanese Financial Domain [0.1529342790344802]
This study proposes a general method for constructing high-quality synthetic instruction data for any domain. We constructed a large-scale instruction dataset totaling approximately 9.5 billion tokens with Chain-of-Thought reasoning traces.
arXiv Detail & Related papers (2026-03-02T01:21:54Z)
- OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value [74.80873109856563]
OpenDataArena (ODA) is a holistic and open platform designed to benchmark the intrinsic value of post-training data. ODA establishes a comprehensive ecosystem comprising four key pillars, including: (i) a unified training-evaluation pipeline that ensures fair, open comparisons across diverse models; (ii) a multi-dimensional scoring framework that profiles data quality along tens of distinct axes; and (iii) an interactive data lineage explorer to visualize dataset genealogy and dissect component sources.
arXiv Detail & Related papers (2025-12-16T03:33:24Z)
- A Framework for Data Valuation and Monetisation [0.0]
This paper introduces a unified valuation framework that integrates economic, governance, and strategic perspectives into a coherent decision-support model. The model combines qualitative scoring, cost- and utility-based estimation, relevance/quality indexing, and multi-criteria weighting to define data value transparently and systematically.
arXiv Detail & Related papers (2025-12-08T15:57:26Z)
- Data Quality Taxonomy for Data Monetization [0.0]
This chapter presents a comprehensive taxonomy for assessing data quality in the context of data monetisation. The framework's interconnected "metrics layer" ensures improvements in one dimension cascade into others, maximising strategic impact. This holistic approach bridges the gap between granular technical assessment and high-level decision-making.
arXiv Detail & Related papers (2025-09-30T12:42:02Z)
- NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation [58.30936615525824]
We present NAIPv2, a debiased and efficient framework for paper quality estimation. NAIPv2 employs pairwise learning within domain-year groups to reduce inconsistencies in reviewer ratings. It is trained on pairwise comparisons but enables efficient pointwise prediction at deployment.
arXiv Detail & Related papers (2025-09-29T17:59:23Z)
- Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics [2.7933239275667545]
We present an agentic system that automates the data-to-dashboard pipeline through modular LLM agents. Unlike existing chart systems, our framework simulates the analytical reasoning process of business analysts. Our approach shows improved insightfulness, domain relevance, and analytical depth, as measured by tailored evaluation metrics.
arXiv Detail & Related papers (2025-05-29T17:32:15Z)
- AssistedDS: Benchmarking How External Domain Knowledge Assists LLMs in Automated Data Science [44.18533574465929]
We introduce AssistedDS, a benchmark designed to evaluate how large language models handle domain knowledge. We assess state-of-the-art LLMs on their ability to discern and apply beneficial versus harmful domain knowledge. Our results demonstrate a substantial gap in current models' ability to critically evaluate and leverage expert knowledge.
arXiv Detail & Related papers (2025-05-25T05:50:21Z)
- SCAN: Structured Capability Assessment and Navigation for LLMs [54.54085382131134]
SCAN (Structured Capability Assessment and Navigation) is a practical framework that enables detailed characterization of Large Language Models. SCAN incorporates key components, including: TaxBuilder, which extracts capability-indicating tags from queries to construct a hierarchical taxonomy; RealMix, a query synthesis and filtering mechanism that ensures sufficient evaluation data for each capability tag; and a PC^2-based (Pre-Comparison-derived Criteria) LLM-as-a-Judge approach that achieves significantly higher accuracy than the classic LLM-as-a-Judge method.
arXiv Detail & Related papers (2025-05-10T16:52:40Z)
- Enhancing Machine Learning Performance through Intelligent Data Quality Assessment: An Unsupervised Data-centric Framework [0.0]
Poor data quality limits the power of Machine Learning (ML). We propose an intelligent data-centric evaluation framework that can identify high-quality data and improve the performance of an ML system.
arXiv Detail & Related papers (2025-02-18T18:01:36Z)
- OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain [62.89809156574998]
We introduce an omnidirectional and automatic RAG benchmark, OmniEval, in the financial domain. Our benchmark is characterized by its multi-dimensional evaluation framework. Our experiments demonstrate the comprehensiveness of OmniEval, which includes extensive test datasets.
arXiv Detail & Related papers (2024-12-17T15:38:42Z)
- InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation [79.09622602860703]
We introduce InsightBench, a benchmark dataset with three key features. It consists of 100 datasets representing diverse business use cases such as finance and incident management. Unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics.
arXiv Detail & Related papers (2024-07-08T22:06:09Z)
- Analysis of Multidomain Abstractive Summarization Using Salience Allocation [2.6880540371111445]
Season is a model designed to enhance summarization by leveraging salience allocation techniques.
This paper employs various evaluation metrics such as ROUGE, METEOR, BERTScore, and MoverScore to evaluate the performance of these models fine-tuned for generating abstractive summaries.
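Of the metrics listed, ROUGE-1 is the simplest to illustrate: it scores unigram overlap between a candidate summary and a reference. A minimal sketch of the F1 variant (not the official `rouge-score` package) might look like:

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # clipped counts: each shared word counts at most min(cand, ref) times
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Production evaluations typically add stemming and multiple n-gram orders (ROUGE-2, ROUGE-L), which this sketch omits.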
arXiv Detail & Related papers (2024-02-19T08:52:12Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability.
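As we understand the definition, IFD compares the model's loss on an answer when conditioned on its instruction against its loss on the answer alone. A sketch of that ratio, assuming per-token log-probabilities have already been obtained from a model (the function names here are our own, not the paper's):

```python
def mean_nll(token_logprobs):
    # average negative log-likelihood over the answer tokens
    return -sum(token_logprobs) / len(token_logprobs)

def ifd_score(answer_logprobs_given_instruction, answer_logprobs_alone):
    """Instruction-Following Difficulty as a ratio of losses.

    Values near 1 suggest the instruction barely helps the model
    predict the answer (a difficult, informative sample); values
    near 0 suggest an easy sample.
    """
    return (mean_nll(answer_logprobs_given_instruction)
            / mean_nll(answer_logprobs_alone))
```

Selection would then keep the samples whose IFD is highest, which is how a small "cherry" subset can be distilled from a large open-source dataset.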
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
- LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities [66.36633042421387]
We evaluate Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We propose AutoKG, a multi-agent-based approach employing LLMs and external sources for KG construction and reasoning.
arXiv Detail & Related papers (2023-05-22T15:56:44Z)
- Making Machine Learning Datasets and Models FAIR for HPC: A Methodology and Case Study [0.0]
The FAIR Guiding Principles aim to improve the findability, accessibility, interoperability, and reusability of digital content by making it both human- and machine-actionable.
These principles have not yet been broadly adopted in the domain of machine learning-based program analyses and optimizations for High-Performance Computing.
We design a methodology to make HPC datasets and machine learning models FAIR after investigating existing FAIRness assessment and improvement techniques.
arXiv Detail & Related papers (2022-11-03T18:45:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.