Related papers: AI for Climate Finance: Agentic Retrieval and Multi-Step Reasoning for Early Warning System Investments

URL: http://arxiv.org/abs/2504.05104v2
Date: Wed, 28 May 2025 11:58:37 GMT
Title: AI for Climate Finance: Agentic Retrieval and Multi-Step Reasoning for Early Warning System Investments
Authors: Saeid Ario Vaghefi, Aymane Hachcham, Veronica Grasso, Jiska Manicus, Nakiete Msemo, Chiara Colesanti Senni, Markus Leippold
Abstract summary: This study focuses on a real-world application: tracking EWS investments in the Climate Risk and Early Warning Systems (CREWS) Fund. We analyze 25 MDB project documents and evaluate multiple AI-driven classification methods, including zero-shot and few-shot learning. Our results show that the agent-based RAG approach significantly outperforms other methods, achieving 87% accuracy, 89% precision, and 83% recall.
Score: 1.3192560874022086
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Tracking financial investments in climate adaptation is a complex and expertise-intensive task, particularly for Early Warning Systems (EWS), which lack standardized financial reporting across multilateral development banks (MDBs) and funds. To address this challenge, we introduce an LLM-based agentic AI system that integrates contextual retrieval, fine-tuning, and multi-step reasoning to extract relevant financial data, classify investments, and ensure compliance with funding guidelines. Our study focuses on a real-world application: tracking EWS investments in the Climate Risk and Early Warning Systems (CREWS) Fund. We analyze 25 MDB project documents and evaluate multiple AI-driven classification methods, including zero-shot and few-shot learning, fine-tuned transformer-based classifiers, chain-of-thought (CoT) prompting, and an agent-based retrieval-augmented generation (RAG) approach. Our results show that the agent-based RAG approach significantly outperforms other methods, achieving 87% accuracy, 89% precision, and 83% recall. Additionally, we contribute a benchmark dataset and expert-annotated corpus, providing a valuable resource for future research in AI-driven financial tracking and climate finance transparency.

Related papers

Let the Barbarians In: How AI Can Accelerate Systems Performance Research [80.43506848683633]
We term this iterative cycle of generation, evaluation, and refinement AI-Driven Research for Systems. We demonstrate that ADRS-generated solutions can match or even outperform human state-of-the-art designs.
arXiv Detail & Related papers (2025-12-16T18:51:23Z)
VERAFI: Verified Agentic Financial Intelligence through Neurosymbolic Policy Generation [2.43679682660038]
VERAFI is an agentic framework with neurosymbolic policy generation for verified financial intelligence. VERAFI combines state-of-the-art dense retrieval and cross-encoder reranking with financial tool-enabled agents and automated reasoning policies.
arXiv Detail & Related papers (2025-12-12T17:17:43Z)
CryptoBench: A Dynamic Benchmark for Expert-Level Evaluation of LLM Agents in Cryptocurrency [60.83660377169452]
This paper introduces CryptoBench, the first expert-curated, dynamic benchmark designed to rigorously evaluate the real-world capabilities of Large Language Model (LLM) agents. Unlike general-purpose agent benchmarks for search and prediction, professional crypto analysis presents specific challenges.
arXiv Detail & Related papers (2025-11-29T09:52:34Z)
A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System [56.40989626804489]
This survey provides the first holistic analysis of Large Language Models-powered software engineering. We review over 150 recent papers and propose a taxonomy along two key dimensions: (1) Solutions, categorized into prompt-based, fine-tuning-based, and agent-based paradigms, and (2) Benchmarks, including tasks such as code generation, translation, and repair.
arXiv Detail & Related papers (2025-10-10T06:56:50Z)
Enhancing Financial RAG with Agentic AI and Multi-HyDE: A Novel Approach to Knowledge Retrieval and Hallucination Reduction [0.5814806132299305]
We introduce a framework for financial Retrieval Augmented Generation (RAG). RAG generates multiple, nonequivalent queries to boost the effectiveness and coverage of retrieval from large, structured financial corpora. Our pipeline is optimized for token efficiency and multi-step financial reasoning.
arXiv Detail & Related papers (2025-09-19T19:24:30Z)
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction [92.7392863957204]
FutureX is the largest and most diverse live benchmark for future prediction. It supports real-time daily updates and eliminates data contamination through an automated pipeline for question gathering and answer collection. We evaluate 25 LLM/agent models, including those with reasoning, search capabilities, and integration of external tools.
arXiv Detail & Related papers (2025-08-16T08:54:08Z)
FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering [57.18367828883773]
FinAgentBench is a benchmark for evaluating agentic retrieval with multi-step reasoning in finance. The benchmark consists of 26K expert-annotated examples on S&P-500 listed firms. We evaluate a suite of state-of-the-art models and demonstrate how targeted fine-tuning can significantly improve agentic retrieval performance.
arXiv Detail & Related papers (2025-08-07T22:15:22Z)
Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning [12.548390779247987]
We introduce the Agentar-Fin-R1 series of financial large language models. Our optimization approach integrates a high-quality, systematic financial task label system. Our models undergo comprehensive evaluation on mainstream financial benchmarks.
arXiv Detail & Related papers (2025-07-22T17:52:16Z)
Towards Competent AI for Fundamental Analysis in Finance: A Benchmark Dataset and Evaluation [3.077814260904367]
We propose FinAR-Bench, a benchmark dataset focusing on financial statement analysis. We break this task into three measurable steps: extracting key information, calculating financial indicators, and applying logical reasoning. Our findings offer a clear understanding of LLMs' current strengths and limitations in fundamental analysis.
arXiv Detail & Related papers (2025-05-22T07:06:20Z)
FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation [63.55583665003167]
We present FinDER, an expert-generated dataset tailored for Retrieval-Augmented Generation (RAG) in finance. FinDER focuses on annotating search-relevant evidence by domain experts, offering 5,703 query-evidence-answer triplets. By challenging models to retrieve relevant information from large corpora, FinDER offers a more realistic benchmark for evaluating RAG systems.
arXiv Detail & Related papers (2025-04-22T11:30:13Z)
Generative AI Enhanced Financial Risk Management Information Retrieval [0.0]
RiskData is a dataset curated for finetuning embedding models in risk management. RiskEmbed is a finetuned embedding model designed to improve retrieval accuracy in financial question-answering systems.
arXiv Detail & Related papers (2025-04-04T20:42:38Z)
Deep Learning Approaches for Anti-Money Laundering on Mobile Transactions: Review, Framework, and Directions [51.43521977132062]
Money laundering is a financial crime that obscures the origin of illicit funds. The proliferation of mobile payment platforms and smart IoT devices has significantly complicated anti-money laundering investigations. This paper conducts a comprehensive review of deep learning solutions and the challenges associated with their use in AML.
arXiv Detail & Related papers (2025-03-13T05:19:44Z)
FinanceQA: A Benchmark for Evaluating Financial Analysis Capabilities of Large Language Models [0.0]
FinanceQA is a testing suite that evaluates LLMs' performance on complex numerical financial analysis tasks that mirror real-world investment work. Current LLMs fail to meet the strict accuracy requirements of financial institutions, with models failing approximately 60% of realistic tasks. Results show that higher-quality training data is needed to support such tasks, which we experiment with using OpenAI's fine-tuning API.
arXiv Detail & Related papers (2025-01-30T00:06:55Z)
FinRobot: AI Agent for Equity Research and Valuation with Large Language Models [6.2474959166074955]
This paper presents FinRobot, the first AI agent framework specifically designed for equity research. FinRobot employs a multi-agent Chain of Thought (CoT) system, integrating both quantitative and qualitative analyses to emulate the comprehensive reasoning of a human analyst. Unlike existing automated research tools, such as CapitalCube and Wright Reports, FinRobot delivers insights comparable to those produced by major brokerage firms and fundamental research vendors.
arXiv Detail & Related papers (2024-11-13T17:38:07Z)
Trustworthiness in Retrieval-Augmented Generation Systems: A Survey [59.26328612791924]
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs). We propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy.
arXiv Detail & Related papers (2024-09-16T09:06:44Z)
Financial Knowledge Large Language Model [4.599537455808687]
We introduce IDEA-FinBench, an evaluation benchmark for assessing financial knowledge in large language models (LLMs). We propose IDEA-FinKER, a framework designed to facilitate the rapid adaptation of general LLMs to the financial domain. Finally, we present IDEA-FinQA, a financial question-answering system powered by LLMs.
arXiv Detail & Related papers (2024-06-29T08:26:49Z)
Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings. Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
A machine learning workflow to address credit default prediction [0.44943951389724796]
Credit default prediction (CDP) plays a crucial role in assessing the creditworthiness of individuals and businesses. We propose a workflow-based approach to improve CDP, which refers to the task of assessing the probability that a borrower will default on his or her credit obligations.
arXiv Detail & Related papers (2024-03-06T15:30:41Z)
FinBen: A Holistic Financial Benchmark for Large Language Models [75.09474986283394]
FinBen is the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading.
arXiv Detail & Related papers (2024-02-20T02:16:16Z)
Multimodal Gen-AI for Fundamental Investment Research [2.559302299676632]
This report outlines a transformative initiative in the financial investment industry, where the conventional decision-making process is being reimagined. We seek to evaluate the effectiveness of fine-tuning methods on a base model (Llama2) to achieve specific application-level goals. The project encompasses a diverse corpus dataset, including research reports, investment memos, market news, and extensive time-series market data.
arXiv Detail & Related papers (2023-12-24T03:35:13Z)
PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLM) based on fine-tuning LLaMA with instruction data. We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks. We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
Explanations of Machine Learning predictions: a mandatory step for its application to Operational Processes [61.20223338508952]
Credit Risk Modelling plays a paramount role. Recent machine and deep learning techniques have been applied to the task. We suggest using the LIME technique to tackle the explainability problem in this field.
arXiv Detail & Related papers (2020-12-30T10:27:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.