FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial
Analysis
- URL: http://arxiv.org/abs/2401.10506v1
- Date: Fri, 19 Jan 2024 05:48:07 GMT
- Title: FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial
Analysis
- Authors: Chao Zhang, Yuren Mao, Yijiang Fan, Yu Mi, Yunjun Gao, Lu Chen,
Dongfang Lou, Jinshu Lin
- Abstract summary: There is no practical Text-to-SQL benchmark dataset for financial analysis.
We propose a model-agnostic Large Language Model (LLM)-based Text-to-SQL framework for financial analysis.
- Score: 28.514754357658482
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-SQL, which provides a zero-code interface for operating relational
databases, has gained much attention in financial analysis because financial
professionals may not be well-skilled in SQL programming. However, until now,
there is no practical Text-to-SQL benchmark dataset for financial analysis, and
existing Text-to-SQL methods have not considered the unique characteristics of
databases in financial applications, such as commonly existing wide tables. To
address these issues, we collect a practical Text-to-SQL benchmark dataset and
propose a model-agnostic Large Language Model (LLMs)-based Text-to-SQL
framework for financial analysis. The benchmark dataset, BULL, is collected
from the practical financial analysis business of Hundsun Technologies Inc.,
including databases for fund, stock, and macro economy. Besides, the proposed
LLMs-based Text-to-SQL framework, FinSQL, provides a systematic treatment for
financial Text-to-SQL from the perspectives of prompt construction,
parameter-efficient fine-tuning and output calibration. Extensive experimental
results on BULL demonstrate that FinSQL achieves the state-of-the-art
Text-to-SQL performance at a small cost; furthermore, FinSQL can bring up to
36.64% performance improvement in scenarios requiring few-shot cross-database
model transfer.
Related papers
- FinAI Data Assistant: LLM-based Financial Database Query Processing with the OpenAI Function Calling API [1.1985612872852671]
FinAI Data Assistant is a practical approach for natural-language querying over financial databases.
The system routes user requests to a small library of vetted, parameterized queries.
Result: Ticker-mapping accuracy is near-perfect for NASDAQ-100 and high for S&P 500 firms.
arXiv Detail & Related papers (2025-10-15T23:19:27Z) - FINCH: Financial Intelligence using Natural language for Contextualized SQL Handling [1.8679829796354372]
We introduce a curated financial dataset (FINCH) comprising 292 tables and 75,725 natural language-SQL pairs.
We benchmark reasoning models and language models of varying scales, providing a systematic analysis of their strengths and limitations.
Finally, we propose a finance-oriented evaluation metric (FINCH Score) that captures nuances overlooked by existing measures.
arXiv Detail & Related papers (2025-10-02T10:55:11Z) - FinStat2SQL: A Text2SQL Pipeline for Financial Statement Analysis [0.0]
FinStat2SQL is a lightweight text2sql pipeline enabling natural language queries over financial statements.
We build a domain-specific database and evaluate models on a synthetic QA dataset.
A fine-tuned 7B model achieves 61.33% accuracy with sub-4-second response times on consumer hardware.
arXiv Detail & Related papers (2025-06-29T14:55:21Z) - Bridging the Gap: Enabling Natural Language Queries for NoSQL Databases through Text-to-NoSQL Translation [25.638927795540454]
We introduce the Text-to-NoSQL task, which aims to convert natural language queries into NoSQL queries.
To promote research in this area, we released a large-scale, open-source dataset for this task, named TEND (short for Text-to-NoSQL Dataset).
We also designed an SLM (Small Language Model)-assisted and RAG (Retrieval-augmented Generation)-assisted multi-step framework called SMART, which is specifically designed for Text-to-NoSQL conversion.
arXiv Detail & Related papers (2025-02-16T17:01:48Z) - Enhancing LLM Fine-tuning for Text-to-SQLs by SQL Quality Measurement [1.392448435105643]
Text-to-SQL enables non-expert users to effortlessly retrieve desired information from databases using natural language queries.
Current state-of-the-art (SOTA) models like GPT4 and T5 have shown impressive performance on large-scale benchmarks like BIRD.
This paper proposes a novel approach that only needs SQL Quality to enhance Text-to-SQL performance.
arXiv Detail & Related papers (2024-10-02T17:21:51Z) - RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of the text-to-SQL task.
We propose RB-SQL, a novel retrieval-based framework for in-context prompt engineering.
Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z) - Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL [15.75829309721909]
Generating accurate SQL from natural language questions (text-to-SQL) is a long-standing challenge.
PLMs have been developed and utilized for text-to-SQL tasks, achieving promising performance.
Recently, large language models (LLMs) have demonstrated significant capabilities in natural language understanding.
arXiv Detail & Related papers (2024-06-12T17:13:17Z) - Enhancing Text-to-SQL Translation for Financial System Design [5.248014305403357]
We consider Large Language Models (LLMs), which have achieved state of the art for various NLP tasks.
We propose two novel metrics that were designed to adequately measure the similarity between relational queries.
arXiv Detail & Related papers (2023-12-22T14:34:19Z) - SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data [54.69489315952524]
"SQLPrompt" is designed to improve the few-shot prompting capabilities of Text-to-SQL for LLMs.
"SQLPrompt" outperforms previous approaches for in-context learning with few labeled data by a large margin.
arXiv Detail & Related papers (2023-11-06T05:24:06Z) - DISC-FinLLM: A Chinese Financial Large Language Model based on Multiple
Experts Fine-tuning [74.99318727786337]
We propose a Multiple Experts Fine-tuning Framework to build a financial large language model (LLM).
We build a financial instruction-tuning dataset named DISC-FIN-SFT, including instruction samples of four categories (consulting, NLP tasks, computing and retrieval-augmented generation).
Evaluations conducted on multiple benchmarks demonstrate that our model performs better than baseline models in various financial scenarios.
arXiv Detail & Related papers (2023-10-23T11:33:41Z) - Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [76.76046657162306]
Large language models (LLMs) have emerged as a new paradigm for the Text-to-SQL task.
arXiv Detail & Related papers (2023-08-29T14:59:54Z) - PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark
for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLM) based on fine-tuning LLaMA with instruction data.
We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks.
We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the SQL-PaLM framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.