Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance
- URL: http://arxiv.org/abs/2506.04427v1
- Date: Wed, 04 Jun 2025 20:21:52 GMT
- Title: Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance
- Authors: Xixi Wang, Miguel Costa, Jordanka Kovaceva, Shuai Wang, Francisco C. Pereira
- Abstract summary: We propose a graph-based framework that leverages human-curated relational knowledge to explicitly encode schema links and join paths. Given a natural language query, our method searches this graph to construct interpretable reasoning chains, aided by pruning and sub-path merging strategies. Experiments on both standard benchmarks and a realistic, large-scale dataset demonstrate the effectiveness of our approach.
- Score: 8.304761523814564
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have shown promise in table Question Answering (Table QA). However, extending these capabilities to multi-table QA remains challenging due to unreliable schema linking across complex tables. Existing methods based on semantic similarity work well only on simplified hand-crafted datasets and struggle to handle complex, real-world scenarios with numerous and diverse columns. To address this, we propose a graph-based framework that leverages human-curated relational knowledge to explicitly encode schema links and join paths. Given a natural language query, our method searches this graph to construct interpretable reasoning chains, aided by pruning and sub-path merging strategies to enhance efficiency and coherence. Experiments on both standard benchmarks and a realistic, large-scale dataset demonstrate the effectiveness of our approach. To our knowledge, this is the first multi-table QA system applied to truly complex industrial tabular data.
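The graph search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the table names, join columns, the `find_join_path` and `describe_path` helpers, and the `max_hops` pruning rule are all hypothetical, and plain breadth-first search stands in for whatever search and sub-path merging strategy the paper actually uses. It only shows how a curated schema graph can yield an explicit, inspectable join path between two tables.

```python
from collections import deque

# Hypothetical human-curated schema graph: nodes are tables, and each edge
# stores the (left_column, right_column) pair used to join the two tables.
SCHEMA_GRAPH = {
    "customers": {"orders": ("id", "customer_id")},
    "orders": {
        "customers": ("customer_id", "id"),
        "order_items": ("id", "order_id"),
    },
    "order_items": {
        "orders": ("order_id", "id"),
        "products": ("product_id", "id"),
    },
    "products": {"order_items": ("id", "product_id")},
}


def find_join_path(graph, start, goal, max_hops=4):
    """Return a shortest list of tables linking start to goal, or None.

    Paths that already use max_hops joins are not extended, which is a
    simple stand-in for pruning (an assumption, not the paper's rule).
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        table = path[-1]
        if table == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # prune: this path has reached the hop limit
        for neighbor in graph.get(table, {}):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def describe_path(graph, path):
    """Turn a table path into readable join conditions."""
    steps = []
    for left, right in zip(path, path[1:]):
        left_col, right_col = graph[left][right]
        steps.append(f"{left}.{left_col} = {right}.{right_col}")
    return steps


path = find_join_path(SCHEMA_GRAPH, "customers", "products")
print(path)
print(describe_path(SCHEMA_GRAPH, path))
```

Because every edge carries its join columns, the returned path doubles as a readable reasoning chain: each hop names exactly which columns connect the two tables.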
Related papers
- LLM-Symbolic Integration for Robust Temporal Tabular Reasoning [69.27153114778748]
We introduce TempTabQA-C, a synthetic dataset designed for systematic and controlled evaluations. This structured approach allows Large Language Models (LLMs) to generate and execute SQL queries, enhancing generalization and mitigating biases.
arXiv Detail & Related papers (2025-06-06T05:14:04Z)
- Towards Scalable Schema Mapping using Large Language Models [14.028425711746513]
We identify three core issues with using large language models (LLMs) for schema mapping. We propose methods to address through sampling and aggregation techniques. We propose to mitigate through strategies like data type prefiltering.
arXiv Detail & Related papers (2025-05-30T15:36:56Z)
- SchemaGraphSQL: Efficient Schema Linking with Pathfinding Graph Algorithms for Text-to-SQL on Large-Scale Databases [1.6544167074080365]
We present a zero-shot, training-free schema linking approach that first constructs a schema graph based on foreign key relations. We apply classical path-finding algorithms and post-processing to identify the optimal sequence of tables and columns that should be joined. Our method achieves state-of-the-art results on the BIRD benchmark, outperforming previous specialized, fine-tuned, and complex multi-step LLM-based approaches.
arXiv Detail & Related papers (2025-05-23T20:42:36Z)
- GTR: Graph-Table-RAG for Cross-Table Question Answering [53.11230952572134]
We propose the first Graph-Table-RAG framework, namely GTR, which reorganizes table corpora into a heterogeneous graph. GTR exhibits superior cross-table question-answering performance while maintaining high deployment efficiency, demonstrating its real-world practical applicability.
arXiv Detail & Related papers (2025-04-02T04:24:41Z)
- LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL [14.677024710675838]
LinkAlign is a novel framework that can effectively adapt existing baselines to real-world environments. We evaluate our method's performance on the SPIDER and BIRD benchmarks. LinkAlign ranks highest among models excluding those using long chain-of-thought reasoning LLMs.
arXiv Detail & Related papers (2025-03-24T11:53:06Z)
- SRAG: Structured Retrieval-Augmented Generation for Multi-Entity Question Answering over Wikipedia Graph [10.297615455470133]
Multi-entity question answering (MEQA) poses significant challenges for large language models. This paper introduces a structured RAG framework that organizes extracted entities into relational tables. Experiments on Wikipedia-based multi-entity QA tasks demonstrate that SRAG significantly outperforms state-of-the-art long-context LLMs.
arXiv Detail & Related papers (2025-03-03T09:37:33Z)
- Tree-of-Table: Unleashing the Power of LLMs for Enhanced Large-Scale Table Understanding [42.841205217768106]
"Tree-of-Table" is a novel approach designed to enhance LLMs' reasoning capabilities over large and complex tables.
We show that Tree-of-Table sets a new benchmark with superior performance, showcasing remarkable efficiency and generalization capabilities in large-scale table reasoning.
arXiv Detail & Related papers (2024-11-13T11:02:04Z)
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA)
We propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z) - TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z) - Proton: Probing Schema Linking Information from Pre-trained Language
Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on Poincar'e distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z) - Retrieving Complex Tables with Multi-Granular Graph Representation
Learning [20.72341939868327]
The task of natural language table retrieval seeks to retrieve semantically relevant tables based on natural language queries.
Existing learning systems treat tables as plain text based on the assumption that tables are structured as dataframes.
We propose Graph-based Table Retrieval (GTR), a generalizable NLTR framework with multi-granular graph representation learning.
arXiv Detail & Related papers (2021-05-04T20:19:03Z)