Improving Table Understanding with LLMs and Entity-Oriented Search
- URL: http://arxiv.org/abs/2508.17028v1
- Date: Sat, 23 Aug 2025 14:02:45 GMT
- Title: Improving Table Understanding with LLMs and Entity-Oriented Search
- Authors: Thi-Nhung Nguyen, Hoang Ngo, Dinh Phung, Thuy-Trang Vu, Dat Quoc Nguyen
- Abstract summary: We introduce an entity-oriented search method to improve table understanding with large language models (LLMs). This approach effectively leverages the semantic similarities between questions and table data, as well as the implicit relationships between table cells. It focuses on table entities, ensuring that table cells are semantically tightly bound, thereby enhancing contextual clarity.
- Score: 24.3302301035859
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our work addresses the challenges of understanding tables. Existing methods often struggle with the unpredictable nature of table content, leading to a reliance on preprocessing and keyword matching. They also face limitations due to the lack of contextual information, which complicates the reasoning processes of large language models (LLMs). To overcome these challenges, we introduce an entity-oriented search method to improve table understanding with LLMs. This approach effectively leverages the semantic similarities between questions and table data, as well as the implicit relationships between table cells, minimizing the need for data preprocessing and keyword matching. Additionally, it focuses on table entities, ensuring that table cells are semantically tightly bound, thereby enhancing contextual clarity. Furthermore, we pioneer the use of a graph query language for table understanding, establishing a new research direction. Experiments show that our approach achieves new state-of-the-art performances on standard benchmarks WikiTableQuestions and TabFact.
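The abstract describes two ingredients: semantic matching between the question and table entities, and querying those entities with a graph query language. Below is a minimal sketch of that idea under stated assumptions; the `embed` stub, the row-as-entity representation, and the Cypher string are illustrative choices, not the authors' implementation.

```python
# Hedged sketch of entity-oriented search over a table: rank table entities
# (here, rows) by semantic similarity to the question, then express the lookup
# as a graph query. `embed` is a stand-in for any sentence-embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: hashed bag of tokens (replace with a real model)."""
    vec = np.zeros(128)
    for token in text.lower().split():
        vec[hash(token) % 128] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

table = [
    {"Player": "Lionel Messi", "Club": "Inter Miami", "Goals": 20},
    {"Player": "Erling Haaland", "Club": "Manchester City", "Goals": 27},
]
question = "How many goals did Haaland score?"

# 1) Entity-oriented search: score each row-entity against the question.
q_vec = embed(question)
scores = [float(embed(" ".join(str(v) for v in row.values())) @ q_vec) for row in table]
top_entity = table[int(np.argmax(scores))]

# 2) Express the lookup over entity nodes with a graph query language
#    (illustrative Cypher; parameters would be bound by a graph database driver).
cypher = "MATCH (p:Player {name: $name})-[:HAS]->(g:Goals) RETURN g.value"
print(top_entity, "|", cypher)
```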
Related papers
- STAR: Semantic Table Representation with Header-Aware Clustering and Adaptive Weighted Fusion [1.483000637348699]
STAR (Semantic Table Representation) is a lightweight framework that improves semantic table representation through semantic clustering and weighted fusion. We show that STAR achieves consistently higher Recall than QGpT on all datasets.
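A rough sketch of the two mechanisms named in the title, header-aware clustering and weighted fusion, is given below. The embedder, the use of KMeans, and the softmax weighting are assumptions for illustration, not STAR's exact recipe.

```python
# Hedged sketch: cluster column headers by their embeddings, then fuse header
# and cell vectors into one table vector with adaptive weights.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
header_vecs = rng.normal(size=(6, 64))   # one embedding per column header
cell_vecs = rng.normal(size=(6, 64))     # e.g., mean embedding of each column's cells

# 1) Header-aware clustering: group semantically similar columns.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(header_vecs)

# 2) Adaptive weighted fusion: weight each column by header/cell agreement,
#    then average into a single table representation.
agreement = np.einsum("ij,ij->i", header_vecs, cell_vecs)
weights = np.exp(agreement - agreement.max())
weights /= weights.sum()
table_vec = (weights[:, None] * (header_vecs + cell_vecs) / 2).sum(axis=0)
print(labels, table_vec.shape)
```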
arXiv Detail & Related papers (2026-01-22T11:08:46Z) - TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding [52.59372043981724]
TableDART is a training-efficient framework that integrates multimodal views by reusing pretrained single-modality models. In addition, we propose a novel agent for cross-modal knowledge integration that analyzes outputs from text- and image-based models.
arXiv Detail & Related papers (2025-09-18T07:00:13Z) - An LLM Agent-Based Complex Semantic Table Annotation Approach [13.427066390210538]
This paper proposes an LLM-based agent approach for Column Type Annotation (CTA) and Cell Entity Annotation (CEA). We design and implement five external metrics with tailored prompts based on the ReAct framework. By leveraging Levenshtein distance to reduce redundant annotations, we achieve a 70% reduction in time costs and a 60% reduction in LLM token usage.
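One concrete detail in this summary is the use of Levenshtein distance to prune redundant annotations before further LLM calls. A minimal sketch of that step follows; the threshold and candidate labels are illustrative assumptions.

```python
# Hedged sketch: collapse near-duplicate annotation candidates by edit distance,
# keeping one representative per group, which reduces repeated LLM queries.
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def dedupe(candidates, max_dist=2):
    """Keep a candidate only if it is more than `max_dist` edits from all kept ones."""
    kept = []
    for cand in candidates:
        if all(levenshtein(cand.lower(), k.lower()) > max_dist for k in kept):
            kept.append(cand)
    return kept

print(dedupe(["Football Club", "Footbal Club", "Stadium", "Stadiums"]))
# -> ['Football Club', 'Stadium']
```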
arXiv Detail & Related papers (2025-08-18T12:09:20Z) - Improving Table Retrieval with Question Generation from Partial Tables [2.2169618382995764]
We propose QGpT, a simple yet effective method that uses an LLM to generate synthetic questions based on small portions of a table. The generated questions are then jointly embedded with the partial table segments used for generation, enhancing semantic alignment with user queries.
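The summary suggests a simple retrieval pipeline: generate questions from a table slice, then embed question and slice together as the retrieval key. The sketch below follows that outline; `llm_generate_questions` and `embed` are stand-ins (assumptions), not QGpT's code.

```python
# Hedged sketch: index a partial table under synthetic questions generated from it,
# then retrieve by similarity between a user query and those joint embeddings.
import numpy as np

def llm_generate_questions(partial_table: str, n: int = 2) -> list[str]:
    # Placeholder for an LLM call prompted with the partial table.
    return [f"Synthetic question {i} about: {partial_table[:30]}..." for i in range(n)]

def embed(text: str) -> np.ndarray:
    vec = np.zeros(64)
    for tok in text.lower().split():
        vec[hash(tok) % 64] += 1.0
    n = np.linalg.norm(vec)
    return vec / n if n else vec

partial_table = "Player | Club | Goals\nHaaland | Manchester City | 27"
index = [embed(q + "\n" + partial_table) for q in llm_generate_questions(partial_table)]

query = "Which club does Haaland play for?"
best = max(range(len(index)), key=lambda i: float(index[i] @ embed(query)))
print("best matching synthetic-question key:", best)
```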
arXiv Detail & Related papers (2025-08-08T09:35:56Z) - LLM-Symbolic Integration for Robust Temporal Tabular Reasoning [69.27153114778748]
We introduce TempTabQA-C, a synthetic dataset designed for systematic and controlled evaluations. This structured approach allows Large Language Models (LLMs) to generate and execute SQL queries, enhancing generalization and mitigating biases.
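The symbolic half of this setup amounts to running model-written SQL against the table. A minimal sketch under stated assumptions: the table, schema, and query below are hand-written stand-ins for what an LLM would emit in the paper's setting.

```python
# Hedged sketch: load a table into SQLite and execute an (illustrative,
# hand-written stand-in for an LLM-generated) SQL query, so the symbolic
# engine performs the temporal/tabular reasoning.
import sqlite3

rows = [("Albert Einstein", 1921), ("Marie Curie", 1903), ("Marie Curie", 1911)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nobel (laureate TEXT, year INTEGER)")
conn.executemany("INSERT INTO nobel VALUES (?, ?)", rows)

generated_sql = "SELECT COUNT(*) FROM nobel WHERE laureate = ? AND year < 1912"
print(conn.execute(generated_sql, ("Marie Curie",)).fetchone()[0])  # -> 2
```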
arXiv Detail & Related papers (2025-06-06T05:14:04Z) - TableMaster: A Recipe to Advance Table Understanding with Language Models [2.506624215459612]
TableMaster is a recipe and comprehensive framework that integrates multiple solutions to overcome common obstacles in table understanding. On the WikiTQ dataset, TableMaster achieves an accuracy of 78.13% using GPT-4o-mini, surpassing existing baselines.
arXiv Detail & Related papers (2025-01-31T18:31:31Z) - TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to new state-of-the-art performance on large-scale table understanding.
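The summary names three retrieval steps: query expansion, schema retrieval, and cell retrieval. The sketch below strings them together under stated assumptions; the expansion list and hashed embeddings are illustrative, not TableRAG's components.

```python
# Hedged sketch: expand the query, retrieve matching column names and cells,
# and pass only that small slice (not the full table) to the LM.
import numpy as np

def embed(text):
    vec = np.zeros(64)
    for tok in text.lower().split():
        vec[hash(tok) % 64] += 1.0
    n = np.linalg.norm(vec)
    return vec / n if n else vec

columns = ["Player", "Club", "Goals", "Assists"]
cells = ["Lionel Messi", "Erling Haaland", "Inter Miami", "Manchester City", "20", "27"]

question = "How many goals did Haaland score?"
expansions = [question, "goals scored", "Haaland"]  # stand-in for LLM query expansion

def top_k(items, queries, k=2):
    scores = [max(float(embed(it) @ embed(q)) for q in queries) for it in items]
    order = sorted(range(len(items)), key=lambda i: -scores[i])
    return [items[i] for i in order[:k]]

retrieved_schema = top_k(columns, expansions)
retrieved_cells = top_k(cells, expansions)
print(f"Columns: {retrieved_schema}\nCells: {retrieved_cells}")  # context sent to the LM
```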
arXiv Detail & Related papers (2024-10-07T04:15:02Z) - HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies [9.09415727445941]
We propose a cooperative game dubbed "HiddenTables" as a potential resolution to the challenges of scale and data privacy in Table QA.
"HiddenTables" is played between a code-generating "Solver" and an "Oracle", which evaluates the agents' ability to solve Table QA tasks.
We provide evidential experiments on a diverse set of tables that demonstrate LLMs' collective inability to generalize and perform on complex queries.
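The cooperative game described above can be pictured as a loop in which the code-writing agent never sees the table and only the execution result leaks out. The toy program and names below are illustrative assumptions, not the paper's protocol.

```python
# Hedged sketch: a Solver writes a program from the question and schema alone;
# an Oracle executes it against the hidden table and returns only the answer.
hidden_table = [
    {"country": "France", "gold": 10},
    {"country": "Japan", "gold": 12},
]

def solver(question: str, schema: list[str]) -> str:
    # Stand-in for an LLM that generates code without access to the data.
    return "max(rows, key=lambda r: r['gold'])['country']"

def oracle(program: str):
    # Executes the Solver's program on the hidden table, preserving data privacy.
    return eval(program, {"rows": hidden_table, "max": max})

question = "Which country won the most gold medals?"
print(oracle(solver(question, ["country", "gold"])))  # -> Japan
```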
arXiv Detail & Related papers (2024-06-16T04:53:29Z) - Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding [79.9461269253121]
We propose the Chain-of-Table framework, where tabular data is explicitly used in the reasoning chain as a proxy for intermediate thoughts.
Chain-of-Table achieves new state-of-the-art performance on WikiTQ, FeTaQA, and TabFact benchmarks.
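The core idea above, using the evolving table itself as the intermediate reasoning state, can be sketched as a chain of atomic table operations. In the paper an LLM chooses each step dynamically; the fixed chain below is an illustrative assumption.

```python
# Hedged sketch: apply a planned sequence of table operations, where each
# intermediate table serves as a visible "thought" in the reasoning chain.
table = [
    {"Player": "Messi", "Club": "Inter Miami", "Goals": 20},
    {"Player": "Haaland", "Club": "Manchester City", "Goals": 27},
    {"Player": "Kane", "Club": "Bayern Munich", "Goals": 36},
]

def select_rows(rows, pred):        # atomic op: filter rows
    return [r for r in rows if pred(r)]

def select_columns(rows, cols):     # atomic op: project columns
    return [{c: r[c] for c in cols} for r in rows]

chain = [  # stand-in for the LLM's step-by-step plan
    lambda t: select_rows(t, lambda r: r["Goals"] > 25),
    lambda t: select_columns(t, ["Player", "Goals"]),
]
for op in chain:
    table = op(table)
print(table)  # -> [{'Player': 'Haaland', 'Goals': 27}, {'Player': 'Kane', 'Goals': 36}]
```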
arXiv Detail & Related papers (2024-01-09T07:46:26Z) - TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
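The three components listed above (sampling, augmentation, packing & serialization) map naturally onto a small pipeline. The keyword-overlap sampler, the lookup dictionary, and the markdown serialization below are illustrative assumptions, not TAP4LLM's actual modules.

```python
# Hedged sketch: sample query-relevant rows, augment them with external
# knowledge, and pack the result into a serialized prompt for the LLM.
table = [
    {"City": "Paris", "Population": "2.1M"},
    {"City": "Lyon", "Population": "0.5M"},
    {"City": "Tokyo", "Population": "13.9M"},
]
query = "What is the population of Paris?"

# (1) Table sampling: keep rows that overlap with the query (a crude stand-in).
sampled = [r for r in table if any(str(v).lower() in query.lower() for v in r.values())]

# (2) Table augmentation: attach extra knowledge from an external source.
knowledge = {"Paris": "Capital of France"}
for r in sampled:
    r["Note"] = knowledge.get(r["City"], "")

# (3) Packing & serialization: render the sub-table in a format LLMs read well.
header = list(sampled[0].keys())
lines = [" | ".join(header), " | ".join("---" for _ in header)]
lines += [" | ".join(str(r[c]) for c in header) for r in sampled]
print("\n".join(lines))
```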
arXiv Detail & Related papers (2023-12-14T15:37:04Z) - GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
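A toy version of the grammar-based synthesis mentioned above is sketched next: each rule expands simultaneously into a question template and an SQL template. The grammar is an illustrative assumption, far simpler than GraPPa's, and the masked language modeling objective is not sketched here.

```python
# Hedged sketch: a tiny synchronous grammar that emits paired (question, SQL)
# examples over a table schema for pre-training data synthesis.
import itertools, random

columns = ["population", "area"]
tables = ["cities", "countries"]

# Each rule pairs a natural-language template with an SQL template.
rules = [
    ("what is the largest {col} in {tab} ?", "SELECT MAX({col}) FROM {tab}"),
    ("how many rows are in {tab} ?", "SELECT COUNT(*) FROM {tab}"),
]

pairs = []
for (q_tpl, sql_tpl), col, tab in itertools.product(rules, columns, tables):
    pairs.append((q_tpl.format(col=col, tab=tab), sql_tpl.format(col=col, tab=tab)))

print(random.choice(pairs))
```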
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.