Related papers: TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning

TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning

URL: http://arxiv.org/abs/2312.09039v3
Date: Thu, 10 Oct 2024 15:06:53 GMT
Title: TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning
Authors: Yuan Sui, Jiaru Zou, Mengyu Zhou, Xinyi He, Lun Du, Shi Han, Dongmei Zhang,
Abstract summary: We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively. It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
Score: 55.33939289989238
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Table reasoning tasks have shown remarkable progress with the development of large language models (LLMs), which involve interpreting and drawing conclusions from tabular data based on natural language (NL) questions. Existing solutions mainly tested on smaller tables face scalability issues and struggle with complex queries due to incomplete or dispersed data across different table sections. To alleviate these challenges, we propose TAP4LLM as a versatile pre-processor suite for leveraging LLMs in table-based tasks effectively. It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding. In each module, we design and compare several common methods under various usage scenarios, aiming to shed light on the best practices for leveraging LLMs for table-reasoning tasks. Our experiments show that our method improves LLMs' reasoning capabilities in various tabular tasks and enhances the interaction between LLMs and tabular data by employing effective pre-processing.

Related papers

Tabular Data Understanding with LLMs: A Survey of Recent Advances and Challenges [22.054723113358865]
This paper introduces key concepts through a taxonomy of tabular input representations and an introduction of table understanding tasks.<n>Tables are two-dimensional, encompassing formats that range from well-structured database tables to complex, multi-layered spreadsheets, each with different purposes.<n>We highlight several critical gaps in the field that indicate the need for further research.
arXiv Detail & Related papers (2025-07-31T23:41:31Z)
Multimodal Tabular Reasoning with Privileged Structured Information [67.40011423365712]
We introduce TabUlar Reasoning with Bridged infOrmation (sc Turbo)<n>sc Turbo benefits from a structure-aware reasoning trace generator based on DeepSeek-R1.<n>sc Turbo achieves state-of-the-art performance ($+7.2%$ vs. previous SOTA) across multiple datasets.
arXiv Detail & Related papers (2025-06-04T15:46:30Z)
Texts or Images? A Fine-grained Analysis on the Effectiveness of Input Representations and Models for Table Question Answering [16.790216473975146]
We conduct the first controlled study on the effectiveness of several combinations of table representations and models from two perspectives.<n>We find that the best combination of table representation and model varies across setups.<n>We propose FRES, a method selecting table representations dynamically, and observe a 10% average performance improvement.
arXiv Detail & Related papers (2025-05-20T09:36:17Z)
TableLoRA: Low-rank Adaptation on Table Structure Understanding for Large Language Models [57.005158277893194]
TableLoRA is a module designed to improve LLMs' understanding of table structure during PEFT. It incorporates special tokens for serializing tables with special token encoder and uses 2D LoRA to encode low-rank information on cell positions.
arXiv Detail & Related papers (2025-03-06T12:50:14Z)
Better Think with Tables: Leveraging Tables to Enhance Large Language Model Comprehension [33.32086403802351]
Thinking with tables is a technique that assists Large Langauge Models (LLMs) to leverage tables for intermediate thinking aligning with human cognitive behavior. We show that our approach achieves a 40.29% average relative performance increase, higher robustness, and show generalizability to different requests, conditions, or scenarios.
arXiv Detail & Related papers (2024-12-22T23:31:03Z)
TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z)
ALTER: Augmentation for Large-Table-Based Reasoning [5.164923314261229]
ALTER(Augmentation for Large-Table-Based Reasoning) is a framework designed to harness the latent augmentation potential in both free-form natural language (NL) questions. By utilizing only a small subset of relevant data from the table, ALTER achieves outstanding performance on table-based reasoning benchmarks.
arXiv Detail & Related papers (2024-07-03T12:34:45Z)
AnnotatedTables: A Large Tabular Dataset with Language Model Annotations [8.602181445598776]
We show how machine learning can be used to automate the annotation of large volumes of diverse tabular data. We release AnnotatedTables, a collection of 32,119 databases with LLM-generated annotations. We evaluate the performance of TabPFN, a recent neural classifier trained on Bayesian priors, on 2,720 tables with input-target columns identified by LLMs.
arXiv Detail & Related papers (2024-06-24T06:44:14Z)
HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies [9.09415727445941]
We propose a cooperative game dubbed "HiddenTables" as a potential resolution to this challenge. "HiddenTables" is played between the code-generating "r" and the "Oracle windows" which evaluates the ability of the agents to solve Table QA tasks. We provide evidential experiments on a diverse set of tables that demonstrate an LLM's collective inability to generalize and perform on complex queries.
arXiv Detail & Related papers (2024-06-16T04:53:29Z)
TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools [51.576974932743596]
Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts. TACT contains challenging instructions that demand stitching information scattered across one or more texts. We construct this dataset by leveraging an existing dataset of texts and their associated tables. We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%.
arXiv Detail & Related papers (2024-06-05T20:32:56Z)
QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs [63.98556480088152]
Table summarization is a crucial task aimed at condensing information into concise and comprehensible textual summaries. We propose a novel method to address these limitations by introducing query-focused multi-table summarization. Our approach, which comprises a table serialization module, a summarization controller, and a large language model, generates query-dependent table summaries tailored to users' information needs.
arXiv Detail & Related papers (2024-05-08T15:05:55Z)
Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering [53.56653281752486]
This study explores Large Language Models' mathematical reasoning on four financial question-answering datasets. We focus on sensitivity to table complexity and performance variations with an increasing number of arithmetic reasoning steps. We introduce a novel prompting technique tailored to semi-structured documents, matching or outperforming other baselines in performance.
arXiv Detail & Related papers (2024-02-17T05:10:18Z)
Large Language Model for Table Processing: A Survey [18.32332372134988]
This survey provides a comprehensive overview of table-related tasks. It covers traditional tasks like table question answering as well as emerging fields such as spreadsheet manipulation and table data analysis.
arXiv Detail & Related papers (2024-02-04T00:47:53Z)
HeLM: Highlighted Evidence augmented Language Model for Enhanced Table-to-Text Generation [7.69801337810352]
We conduct parameter-efficient fine-tuning on the LLaMA2 model. Our approach involves injecting reasoning information into the input by emphasizing table-specific row data. On both the FetaQA and QTSumm datasets, our approach achieved state-of-the-art results.
arXiv Detail & Related papers (2023-11-15T12:02:52Z)
GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing. We construct synthetic question-pairs over high-free tables via a synchronous context-free grammar. To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.