TAP4LLM: Table Provider on Sampling, Augmenting, and Packing
Semi-structured Data for Large Language Model Reasoning
- URL: http://arxiv.org/abs/2312.09039v2
- Date: Sat, 17 Feb 2024 08:31:14 GMT
- Title: TAP4LLM: Table Provider on Sampling, Augmenting, and Packing
Semi-structured Data for Large Language Model Reasoning
- Authors: Yuan Sui, Jiaru Zou, Mengyu Zhou, Xinyi He, Lun Du, Shi Han, Dongmei
Zhang
- Abstract summary: We propose TAP4LLM as a versatile pre-processing toolbox to generate table prompts.
In each module, we collect and design several common methods for usage in various scenarios.
- Score: 58.11442663694328
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Table-based reasoning has shown remarkable progress in combining deep models
with discrete reasoning, which requires reasoning over both free-form natural
language (NL) questions and semi-structured tabular data. However, previous
table reasoning solutions only consider small-sized tables and exhibit
limitations in handling larger tables. In addition, most existing methods
struggle to reason over complex questions because the essential information is
either missing or scattered across different places. To alleviate these
challenges, we
propose TAP4LLM as a versatile pre-processing toolbox to generate table prompts
through (1) table sampling, (2) table augmentation, and (3) table packing while
balancing the token allocation trade-off. In each module, we collect and design
several common methods for usage in various scenarios (e.g., speed over
accuracy). We also provide a comprehensive evaluation of the performance of
each component inside TAP4LLM and show that our method improves LLMs' reasoning
capabilities in various tabular tasks and enhances the interaction between LLMs
and tabular data by employing effective pre-processing.
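To make the three-stage pipeline concrete, below is a minimal Python sketch of a sampling, augmentation, and packing flow under a token budget. The function names, the word-overlap relevance heuristic, the per-column example-value metadata, and the whitespace token count are illustrative assumptions, not TAP4LLM's actual methods, which the paper selects per scenario.

```python
# A minimal sketch of a sample -> augment -> pack pipeline in the spirit of
# the abstract above. All names and heuristics here are assumptions for
# illustration, not the paper's implementation.
import re
from typing import Dict, List


def tokenize(text: str) -> set:
    """Lowercase word set; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"\w+", text.lower()))


def sample_rows(rows: List[Dict[str, str]], question: str, k: int) -> List[Dict[str, str]]:
    """Keep the k rows sharing the most words with the question
    (a stand-in for the paper's query-aware table sampling)."""
    q_words = tokenize(question)

    def overlap(row: Dict[str, str]) -> int:
        return len(q_words & tokenize(" ".join(row.values())))

    return sorted(rows, key=overlap, reverse=True)[:k]


def augment(rows: List[Dict[str, str]]) -> List[str]:
    """Derive simple metadata (example values per column); the paper's
    augmentation module covers much richer external and derived signals."""
    notes = []
    for col in rows[0]:
        values = sorted({row[col] for row in rows})
        notes.append(f"Column '{col}' example values: {values[:3]}")
    return notes


def pack(rows: List[Dict[str, str]], notes: List[str], question: str,
         token_budget: int = 120) -> str:
    """Serialize the rows as a markdown table, dropping trailing rows until
    the prompt fits the (whitespace-approximated) token budget."""
    def render(kept: List[Dict[str, str]]) -> str:
        header = "| " + " | ".join(kept[0]) + " |"
        sep = "|" + "---|" * len(kept[0])
        body = ["| " + " | ".join(r.values()) + " |" for r in kept]
        return "\n".join([header, sep, *body, *notes, f"Question: {question}"])

    kept = list(rows)
    while len(kept) > 1 and len(render(kept).split()) > token_budget:
        kept.pop()  # trade table rows for metadata/question tokens
    return render(kept)


table = [
    {"city": "Paris", "population": "2.1M"},
    {"city": "Lyon", "population": "0.5M"},
    {"city": "Berlin", "population": "3.6M"},
]
question = "What is the population of Paris?"
print(pack(sample_rows(table, question, k=2), augment(table), question))
```

Dropping whole rows rather than truncating mid-table mirrors the token-allocation trade-off the abstract describes: table content competes with augmented knowledge for the same prompt budget.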
Related papers
- ALTER: Augmentation for Large-Table-Based Reasoning [5.164923314261229]
ALTER (Augmentation for Large-Table-Based Reasoning) is a framework designed to harness the latent augmentation potential in both free-form natural language (NL) questions and semi-structured tabular data.
By utilizing only a small subset of relevant data from the table, ALTER achieves outstanding performance on table-based reasoning benchmarks.
arXiv Detail & Related papers (2024-07-03T12:34:45Z)
- TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools [51.576974932743596]
Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts.
To better evaluate this setting and facilitate modeling efforts, we introduce TACT - Text And Calculations through Tables.
TACT contains challenging instructions that demand stitching information scattered across one or more texts, and performing complex integration on this information to generate the answer.
arXiv Detail & Related papers (2024-06-05T20:32:56Z)
- QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs [63.98556480088152]
Table summarization is a crucial task aimed at condensing information into concise and comprehensible textual summaries.
We propose a novel method to address the limitations of existing approaches by introducing query-focused multi-table summarization.
Our approach, which comprises a table serialization module, a summarization controller, and a large language model, generates query-dependent table summaries tailored to users' information needs.
arXiv Detail & Related papers (2024-05-08T15:05:55Z)
- Large Language Model for Table Processing: A Survey [9.144614058716083]
Large Language Models (LLMs) offer significant public benefits, garnering interest from academia and industry.
Tables, typically two-dimensional and structured to store large amounts of data, are essential in daily activities like database queries, spreadsheet calculations, and generating reports from web tables.
This survey provides an extensive overview of table tasks, encompassing not only the traditional areas like table question answering (Table QA) and fact verification, but also newly emphasized aspects such as table manipulation and advanced table data analysis.
arXiv Detail & Related papers (2024-02-04T00:47:53Z)
- Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding [79.9461269253121]
We propose the Chain-of-Table framework, where tabular data is explicitly used in the reasoning chain as a proxy for intermediate thoughts (a toy sketch of this idea appears after this list).
Chain-of-Table achieves new state-of-the-art performance on WikiTQ, FeTaQA, and TabFact benchmarks.
arXiv Detail & Related papers (2024-01-09T07:46:26Z)
- HeLM: Highlighted Evidence augmented Language Model for Enhanced Table-to-Text Generation [7.69801337810352]
We conduct parameter-efficient fine-tuning on the LLaMA2 model.
Our approach involves injecting reasoning information into the input by emphasizing table-specific row data.
On both the FeTaQA and QTSumm datasets, our approach achieves state-of-the-art results.
arXiv Detail & Related papers (2023-11-15T12:02:52Z)
- MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering [61.48881995121938]
Real-world queries are complex in nature, often spanning multiple tables in a relational database or web page.
Our model, MultiTabQA, not only answers questions over multiple tables, but also generalizes to generate tabular answers.
arXiv Detail & Related papers (2023-05-22T08:25:15Z)
- Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning [45.013230888670435]
We exploit large language models (LLMs) as decomposers for effective table-based reasoning.
We decompose huge evidence (a huge table) into sub-evidence (a small table) to mitigate the interference of useless information.
We propose a "parsing-execution-filling" strategy to alleviate the dilemma of the chain of thought.
arXiv Detail & Related papers (2023-01-31T17:51:45Z)
- Optimization Techniques for Unsupervised Complex Table Reasoning via Self-Training Framework [5.351873055148804]
The self-training framework generates diverse synthetic data with complex logic.
We optimize the procedure using a "Table-Text Manipulator" to handle joint table-text reasoning scenarios.
UCTRST achieves above 90% of the supervised model performance on different tasks and domains.
arXiv Detail & Related papers (2022-12-20T09:15:03Z)
- OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering [106.73213656603453]
We develop a simple table-based QA model with minimal annotation effort.
We propose an omnivorous pretraining approach that consumes both natural and synthetic data.
arXiv Detail & Related papers (2022-07-08T01:23:45Z)
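As referenced in the Chain-of-Table entry above, here is a toy Python sketch of that core idea: the table itself evolves through the reasoning chain, and each intermediate table serves as the recorded "thought" passed to the next step. The operation set and the hard-coded plan are assumptions for illustration; in the actual framework an LLM dynamically plans each operation.

```python
# Toy illustration of a table evolving through a chain of operations.
# The op set (select_columns, filter_rows) and the fixed plan below are
# assumptions; Chain-of-Table has an LLM choose ops step by step.
from typing import Callable, Dict, List

Table = List[Dict[str, str]]


def select_columns(table: Table, cols: List[str]) -> Table:
    """Project the table onto the given columns."""
    return [{c: row[c] for c in cols} for row in table]


def filter_rows(table: Table, col: str, value: str) -> Table:
    """Keep only rows where the column matches the value."""
    return [row for row in table if row[col] == value]


def run_chain(table: Table, plan: List[Callable[[Table], Table]]) -> Table:
    """Apply each operation, exposing the intermediate table as a 'thought'."""
    for step, op in enumerate(plan, 1):
        table = op(table)
        print(f"step {step}: intermediate table -> {table}")
    return table


games = [
    {"team": "A", "year": "2020", "wins": "10"},
    {"team": "B", "year": "2020", "wins": "7"},
    {"team": "A", "year": "2021", "wins": "12"},
]
# Question: "How many wins did team A have in 2021?"
final = run_chain(games, [
    lambda t: filter_rows(t, "team", "A"),
    lambda t: filter_rows(t, "year", "2021"),
    lambda t: select_columns(t, ["wins"]),
])
print("answer:", final[0]["wins"])
```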