Related papers: Generate, Transform, Answer: Question Specific Tool Synthesis for Tabular Data

URL: http://arxiv.org/abs/2303.10138v1
Date: Fri, 17 Mar 2023 17:26:56 GMT
Title: Generate, Transform, Answer: Question Specific Tool Synthesis for Tabular Data
Authors: Carlos Gemmell, Jeffrey Dalton
Abstract summary: Tabular question answering (TQA) presents a challenging setting for neural systems. Language models in TQA process tables directly, resulting in information loss as table size increases. We propose ToolWriter to generate query-specific programs and detect when to apply them to transform tables.
Score: 6.3455238301221675
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Tabular question answering (TQA) presents a challenging setting for neural systems by requiring joint reasoning of natural language with large amounts of semi-structured data. Unlike humans who use programmatic tools like filters to transform data before processing, language models in TQA process tables directly, resulting in information loss as table size increases. In this paper we propose ToolWriter to generate query specific programs and detect when to apply them to transform tables and align them with the TQA model's capabilities. Focusing ToolWriter to generate row-filtering tools improves the state-of-the-art for WikiTableQuestions and WikiSQL with the most performance gained on long tables. By investigating headroom, our work highlights the broader potential for programmatic tools combined with neural components to manipulate large amounts of structured data.
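The row-filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the table, the question, the `filter_rows` helper, and the hand-written `keep_oslo` predicate are all hypothetical stand-ins for a generated tool, and the fallback to the full table is only an assumed, simplified substitute for the paper's "detect when to apply them" step.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def filter_rows(table: List[Row], keep: Callable[[Row], bool]) -> List[Row]:
    """Apply a question-specific row filter before handing the table to a TQA model.

    Assumption for this sketch: if the filter raises an error or removes
    every row, the original table is returned unchanged.
    """
    try:
        kept = [row for row in table if keep(row)]
    except (KeyError, ValueError):
        return table
    return kept if kept else table


# Hypothetical table and question, for illustration only.
table = [
    {"Year": "2019", "City": "Oslo", "Medals": "3"},
    {"Year": "2020", "City": "Rome", "Medals": "5"},
    {"Year": "2021", "City": "Oslo", "Medals": "7"},
]

# Question: "How many medals were won in Oslo in 2021?"
# Hand-written stand-in for a program that a tool generator might produce.
def keep_oslo(row: Row) -> bool:
    return row["City"] == "Oslo"


filtered = filter_rows(table, keep_oslo)
print(len(filtered), "of", len(table), "rows passed to the TQA model")
```

The shorter table is what a downstream TQA model would read, which is where the abstract reports the largest gains on long tables.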

Related papers

CRAFT: Training-Free Cascaded Retrieval for Tabular QA [11.984180880537936]
Table Question Answering (TQA) involves retrieving relevant tables from a large corpus to answer natural language queries. CRAFT is a cascaded retrieval approach that first uses a sparse retrieval model to filter a subset of candidate tables. CRAFT achieves better retrieval performance than state-of-the-art (SOTA) sparse, dense, and hybrid retrievers.
arXiv Detail & Related papers (2025-05-21T00:09:34Z)
AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework [22.72266037804117]
Tabular Question Answering (TQA) allows users to quickly and efficiently extract meaningful insights from structured data. Many tables are derived from web sources or real-world scenarios, which require meticulous data preparation (or data prep) to ensure accurate responses. This question-aware data preparation involves specific tasks such as column augmentation and filtering tailored to particular questions. We propose AutoPrep, a large language model (LLM)-based multi-agent framework that leverages the strengths of multiple agents.
arXiv Detail & Related papers (2024-12-10T11:03:49Z)
TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z)
Table Question Answering for Low-resourced Indic Languages [71.57359949962678]
TableQA is the task of answering questions over tables of structured information, returning individual cells or tables as output. We introduce a fully automatic large-scale tableQA data generation process for low-resource languages with limited budget. We incorporate our data generation method on two Indic languages, Bengali and Hindi, which have no tableQA datasets or models.
arXiv Detail & Related papers (2024-10-04T16:26:12Z)
Knowledge in Triples for LLMs: Enhancing Table QA Accuracy with Semantic Extraction [1.0968343822308813]
This paper proposes a novel approach that extracts triples directly from tabular data and integrates them with a retrieval-augmented generation (RAG) model to enhance the accuracy, coherence, and contextual richness of responses generated by a fine-tuned GPT-3.5-turbo-0125 model. Our approach significantly outperforms existing baselines on the FeTaQA dataset, particularly excelling in Sacre-BLEU and ROUGE metrics.
arXiv Detail & Related papers (2024-09-21T16:46:15Z)
WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction [56.196512595940334]
This paper investigates the performance of Large Language Models (LLMs) in the context of table editing tasks. We leverage 26,531 tables from the Wiki dataset to generate natural language instructions for six distinct basic operations. We evaluate several representative large language models on the WikiTableEdit dataset to demonstrate the challenge of this task.
arXiv Detail & Related papers (2024-03-05T13:33:12Z)
In-Context Data Distillation with TabPFN [11.553950697974825]
In-context data distillation (ICD) is a novel methodology that effectively eliminates these constraints by optimizing TabPFN's context. ICD efficiently enables TabPFN to handle significantly larger datasets with a fixed memory budget, improving TabPFN's quadratic memory complexity but at the cost of a linear number of tuning steps.
arXiv Detail & Related papers (2024-02-10T15:23:45Z)
QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions. We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning. We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
Optimization Techniques for Unsupervised Complex Table Reasoning via Self-Training Framework [5.351873055148804]
Self-training framework generates diverse synthetic data with complex logic. We optimize the procedure using a "Table-Text Manipulator" to handle joint table-text reasoning scenarios. UCTRST achieves above 90% of the supervised model performance on different tasks and domains.
arXiv Detail & Related papers (2022-12-20T09:15:03Z)
Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?" Based on an analysis on a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases. We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases. None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach. IFAQ treats the feature extraction query and the learning task as one program given in IFAQ's domain-specific language. We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.