Generate, Transform, Answer: Question Specific Tool Synthesis for
Tabular Data
- URL: http://arxiv.org/abs/2303.10138v1
- Date: Fri, 17 Mar 2023 17:26:56 GMT
- Title: Generate, Transform, Answer: Question Specific Tool Synthesis for
Tabular Data
- Authors: Carlos Gemmell, Jeffrey Dalton
- Abstract summary: Tabular question answering (TQA) presents a challenging setting for neural systems.
Language models in TQA process tables directly, resulting in information loss as table size increases.
We propose ToolWriter to generate query specific programs and detect when to apply them to transform tables.
- Score: 6.3455238301221675
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tabular question answering (TQA) presents a challenging setting for neural
systems by requiring joint reasoning of natural language with large amounts of
semi-structured data. Unlike humans who use programmatic tools like filters to
transform data before processing, language models in TQA process tables
directly, resulting in information loss as table size increases. In this paper
we propose ToolWriter to generate query specific programs and detect when to
apply them to transform tables and align them with the TQA model's
capabilities. Focusing ToolWriter to generate row-filtering tools improves the
state-of-the-art for WikiTableQuestions and WikiSQL with the most performance
gained on long tables. By investigating headroom, our work highlights the
broader potential for programmatic tools combined with neural components to
manipulate large amounts of structured data.
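The row-filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `filter_rows` and `generated_tool`, the example table, and the fall-back rule (keep the full table when the filter fails or removes every row) are all assumptions made for illustration. In the paper's setting a model would write the filter for each question; here it is written by hand.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Table = List[Row]


def filter_rows(table: Table, keep: Callable[[Row], bool]) -> Table:
    """Apply a question-specific row filter before the table reaches a TQA model.

    Assumption: if the filter raises or removes every row, the original
    table is returned unchanged. This is a crude stand-in for deciding
    when a generated tool should be applied at all.
    """
    try:
        kept = [row for row in table if keep(row)]
    except (KeyError, ValueError):
        return table
    return kept if kept else table


# Hypothetical tool for the question:
# "Which athletes from Kenya finished in under 130 minutes?"
def generated_tool(row: Row) -> bool:
    return row["country"] == "Kenya" and float(row["time_min"]) < 130.0


table: Table = [
    {"athlete": "A", "country": "Kenya", "time_min": "127.5"},
    {"athlete": "B", "country": "Ethiopia", "time_min": "128.1"},
    {"athlete": "C", "country": "Kenya", "time_min": "131.0"},
    {"athlete": "D", "country": "Kenya", "time_min": "129.9"},
]

# The reduced table is what a downstream TQA model would receive.
reduced = filter_rows(table, generated_tool)
print([row["athlete"] for row in reduced])
```

The shorter table keeps only rows relevant to the question, which is the mechanism the abstract credits for the gains on long tables.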
Related papers
- TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding.
TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs.
Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z)
- Table Question Answering for Low-resourced Indic Languages [71.57359949962678]
TableQA is the task of answering questions over tables of structured information, returning individual cells or tables as output.
We introduce a fully automatic large-scale tableQA data generation process for low-resource languages with limited budget.
We incorporate our data generation method on two Indic languages, Bengali and Hindi, which have no tableQA datasets or models.
arXiv Detail & Related papers (2024-10-04T16:26:12Z)
- Knowledge in Triples for LLMs: Enhancing Table QA Accuracy with Semantic Extraction [1.0968343822308813]
This paper proposes a novel approach that extracts triples directly from tabular data and integrates them with a retrieval-augmented generation (RAG) model to enhance the accuracy, coherence, and contextual richness of responses generated by a fine-tuned GPT-3.5-turbo-0125 model.
Our approach significantly outperforms existing baselines on the FeTaQA dataset, particularly excelling in Sacre-BLEU and ROUGE metrics.
arXiv Detail & Related papers (2024-09-21T16:46:15Z)
- WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction [56.196512595940334]
This paper investigates the performance of Large Language Models (LLMs) in the context of table editing tasks.
We leverage 26,531 tables from the Wiki dataset to generate natural language instructions for six distinct basic operations.
We evaluate several representative large language models on the WikiTableEdit dataset to demonstrate the challenge of this task.
arXiv Detail & Related papers (2024-03-05T13:33:12Z)
- In-Context Data Distillation with TabPFN [11.553950697974825]
In-context data distillation (ICD) is a novel methodology that effectively eliminates these constraints by optimizing TabPFN's context.
ICD efficiently enables TabPFN to handle significantly larger datasets with a fixed memory budget, improving TabPFN's quadratic memory complexity but at the cost of a linear number of tuning steps.
arXiv Detail & Related papers (2024-02-10T15:23:45Z)
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- Optimization Techniques for Unsupervised Complex Table Reasoning via Self-Training Framework [5.351873055148804]
A self-training framework generates diverse synthetic data with complex logic.
We optimize the procedure using a "Table-Text Manipulator" to handle joint table-text reasoning scenarios.
UCTRST achieves above 90% of the supervised model performance on different tasks and domains.
arXiv Detail & Related papers (2022-12-20T09:15:03Z)
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis on a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
- Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)