WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction
- URL: http://arxiv.org/abs/2403.02962v1
- Date: Tue, 5 Mar 2024 13:33:12 GMT
- Title: WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction
- Authors: Zheng Li and Xiang Chen and Xiaojun Wan
- Abstract summary: This paper investigates the performance of Large Language Models (LLMs) in the context of table editing tasks.
We leverage 26,531 tables from the WikiSQL dataset to generate natural language instructions for six distinct basic operations.
We evaluate several representative large language models on the WikiTableEdit dataset to demonstrate the challenge of this task.
- Score: 56.196512595940334
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tabular data, as a crucial form of data representation, exists in diverse
formats on the Web. When confronted with complex and irregular tables, manual
modification becomes a laborious task. This paper investigates the performance
of Large Language Models (LLMs) in the context of table editing tasks. Existing
research mainly focuses on regular-shaped tables, wherein instructions are used
to generate code in SQL, Python, or Excel Office-script for manipulating the
tables. Nevertheless, editing tables with irregular structures, particularly
those containing merged cells spanning multiple rows, poses a challenge when
using code. To address this, we introduce the WikiTableEdit dataset. Leveraging
26,531 tables from the WikiSQL dataset, we automatically generate natural
language instructions for six distinct basic operations and the corresponding
outcomes, resulting in over 200,000 instances. Subsequently, we evaluate
several representative large language models on the WikiTableEdit dataset to
demonstrate the challenge of this task. The dataset will be released to the
community to promote related research.
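To make the task concrete, here is a minimal sketch of what a WikiTableEdit-style instance might look like, with one of the six basic operations (row insertion) applied to a list-of-lists table. The field names and the operation shown are illustrative assumptions, not the dataset's actual schema:

```python
# Hypothetical instance layout: a source table, a natural-language
# instruction, and the edited target table the model must produce.
source_table = [
    ["Player", "Team", "Goals"],
    ["Ann",    "Red",  "3"],
    ["Bob",    "Blue", "1"],
]

instance = {
    "instruction": "Insert a row for Cal of team Green with 2 goals after Ann.",
    "source": source_table,
    "target": [
        ["Player", "Team", "Goals"],
        ["Ann",    "Red",  "3"],
        ["Cal",    "Green", "2"],
        ["Bob",    "Blue", "1"],
    ],
}

def insert_row(table, row, index):
    """Row insertion, one plausible member of the six basic operations."""
    return table[:index] + [row] + table[index:]

edited = insert_row(source_table, ["Cal", "Green", "2"], 2)
assert edited == instance["target"]
```

Irregular tables with merged cells, which the paper highlights as the hard case, would not fit this clean list-of-lists representation, which is why instruction-following editing is evaluated directly rather than through generated code.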
Related papers
- gTBLS: Generating Tables from Text by Conditional Question Answering [3.240750198587796]
This paper presents a two-stage approach called Generative Tables (gTBLS).
The first stage infers table structure (row and column headers) from the text.
The second stage formulates questions using these headers and fine-tunes a causal language model to answer them.
arXiv Detail & Related papers (2024-03-21T15:04:32Z)
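A toy illustration of the gTBLS two-stage recipe (not the authors' code): stage 1 would infer row and column headers from text; stage 2 turns each (row, column) pair into a question for a fine-tuned language model. The `answer` function below is a stand-in for that model:

```python
text = "Ann scored 3 goals for Red; Bob scored 1 goal for Blue."

# Stage 1 (assumed output): headers inferred from the text.
row_headers = ["Ann", "Bob"]
col_headers = ["Team", "Goals"]

def answer(question: str) -> str:
    """Placeholder for the fine-tuned QA model of stage 2."""
    facts = {("Ann", "Team"): "Red", ("Ann", "Goals"): "3",
             ("Bob", "Team"): "Blue", ("Bob", "Goals"): "1"}
    for (row, col), value in facts.items():
        if row in question and col in question:
            return value
    return ""

# Stage 2: one question per cell; the answers fill the table body.
table = [[answer(f"What is the {col} for {row}?") for col in col_headers]
         for row in row_headers]
print(table)  # [['Red', '3'], ['Blue', '1']]
```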
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
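An illustrative QTSumm-style instance, pairing a table with a user query and a query-focused summary; the field names here are assumptions, not the benchmark's released schema:

```python
instance = {
    "table": {
        "header": ["Season", "Wins", "Losses"],
        "rows": [["2021", "10", "6"], ["2022", "13", "4"]],
    },
    "query": "How did the team's record change between 2021 and 2022?",
    # The target summary must reason over the table, not just copy cells:
    "summary": "The team improved from 10-6 in 2021 to 13-4 in 2022, "
               "winning three more games while losing two fewer.",
}
```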
- Generate, Transform, Answer: Question Specific Tool Synthesis for Tabular Data [6.3455238301221675]
Tabular question answering (TQA) presents a challenging setting for neural systems.
Existing TQA systems process tables directly, resulting in information loss as table size increases.
We propose ToolWriter to generate query specific programs and detect when to apply them to transform tables.
arXiv Detail & Related papers (2023-03-17T17:26:56Z)
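A loose sketch of the ToolWriter idea: generate a query-specific program (here, a row filter expressed as a predicate) and apply it to shrink the table before answering. The generation step is mocked with a simple heuristic; ToolWriter itself uses a language model to write such programs and to detect when applying them helps:

```python
header = ["Country", "Capital", "Population"]
rows = [["France", "Paris", "67M"], ["Japan", "Tokyo", "125M"],
        ["Chile", "Santiago", "19M"]]

def generate_tool(question: str):
    """Mock 'program generation': keep rows whose cells appear in the question."""
    return lambda row: any(cell.lower() in question.lower() for cell in row)

question = "What is the capital of Japan?"
tool = generate_tool(question)
filtered = [row for row in rows if tool(row)]
print(filtered)  # [['Japan', 'Tokyo', '125M']] -- a smaller table to reason over
```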
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis of the table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
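A minimal sketch of one of the three structure-encoding modules named above: a hard attention mask that lets each cell attend only to cells in its own row or column. Indices here are per-cell rather than per-token, for brevity, and the integration with attention logits is only indicated in a comment:

```python
import numpy as np

n_rows, n_cols = 3, 4
cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]

# mask[i, j] is True iff cell i may attend to cell j.
mask = np.zeros((len(cells), len(cells)), dtype=bool)
for i, (r1, c1) in enumerate(cells):
    for j, (r2, c2) in enumerate(cells):
        mask[i, j] = (r1 == r2) or (c1 == c2)

# In a transformer, the mask would gate the attention logits, e.g.:
# logits = np.where(mask, logits, -np.inf)
```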
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
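A toy synchronous grammar in the spirit of GraPPa: each expansion fills the question side and the SQL side simultaneously, so the generated pairs stay aligned. GraPPa induces its grammar from real text-to-SQL data; the vocabulary below is made up:

```python
import random

columns = ["name", "population", "area"]
ops = [("greater than", ">"), ("less than", "<")]

def generate_pair():
    # One synchronized expansion: the same (col, op, val) choices are
    # substituted into both the NL template and the SQL template.
    col = random.choice(columns)
    word, sym = random.choice(ops)
    val = random.randint(1, 100)
    question = f"Show rows where {col} is {word} {val}."
    sql = f"SELECT * FROM t WHERE {col} {sym} {val}"
    return question, sql

print(generate_pair())
```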
- TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [113.29476656550342]
We present TaBERT, a pretrained LM that jointly learns representations for NL sentences and tables.
TaBERT is trained on a large corpus of 26 million tables and their English contexts.
Implementation of the model will be available at http://fburl.com/TaBERT.
arXiv Detail & Related papers (2020-05-17T17:26:40Z)
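A rough sketch of row linearization in the style of TaBERT, which encodes each table row as a sequence of (column name, type, cell value) triples alongside the NL utterance; the separators and type names are simplified assumptions here:

```python
def linearize_row(headers, types, row):
    # Each cell becomes "column | type | value"; cells are joined with [SEP].
    cells = [f"{h} | {t} | {v}" for h, t, v in zip(headers, types, row)]
    return " [SEP] ".join(cells)

headers = ["Year", "City"]
types = ["real", "text"]
print(linearize_row(headers, types, ["2008", "Beijing"]))
# Year | real | 2008 [SEP] City | text | Beijing
```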
- ToTTo: A Controlled Table-To-Text Generation Dataset [61.83159452483026]
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples.
We introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia.
While usually fluent, existing methods often hallucinate phrases that are not supported by the table.
arXiv Detail & Related papers (2020-04-29T17:53:45Z)
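A simplified ToTTo-style example: the input is a table plus a set of highlighted cells, and the target is a sentence supported by those cells. The released JSON carries more metadata (page and section titles, annotator revisions), which is omitted here:

```python
example = {
    "table": [
        ["Year", "Film", "Role"],
        ["2008", "Film A", "Lead"],
        ["2010", "Film B", "Support"],
    ],
    "highlighted_cells": [[1, 0], [1, 1]],  # (row, col) index pairs
    "sentence": "In 2008, she appeared in Film A.",
}

# The controlled setting: only the highlighted cells license the sentence.
highlighted = [example["table"][r][c] for r, c in example["highlighted_cells"]]
print(highlighted)  # ['2008', 'Film A']
```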