WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction
- URL: http://arxiv.org/abs/2403.02962v1
- Date: Tue, 5 Mar 2024 13:33:12 GMT
- Title: WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction
- Authors: Zheng Li and Xiang Chen and Xiaojun Wan
- Abstract summary: This paper investigates the performance of Large Language Models (LLMs) in the context of table editing tasks.
We leverage 26,531 tables from the WikiSQL dataset to generate natural language instructions for six distinct basic operations.
We evaluate several representative large language models on the WikiTableEdit dataset to demonstrate the challenge of this task.
- Score: 56.196512595940334
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tabular data, as a crucial form of data representation, exists in diverse
formats on the Web. When confronted with complex and irregular tables, manual
modification becomes a laborious task. This paper investigates the performance
of Large Language Models (LLMs) in the context of table editing tasks. Existing
research mainly focuses on regular-shaped tables, wherein instructions are used
to generate code in SQL, Python, or Excel Office-script for manipulating the
tables. Nevertheless, editing tables with irregular structures, particularly
those containing merged cells spanning multiple rows, poses a challenge when
using code. To address this, we introduce the WikiTableEdit dataset. Leveraging
26,531 tables from the WikiSQL dataset, we automatically generate natural
language instructions for six distinct basic operations and the corresponding
outcomes, resulting in over 200,000 instances. Subsequently, we evaluate
several representative large language models on the WikiTableEdit dataset to
demonstrate the challenge of this task. The dataset will be released to the
community to promote related research.
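To make the task concrete, here is a minimal sketch of what a WikiTableEdit-style instance might look like, with one of the six basic operations (row insertion) applied to a list-of-lists table. The field names and the operation shown are illustrative assumptions, not the dataset's actual schema:

```python
# Hypothetical instance layout: a source table, a natural-language
# instruction, and the edited target table the model must produce.
source_table = [
    ["Player", "Team", "Goals"],
    ["Ann",    "Red",  "3"],
    ["Bob",    "Blue", "1"],
]

instance = {
    "instruction": "Insert a row for Cal of team Green with 2 goals after Ann.",
    "source": source_table,
    "target": [
        ["Player", "Team", "Goals"],
        ["Ann",    "Red",  "3"],
        ["Cal",    "Green", "2"],
        ["Bob",    "Blue", "1"],
    ],
}

def insert_row(table, row, index):
    """Row insertion, one plausible member of the six basic operations."""
    return table[:index] + [row] + table[index:]

edited = insert_row(source_table, ["Cal", "Green", "2"], 2)
assert edited == instance["target"]
```

Irregular tables with merged cells, which the paper highlights as the hard case, would not fit this clean list-of-lists representation, which is why instruction-following editing is evaluated directly rather than through generated code.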
Related papers
- gTBLS: Generating Tables from Text by Conditional Question Answering [3.240750198587796]
This paper presents a two-stage approach called Generative Tables (gTBLS).
The first stage infers table structure (row and column headers) from the text.
The second stage formulates questions using these headers and fine-tunes a causal language model to answer them.
arXiv Detail & Related papers (2024-03-21T15:04:32Z)
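A toy illustration of the gTBLS two-stage recipe (not the authors' code): stage 1 would infer row and column headers from text; stage 2 turns each (row, column) pair into a question for a fine-tuned language model. The `answer` function below is a stand-in for that model:

```python
text = "Ann scored 3 goals for Red; Bob scored 1 goal for Blue."

# Stage 1 (assumed output): headers inferred from the text.
row_headers = ["Ann", "Bob"]
col_headers = ["Team", "Goals"]

def answer(question: str) -> str:
    """Placeholder for the fine-tuned QA model of stage 2."""
    facts = {("Ann", "Team"): "Red", ("Ann", "Goals"): "3",
             ("Bob", "Team"): "Blue", ("Bob", "Goals"): "1"}
    for (row, col), value in facts.items():
        if row in question and col in question:
            return value
    return ""

# Stage 2: one question per cell; the answers fill the table body.
table = [[answer(f"What is the {col} for {row}?") for col in col_headers]
         for row in row_headers]
print(table)  # [['Red', '3'], ['Blue', '1']]
```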
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
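An illustrative QTSumm-style instance, pairing a table with a user query and a query-focused summary; the field names here are assumptions, not the benchmark's released schema:

```python
instance = {
    "table": {
        "header": ["Season", "Wins", "Losses"],
        "rows": [["2021", "10", "6"], ["2022", "13", "4"]],
    },
    "query": "How did the team's record change between 2021 and 2022?",
    # The target summary must reason over the table, not just copy cells:
    "summary": "The team improved from 10-6 in 2021 to 13-4 in 2022, "
               "winning three more games while losing two fewer.",
}
```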
- Generate, Transform, Answer: Question Specific Tool Synthesis for Tabular Data [6.3455238301221675]
Tabular question answering (TQA) presents a challenging setting for neural systems.
Existing TQA systems process tables directly, resulting in information loss as table size increases.
We propose ToolWriter to generate query specific programs and detect when to apply them to transform tables.
arXiv Detail & Related papers (2023-03-17T17:26:56Z)
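A loose sketch of the ToolWriter idea: generate a query-specific program (here, a row filter expressed as a predicate) and apply it to shrink the table before answering. The generation step is mocked with a simple heuristic; ToolWriter itself uses a language model to write such programs and to detect when applying them helps:

```python
header = ["Country", "Capital", "Population"]
rows = [["France", "Paris", "67M"], ["Japan", "Tokyo", "125M"],
        ["Chile", "Santiago", "19M"]]

def generate_tool(question: str):
    """Mock 'program generation': keep rows whose cells appear in the question."""
    return lambda row: any(cell.lower() in question.lower() for cell in row)

question = "What is the capital of Japan?"
tool = generate_tool(question)
filtered = [row for row in rows if tool(row)]
print(filtered)  # [['Japan', 'Tokyo', '125M']] -- a smaller table to reason over
```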
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis of the table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
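A minimal sketch of one of the three structure-encoding modules named above: a hard attention mask that lets each cell attend only to cells in its own row or column. Indices here are per-cell rather than per-token, for brevity, and the integration with attention logits is only indicated in a comment:

```python
import numpy as np

n_rows, n_cols = 3, 4
cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]

# mask[i, j] is True iff cell i may attend to cell j.
mask = np.zeros((len(cells), len(cells)), dtype=bool)
for i, (r1, c1) in enumerate(cells):
    for j, (r2, c2) in enumerate(cells):
        mask[i, j] = (r1 == r2) or (c1 == c2)

# In a transformer, the mask would gate the attention logits, e.g.:
# logits = np.where(mask, logits, -np.inf)
```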
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
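A toy synchronous grammar in the spirit of GraPPa: each expansion fills the question side and the SQL side simultaneously, so the generated pairs stay aligned. GraPPa induces its grammar from real text-to-SQL data; the vocabulary below is made up:

```python
import random

columns = ["name", "population", "area"]
ops = [("greater than", ">"), ("less than", "<")]

def generate_pair():
    # One synchronized expansion: the same (col, op, val) choices are
    # substituted into both the NL template and the SQL template.
    col = random.choice(columns)
    word, sym = random.choice(ops)
    val = random.randint(1, 100)
    question = f"Show rows where {col} is {word} {val}."
    sql = f"SELECT * FROM t WHERE {col} {sym} {val}"
    return question, sql

print(generate_pair())
```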
- TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [113.29476656550342]
We present TaBERT, a pretrained LM that jointly learns representations for NL sentences and tables.
TaBERT is trained on a large corpus of 26 million tables and their English contexts.
Implementation of the model will be available at http://fburl.com/TaBERT.
arXiv Detail & Related papers (2020-05-17T17:26:40Z)
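A rough sketch of row linearization in the style of TaBERT, which encodes each table row as a sequence of (column name, type, cell value) triples alongside the NL utterance; the separators and type names are simplified assumptions here:

```python
def linearize_row(headers, types, row):
    # Each cell becomes "column | type | value"; cells are joined with [SEP].
    cells = [f"{h} | {t} | {v}" for h, t, v in zip(headers, types, row)]
    return " [SEP] ".join(cells)

headers = ["Year", "City"]
types = ["real", "text"]
print(linearize_row(headers, types, ["2008", "Beijing"]))
# Year | real | 2008 [SEP] City | text | Beijing
```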
- ToTTo: A Controlled Table-To-Text Generation Dataset [61.83159452483026]
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples.
We introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia.
While usually fluent, existing methods often hallucinate phrases that are not supported by the table.
arXiv Detail & Related papers (2020-04-29T17:53:45Z)
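A simplified ToTTo-style example: the input is a table plus a set of highlighted cells, and the target is a sentence supported by those cells. The released JSON carries more metadata (page and section titles, annotator revisions), which is omitted here:

```python
example = {
    "table": [
        ["Year", "Film", "Role"],
        ["2008", "Film A", "Lead"],
        ["2010", "Film B", "Support"],
    ],
    "highlighted_cells": [[1, 0], [1, 1]],  # (row, col) index pairs
    "sentence": "In 2008, she appeared in Film A.",
}

# The controlled setting: only the highlighted cells license the sentence.
highlighted = [example["table"][r][c] for r, c in example["highlighted_cells"]]
print(highlighted)  # ['2008', 'Film A']
```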