gTBLS: Generating Tables from Text by Conditional Question Answering
- URL: http://arxiv.org/abs/2403.14457v1
- Date: Thu, 21 Mar 2024 15:04:32 GMT
- Title: gTBLS: Generating Tables from Text by Conditional Question Answering
- Authors: Anirudh Sundar, Christopher Richardson, Larry Heck
- Abstract summary: This paper presents a two-stage approach called Generative Tables (gTBLS).
The first stage infers table structure (row and column headers) from the text.
The second stage formulates questions using these headers and fine-tunes a causal language model to answer them.
- Score: 3.240750198587796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distilling large, unstructured text into a structured, condensed form such as tables is an open research problem. One of the primary challenges in automatically generating tables is ensuring their syntactic validity. Prior approaches address this challenge by including additional parameters in the Transformer's attention mechanism to attend to specific rows and column headers. In contrast to this single-stage method, this paper presents a two-stage approach called Generative Tables (gTBLS). The first stage infers table structure (row and column headers) from the text. The second stage formulates questions using these headers and fine-tunes a causal language model to answer them. Furthermore, the gTBLS approach is amenable to the utilization of pre-trained Large Language Models in a zero-shot configuration, presenting a solution for table generation in situations where fine-tuning is not feasible. gTBLS improves prior approaches by up to 10% in BERTScore on the table construction task and up to 20% on the table content generation task of the E2E, WikiTableText, WikiBio, and RotoWire datasets.
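The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the question template, the function names (`formulate_question`, `build_table`, `toy_answer`), and the lookup-based answerer are all assumptions made for illustration, and stage 1 (header inference) is replaced by hand-written header lists. The sketch only shows why asking one question per (row, column) pair yields a table that is structurally valid by construction.

```python
from typing import Callable, Dict, List

Table = Dict[str, Dict[str, str]]


def formulate_question(row_header: str, col_header: str) -> str:
    """Turn one (row, column) header pair into a question (hypothetical template)."""
    return f"What is the {col_header} of {row_header}?"


def build_table(
    text: str,
    row_headers: List[str],
    col_headers: List[str],
    answer_fn: Callable[[str, str], str],
) -> Table:
    """Stage 2: ask one question per cell and collect the answers."""
    return {
        row: {col: answer_fn(text, formulate_question(row, col)) for col in col_headers}
        for row in row_headers
    }


# Toy stand-in for the language model: a fixed lookup keyed by question.
TOY_ANSWERS = {
    "What is the points of Lakers?": "110",
    "What is the rebounds of Lakers?": "45",
    "What is the points of Celtics?": "102",
}


def toy_answer(text: str, question: str) -> str:
    # Ignores `text`; a real answerer would condition on it.
    return TOY_ANSWERS.get(question, "")


text = "The Lakers beat the Celtics 110-102 and grabbed 45 rebounds."
# Stage 1 (header inference) is assumed to have produced these header lists.
table = build_table(text, ["Lakers", "Celtics"], ["points", "rebounds"], toy_answer)
```

Every row receives the same set of columns, and a cell the answerer cannot fill is left as an empty string rather than breaking the layout. Swapping `toy_answer` for a call to a fine-tuned or zero-shot model would not change `build_table`.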
Related papers
- WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction [56.196512595940334]
This paper investigates the performance of Large Language Models (LLMs) in the context of table editing tasks.
We leverage 26,531 tables from the Wiki dataset to generate natural language instructions for six distinct basic operations.
We evaluate several representative large language models on the WikiTableEdit dataset to demonstrate the challenge of this task.
arXiv Detail & Related papers (2024-03-05T13:33:12Z)
- Augment before You Try: Knowledge-Enhanced Table Question Answering via Table Expansion [57.53174887650989]
Table question answering is a popular task that assesses a model's ability to understand and interact with structured data.
Some existing methods convert both the table and external knowledge into text, which neglects the structured nature of the table.
We propose a simple yet effective method to integrate external information in a given table.
arXiv Detail & Related papers (2024-01-28T03:37:11Z)
- A Sequence-to-Sequence&Set Model for Text-to-Table Generation [35.65374526264392]
In this paper, we propose a novel sequence-to-sequence&set text-to-table generation model.
Specifically, we first conduct a preliminary study to demonstrate that the generation of most rows is order-insensitive.
Experiment results show that our model significantly surpasses the baselines.
arXiv Detail & Related papers (2023-05-31T19:28:00Z)
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- Few-Shot Table-to-Text Generation with Prefix-Controlled Generator [11.891732582638227]
We propose a prompt-based approach, Prefix-Controlled Generator (i.e., PCG), for few-shot table-to-text generation.
We prepend a task-specific prefix for a PLM to make the table structure better fit the pre-trained input.
In addition, we generate an input-specific prefix to control the factual contents and word order of the generated text.
arXiv Detail & Related papers (2022-08-23T03:23:26Z)
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis of a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
- TableFormer: Robust Transformer Modeling for Table-Text Encoding [18.00127368618485]
Existing models for table understanding require linearization of the table structure, where row or column order is encoded as an unwanted bias.
In this work, we propose TableFormer, a robust and structurally aware table-text encoding architecture.
arXiv Detail & Related papers (2022-03-01T07:23:06Z)
- Text-to-Table: A New Way of Information Extraction [8.326657025342042]
We study a new problem setting of information extraction (IE), referred to as text-to-table.
In text-to-table, given a text, one creates a table or several tables expressing the main content of the text.
We make use of four existing table-to-text datasets in our experiments on text-to-table.
arXiv Detail & Related papers (2021-09-06T19:35:46Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
- ToTTo: A Controlled Table-To-Text Generation Dataset [61.83159452483026]
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples.
We introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia.
While usually fluent, existing methods often hallucinate phrases that are not supported by the table.
arXiv Detail & Related papers (2020-04-29T17:53:45Z)
- Identifying Table Structure in Documents using Conditional Generative Adversarial Networks [0.0]
In many industries and in academic research, information is primarily transmitted in the form of unstructured documents.
We propose a top-down approach, first using a conditional generative adversarial network to map a table image into a standardised 'skeleton' table form.
We then derive the latent table structure using xy-cut projection and Genetic Algorithm optimisation.
arXiv Detail & Related papers (2020-01-13T20:42:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.