A Sequence-to-Sequence&Set Model for Text-to-Table Generation
- URL: http://arxiv.org/abs/2306.00137v1
- Date: Wed, 31 May 2023 19:28:00 GMT
- Title: A Sequence-to-Sequence&Set Model for Text-to-Table Generation
- Authors: Tong Li, Zhihao Wang, Liangying Shao, Xuling Zheng, Xiaoli Wang,
Jinsong Su
- Abstract summary: In this paper, we propose a novel sequence-to-sequence&set text-to-table generation model.
Specifically, we first conduct a preliminary study to demonstrate that the generation of most rows is order-insensitive.
Experimental results show that our model significantly surpasses the baselines.
- Score: 35.65374526264392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the text-to-table generation task has attracted increasing
attention due to its wide applications. In this line of research, the dominant model
formalizes this task as a sequence-to-sequence generation task and serializes
each table into a token sequence during training by concatenating all rows in a
top-down order. However, this formulation suffers from two serious defects: 1) the
predefined order introduces an incorrect bias during training, which heavily
penalizes shifts in the order between rows; 2) error propagation becomes severe
when the model outputs a long token sequence. In this paper, we first conduct a
preliminary study to demonstrate that the generation of most rows is
order-insensitive. Furthermore, we propose a novel sequence-to-sequence&set
text-to-table generation model. Specifically, in addition to a text encoder
encoding the input text, our model is equipped with a table header generator to
first output a table header, i.e., the first row of the table, in the manner of
sequence generation. Then we use a table body generator with learnable row
embeddings and column embeddings to generate a set of table body rows in
parallel. Particularly, to deal with the issue that there is no correspondence
between each generated table body row and target during training, we propose a
target assignment strategy based on the bipartite matching between the first
cells of generated table body rows and targets. Experiment results show that
our model significantly surpasses the baselines, achieving state-of-the-art
performance on commonly-used datasets.
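To make the target assignment concrete, below is a minimal sketch, not the authors' code: it assumes generated and target rows are compared only by a simple token-overlap similarity between their first cells (a stand-in for the paper's scoring), and it uses the Hungarian solver from SciPy to compute the one-to-one assignment. The helper names first_cell_cost and assign_targets are hypothetical.

```python
# Minimal sketch (not the authors' code) of bipartite-matching target
# assignment between generated and target table body rows, compared via a
# stand-in token-overlap similarity of their first cells.
import numpy as np
from scipy.optimize import linear_sum_assignment


def first_cell_cost(pred_first_cells, gold_first_cells):
    """Build a cost matrix: low cost means the two first cells look alike."""
    cost = np.zeros((len(pred_first_cells), len(gold_first_cells)))
    for i, pred in enumerate(pred_first_cells):
        for j, gold in enumerate(gold_first_cells):
            p, g = set(pred.lower().split()), set(gold.lower().split())
            overlap = len(p & g) / max(len(p | g), 1)
            cost[i, j] = 1.0 - overlap
    return cost


def assign_targets(pred_first_cells, gold_first_cells):
    """Return {generated_row_index: target_row_index} via Hungarian matching."""
    cost = first_cell_cost(pred_first_cells, gold_first_cells)
    rows, cols = linear_sum_assignment(cost)  # handles rectangular matrices
    return dict(zip(rows.tolist(), cols.tolist()))


if __name__ == "__main__":
    preds = ["Team B", "Team A", "Team C"]  # first cells of generated rows
    golds = ["Team A", "Team B"]            # first cells of target rows
    print(assign_targets(preds, golds))     # -> {0: 1, 1: 0}; row 2 unmatched
```

Under such an assignment, each matched target row would supply the training signal for the corresponding generated row slot; how unmatched generated rows are handled is a detail of the full model that this sketch omits.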
Related papers
- gTBLS: Generating Tables from Text by Conditional Question Answering [3.240750198587796]
This paper presents a two-stage approach called Generative Tables (gTBLS).
The first stage infers table structure (row and column headers) from the text.
The second stage formulates questions using these headers and fine-tunes a causal language model to answer them.
arXiv Detail & Related papers (2024-03-21T15:04:32Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Few-Shot Table-to-Text Generation with Prefix-Controlled Generator [11.891732582638227]
We propose a prompt-based approach, Prefix-Controlled Generator (i.e., PCG), for few-shot table-to-text generation.
We prepend a task-specific prefix to the PLM input so that the table structure better fits the pre-trained input format.
In addition, we generate an input-specific prefix to control the factual contents and word order of the generated text.
arXiv Detail & Related papers (2022-08-23T03:23:26Z)
- STable: Table Generation Framework for Encoder-Decoder Models [5.07112098978226]
We propose a framework for text-to-table neural models applicable to problems such as extraction of line items, joint entity and relation extraction, or knowledge base population.
The training maximizes the expected log-likelihood for a table's content across all random permutations of the factorization order.
Experiments demonstrate a high practical value of the framework, which establishes state-of-the-art results on several challenging datasets.
arXiv Detail & Related papers (2022-06-08T17:59:02Z)
- Conditional set generation using Seq2seq models [52.516563721766445]
Conditional set generation learns a mapping from an input sequence of tokens to a set.
Sequence-to-sequence (Seq2seq) models are a popular choice for modeling set generation.
We propose a novel algorithm for effectively sampling informative orders over the space of label orders.
arXiv Detail & Related papers (2022-05-25T04:17:50Z)
- Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation [21.886973310718457]
We propose a novel Sequence-to-Action (S2A) module for Grammatical Error Correction.
The S2A module jointly takes the source and target sentences as input, and is able to automatically generate a token-level action sequence.
Our model consistently outperforms the seq2seq baselines, while being able to significantly alleviate the over-correction problem.
arXiv Detail & Related papers (2022-05-22T17:47:06Z)
- Robust (Controlled) Table-to-Text Generation with Structure-Aware Equivariance Learning [24.233552674892906]
Controlled table-to-text generation seeks to generate natural language descriptions for highlighted subparts of a table.
We propose an equivariance learning framework, which encodes tables with a structure-aware self-attention mechanism.
Our technique can be freely plugged into existing table-to-text generation models, and improves the performance of T5-based models on ToTTo and HiTab.
arXiv Detail & Related papers (2022-05-08T23:37:27Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
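As a toy illustration of the parallel, insertion-based generation described for POINTER (not the released implementation), the sketch below makes one insertion decision per gap and applies all insertions simultaneously in each round; the scripted gap predictions stand in for a trained insertion model.

```python
# Toy illustration of parallel, insertion-based ("coarse-to-fine") generation
# in the spirit of POINTER. The per-gap decisions are scripted here and stand
# in for a trained model.
from typing import List, Optional


def insertion_round(tokens: List[str], gap_predictions: List[Optional[str]]) -> List[str]:
    """Insert one predicted token (or nothing) into every gap, in parallel.

    A sequence of n tokens has n + 1 gaps: before the first token, between
    each adjacent pair, and after the last token.
    """
    assert len(gap_predictions) == len(tokens) + 1
    out: List[str] = []
    for gap, tok in zip(gap_predictions, tokens + [""]):
        if gap is not None:
            out.append(gap)   # token inserted into this gap
        if tok:
            out.append(tok)   # existing token kept in place
    return out


if __name__ == "__main__":
    # Start from hard lexical constraints and refine over two rounds.
    seq = ["text", "table"]
    seq = insertion_round(seq, ["converts", "into", None])        # coarse round
    seq = insertion_round(seq, ["model", None, None, "a", None])  # finer round
    print(" ".join(seq))  # -> "model converts text into a table"
```

Each round refines the sequence from coarse keywords toward a complete sentence, which is the coarse-to-fine behavior the entry above refers to.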