Compositional Generalization for Data-to-Text Generation
- URL: http://arxiv.org/abs/2312.02748v1
- Date: Tue, 5 Dec 2023 13:23:15 GMT
- Title: Compositional Generalization for Data-to-Text Generation
- Authors: Xinnuo Xu, Ivan Titov, Mirella Lapata
- Abstract summary: We propose a novel model that addresses compositional generalization by clustering predicates into groups.
Our model generates text in a sentence-by-sentence manner, relying on one cluster of predicates at a time.
It significantly outperforms T5 baselines across all evaluation metrics.
- Score: 86.79706513098104
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data-to-text generation involves transforming structured data, often
represented as predicate-argument tuples, into coherent textual descriptions.
Despite recent advances, systems still struggle when confronted with unseen
combinations of predicates, producing unfaithful descriptions (e.g.
hallucinations or omissions). We refer to this issue as compositional
generalization, and it encouraged us to create a benchmark for assessing the
performance of different approaches on this specific problem. Furthermore, we
propose a novel model that addresses compositional generalization by clustering
predicates into groups. Our model generates text in a sentence-by-sentence
manner, relying on one cluster of predicates at a time. This approach
significantly outperforms T5 baselines across all evaluation metrics. Notably,
it achieved a 31% improvement over T5 in terms of a metric focused on
maintaining faithfulness to the input.
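
The abstract describes generation that proceeds sentence by sentence, each sentence relying on one cluster of predicates. The sketch below illustrates only the grouping step, under loud assumptions: the predicate-to-cluster table, the example triples, and the helper names `group_by_cluster` and `linearize` are hypothetical and are not taken from the paper, whose model produces the clusters itself.

```python
from collections import defaultdict

# Hypothetical predicate-to-cluster assignment (an assumption for illustration;
# the paper's model produces the clusters, which this sketch does not reproduce).
PREDICATE_CLUSTER = {
    "birthPlace": 0,
    "birthDate": 0,
    "occupation": 1,
}

def group_by_cluster(triples, predicate_cluster):
    """Partition (subject, predicate, object) triples by predicate cluster.

    Groups are returned in the order their cluster is first seen in the input.
    """
    groups = defaultdict(list)
    for triple in triples:
        groups[predicate_cluster[triple[1]]].append(triple)
    return list(groups.values())

def linearize(group):
    """Flatten one group of triples into a single input string for a generator."""
    return " ; ".join(f"{s} | {p} | {o}" for s, p, o in group)

# Illustrative input, not drawn from the paper's benchmark.
triples = [
    ("Ada_Lovelace", "birthPlace", "London"),
    ("Ada_Lovelace", "occupation", "Mathematician"),
    ("Ada_Lovelace", "birthDate", "1815-12-10"),
]

plan = group_by_cluster(triples, PREDICATE_CLUSTER)
# One linearized input per planned sentence; a text generator such as a
# fine-tuned sequence-to-sequence model would consume these one at a time.
sentence_inputs = [linearize(group) for group in plan]
```

Each entry of `sentence_inputs` would then be passed to a generator to produce one sentence, so that no sentence has to cover a combination of predicates drawn from different clusters.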
Related papers
- Conjunct Resolution in the Face of Verbal Omissions [51.220650412095665]
We propose a conjunct resolution task that operates directly on the text and makes use of a split-and-rephrase paradigm in order to recover the missing elements in the coordination structure.
We curate a large dataset, containing over 10K examples of naturally-occurring verbal omissions with crowd-sourced annotations.
We train various neural baselines for this task, and show that while our best method obtains decent performance, it leaves ample space for improvement.
arXiv Detail & Related papers (2023-05-26T08:44:02Z)
- Grounded Graph Decoding Improves Compositional Generalization in Question Answering [68.72605660152101]
Question answering models struggle to generalize to novel compositions of training patterns, such as longer sequences or more complex test structures.
We propose Grounded Graph Decoding, a method to improve compositional generalization of language representations by grounding structured predictions with an attention mechanism.
Our model significantly outperforms state-of-the-art baselines on the Compositional Freebase Questions (CFQ) dataset, a challenging benchmark for compositional generalization in question answering.
arXiv Detail & Related papers (2021-11-05T17:50:14Z)
- Improving Compositional Generalization with Self-Training for Data-to-Text Generation [36.973617793800315]
We study the compositional generalization of current generation models in data-to-text tasks.
By simulating structural shifts in the compositional Weather dataset, we show that T5 models fail to generalize to unseen structures.
We propose an approach based on self-training using finetuned BLEURT for pseudo-response selection.
arXiv Detail & Related papers (2021-10-16T04:26:56Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- Reformulating Sentence Ordering as Conditional Text Generation [17.91448517871621]
We present Reorder-BART (RE-BART), a sentence ordering framework.
We reformulate the task as a conditional text-to-marker generation setup.
Our framework achieves state-of-the-art performance across six datasets on the Perfect Match Ratio (PMR) and Kendall's tau ($\tau$) metrics.
arXiv Detail & Related papers (2021-04-14T18:16:47Z)
- Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both? [27.590858384414567]
We ask: can we develop a semantic parsing approach that handles both natural language variation and compositional generalization?
We propose new train and test splits of non-synthetic datasets to better assess this capability.
We also propose NQG-T5, a hybrid model that combines a high-precision grammar-based approach with a pre-trained sequence-to-sequence model.
arXiv Detail & Related papers (2020-10-24T00:38:27Z)
- Incomplete Utterance Rewriting as Semantic Segmentation [57.13577518412252]
We present a novel and extensive approach, which formulates incomplete utterance rewriting as a semantic segmentation task.
Instead of generating from scratch, such a formulation introduces edit operations and shapes the problem as prediction of a word-level edit matrix.
Our approach is four times faster than the standard approach in inference.
arXiv Detail & Related papers (2020-09-28T09:29:49Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)