Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation
- URL: http://arxiv.org/abs/2212.01956v1
- Date: Sun, 4 Dec 2022 23:59:41 GMT
- Title: Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation
- Authors: Faeze Brahman, Baolin Peng, Michel Galley, Sudha Rao, Bill Dolan, Snigdha Chaturvedi, Jianfeng Gao
- Abstract summary: We propose a new grounded keys-to-text generation task.
The task is to generate a factual description about an entity given a set of guiding keys and grounding passages.
Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions.
- Score: 92.1582872870226
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large pre-trained language models have recently enabled open-ended generation
frameworks (e.g., prompt-to-text NLG) to tackle a variety of tasks going beyond
the traditional data-to-text generation. While this framework is more general,
it is under-specified and often leads to a lack of controllability, restricting
its real-world usage. We propose a new grounded keys-to-text generation task:
the task is to generate a factual description about an entity given a set of
guiding keys and grounding passages. To address this task, we introduce a new
dataset, called EntDeGen. Inspired by recent QA-based evaluation measures, we
propose an automatic metric, MAFE, for factual correctness of generated
descriptions. Our EntDescriptor model is equipped with strong rankers that fetch
helpful passages, which it uses to generate entity descriptions. Experimental results show a
good correlation (60.14) between our proposed metric and human judgments of
factuality. Our rankers significantly improved the factual correctness of
generated descriptions (15.95% and 34.51% relative gains in recall and
precision). Finally, our ablation study highlights the benefit of combining
keys and groundings.
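The abstract names the task inputs (an entity, a set of guiding keys, and grounding passages), rankers that fetch helpful passages, and relative gains in recall and precision, but it gives no implementation details. The Python sketch below is a minimal illustration under stated assumptions, not the authors' method. `KeysToTextInstance`, `rank_passages`, and `relative_gain` are hypothetical names. The token-overlap scorer is a simple stand-in for EntDescriptor's rankers, whose design the abstract does not describe. `relative_gain` applies the standard definition (improved minus baseline, divided by baseline), and the example scores are illustrative, not results from the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class KeysToTextInstance:
    """Hypothetical container for one grounded keys-to-text example."""
    entity: str            # the entity to describe
    keys: list[str]        # guiding keys the description should address
    passages: list[str]    # candidate grounding passages


def _tokens(text: str) -> set[str]:
    # Lowercased alphanumeric tokens; a deliberately crude normalizer.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_passages(inst: KeysToTextInstance, top_k: int = 2) -> list[str]:
    """Token-overlap stand-in for a learned passage ranker.

    Hypothetical: this is NOT EntDescriptor's ranker. Each passage is scored
    by the number of distinct entity/key tokens it contains, and the top_k
    passages are returned. sorted() is stable, so ties keep their original order.
    """
    query = _tokens(inst.entity).union(*(_tokens(k) for k in inst.keys))
    scored = [(len(query & _tokens(p)), p) for p in inst.passages]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for _, p in ranked[:top_k]]


def relative_gain(baseline: float, improved: float) -> float:
    """Relative gain in percent: (improved - baseline) / baseline * 100."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    inst = KeysToTextInstance(
        entity="Marie Curie",
        keys=["Nobel Prize", "radioactivity"],
        passages=[
            "The Eiffel Tower is in Paris.",
            "Marie Curie won the Nobel Prize in Physics in 1903.",
            "Curie's research on radioactivity was pioneering.",
        ],
    )
    print(rank_passages(inst, top_k=2))
    print(relative_gain(40.0, 50.0))  # illustrative scores, not from the paper
```

Run as a script, this prints the two passages that mention the entity or its keys, with the off-topic Eiffel Tower passage dropped, followed by 25.0. A trained ranker would replace the token-overlap score. The overlap version only makes the input and output shape of the ranking step concrete.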
Related papers
- Retrieval is Accurate Generation [99.24267226311157] (2024-02-27T14:16:19Z)
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
- Optimizing Factual Accuracy in Text Generation through Dynamic Knowledge Selection [71.20871905457174] (2023-08-30T02:22:40Z)
Language models (LMs) have revolutionized the way we interact with information, but they often generate nonfactual text.
Previous methods use external knowledge as references for text generation to enhance factuality, but they often struggle with the knowledge mix-up caused by irrelevant references.
We present DKGen, which divides the text generation process into iterative steps.
- Enriching Relation Extraction with OpenIE [70.52564277675056] (2022-12-19T11:26:23Z)
Relation extraction (RE) is a sub-discipline of information extraction (IE).
In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE.
Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models.
- Facts2Story: Controlling Text Generation by Key Facts [0.0] (2020-12-08T10:14:29Z)
We propose a controlled generation task based on expanding a sequence of facts, expressed in natural language, into a longer narrative.
We show that while auto-regressive, unidirectional language models such as GPT2 produce better fluency, they struggle to adhere to the requested facts.
We propose a plan-and-cloze model (using fine-tuned XLNet) that produces competitive fluency while adhering to the requested content.
- KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333] (2020-10-05T19:59:05Z)
We propose knowledge-grounded pre-training (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully supervised, zero-shot, and few-shot, to evaluate its effectiveness.
In the zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG while all other baselines fail.
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635] (2020-10-03T03:18:52Z)
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical because it uses automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
This list is automatically generated from the titles and abstracts of the papers on this site.