A Systematic Review of Data-to-Text NLG
- URL: http://arxiv.org/abs/2402.08496v3
- Date: Tue, 27 Feb 2024 00:05:28 GMT
- Title: A Systematic Review of Data-to-Text NLG
- Authors: Chinonso Cynthia Osuji, Thiago Castro Ferreira, Brian Davis
- Abstract summary: Methods for producing high-quality text are explored, addressing the challenge of hallucinations in data-to-text generation.
Despite advancements in text quality, the review emphasizes the importance of research in low-resourced languages.
- Score: 2.4769539696439677
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This systematic review undertakes a comprehensive analysis of current
research on data-to-text generation, identifying gaps, challenges, and future
directions within the field. Relevant literature in this field on datasets,
evaluation metrics, application areas, multilingualism, language models, and
hallucination mitigation methods is reviewed. Various methods for producing
high-quality text are explored, addressing the challenge of hallucinations in
data-to-text generation. These methods include re-ranking, traditional and
neural pipeline architectures, planning architectures, data cleaning, controlled
generation, and modification of models and training techniques. Their
effectiveness and limitations are assessed, highlighting the need for
universally applicable strategies to mitigate hallucinations. The review also
examines the usage, popularity, and impact of datasets, alongside evaluation
metrics, with an emphasis on both automatic and human assessment. Additionally,
the evolution of data-to-text models, particularly the widespread adoption of
transformer models, is discussed. Despite advancements in text quality, the
review emphasizes the importance of research in low-resourced languages and the
engineering of datasets in these languages to promote inclusivity. Finally,
several application domains of data-to-text generation are highlighted, emphasizing
the relevance of the field in each. Overall, this review serves as a guiding framework
for fostering innovation and advancing data-to-text generation.
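Among the mitigation methods listed in the abstract, re-ranking is the easiest to illustrate in isolation: over-generate candidate verbalizations, score each against the input data, and keep the best-supported one. The sketch below is a hypothetical, self-contained illustration of that idea; its overlap-based scorer is a deliberately naive stand-in for the learned faithfulness metrics used in practice.

```python
# Minimal sketch of re-ranking for hallucination mitigation (hypothetical):
# generate several candidate verbalizations of the input data, then keep the
# one whose content is best supported by that data. The overlap scorer is a
# naive stand-in for a learned faithfulness metric.

def faithfulness_score(data_values: list[str], candidate: str) -> float:
    """Fraction of input values that literally appear in the candidate text."""
    hits = sum(1 for v in data_values if v.lower() in candidate.lower())
    return hits / max(len(data_values), 1)

def rerank(data_values: list[str], candidates: list[str]) -> str:
    """Return the candidate most consistent with the input data."""
    return max(candidates, key=lambda c: faithfulness_score(data_values, c))

# Toy example: a record about Alan Turing verbalized three ways.
values = ["Alan Turing", "1912", "London"]
candidates = [
    "Alan Turing was born in 1912 in London.",      # faithful
    "Alan Turing was born in 1912 in Manchester.",  # hallucinated place
    "The scientist was born somewhere in England.", # omits the input facts
]
print(rerank(values, candidates))  # prints the faithful first candidate
```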
Related papers
- Analysis of Plan-based Retrieval for Grounded Text Generation [78.89478272104739]
Hallucinations occur when a language model is given a generation task outside its parametric knowledge.
A common strategy to address this limitation is to infuse the language models with retrieval mechanisms.
We analyze how planning can be used to guide retrieval to further reduce the frequency of hallucinations (a minimal sketch appears after this list).
arXiv Detail & Related papers (2024-08-20T02:19:35Z)
- A Survey on Recent Advances in Conversational Data Generation [14.237954885530396]
We offer a systematic and comprehensive review of multi-turn conversational data generation.
We focus on three types of dialogue systems: open domain, task-oriented, and information-seeking.
We examine the evaluation metrics and methods for assessing synthetic conversational data.
arXiv Detail & Related papers (2024-05-12T10:11:12Z)
- Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey [17.19337964440007]
There is currently no comprehensive review that summarizes and compares the key techniques, metrics, datasets, models, and optimization approaches in this research domain.
This survey aims to address this gap by consolidating recent progress in these areas, offering a thorough survey and taxonomy of the datasets, metrics, and methodologies utilized.
It identifies strengths, limitations, unexplored territories, and gaps in the existing literature, while providing some insights for future research directions in this vital and rapidly evolving field.
arXiv Detail & Related papers (2024-02-27T23:59:01Z)
- Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora (a toy implementation of the underlying metric follows this list).
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
- Multi-Dimensional Evaluation of Text Summarization with In-Context Learning [79.02280189976562]
In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning.
Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization.
We then analyze the effects of factors such as the selection and number of in-context examples on performance (a prompt-construction sketch follows this list).
arXiv Detail & Related papers (2023-06-01T23:27:49Z)
- Controllable Data Generation by Deep Learning: A Review [22.582082771890974]
Generating data with desired properties is a promising research area, commonly known as controllable deep data generation.
This article introduces exciting applications of controllable deep data generation and experimentally analyzes and compares existing works.
arXiv Detail & Related papers (2022-07-19T20:44:42Z)
- Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods [48.47413103662829]
Natural Language Generation (NLG) has made great progress in recent years due to the development of deep learning techniques such as pre-trained language models.
However, the faithfulness problem, namely that generated text often contains unfaithful or non-factual information, has become the biggest challenge (an entailment-based check is sketched after this list).
arXiv Detail & Related papers (2022-03-10T08:28:32Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
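For the plan-based retrieval entry above, the following hypothetical sketch shows the control flow the abstract describes: draft a plan, then issue one retrieval query per plan step so that generation stays grounded in retrieved evidence. The toy keyword retriever, the in-memory corpus, and the fixed plan are illustrative assumptions; the paper's actual pipeline and prompts will differ.

```python
# Hypothetical sketch of planning-guided retrieval: one retrieval query per
# plan step, so downstream generation can cite the gathered evidence.

CORPUS = {
    "turing-bio": "Alan Turing was born on 23 June 1912 in Maida Vale, London.",
    "turing-work": "Turing formalized computation with the Turing machine in 1936.",
}

def retrieve(query: str) -> list[str]:
    """Toy retriever: return documents sharing any keyword with the query."""
    words = set(query.lower().split())
    return [doc for doc in CORPUS.values() if words & set(doc.lower().split())]

def plan_then_retrieve(task: str, plan_steps: list[str]) -> dict[str, list[str]]:
    """Gather evidence per plan step; a real system would ask a language
    model to draft `plan_steps` from `task` instead of hard-coding them."""
    return {step: retrieve(step) for step in plan_steps}

evidence = plan_then_retrieve(
    task="Write a short biography of Alan Turing.",
    plan_steps=["birth 1912", "machine computation"],
)
for step, docs in evidence.items():
    print(step, "->", docs)
```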
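The precision-and-recall entry evaluates LLM output distributions without aligned corpora. Here is a minimal NumPy sketch of the k-nearest-neighbor formulation of distributional precision and recall that this line of work builds on; it is an assumed, simplified estimator, not necessarily the paper's exact one, and random vectors stand in for sentence embeddings of reference and generated text.

```python
# Minimal sketch (assumed, simplified) of k-NN distributional precision and
# recall: precision asks how many generated samples fall inside the support
# of the real samples; recall asks the converse.
import numpy as np

def knn_radius(points: np.ndarray, k: int) -> np.ndarray:
    """Distance from each point to its k-th nearest neighbor within the set."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d.sort(axis=1)  # column 0 is the point itself (distance 0)
    return d[:, k]

def precision_recall(real: np.ndarray, gen: np.ndarray, k: int = 3):
    d_cross = np.linalg.norm(gen[:, None, :] - real[None, :, :], axis=-1)
    precision = float(np.mean((d_cross <= knn_radius(real, k)[None, :]).any(axis=1)))
    recall = float(np.mean((d_cross.T <= knn_radius(gen, k)[None, :]).any(axis=1)))
    return precision, recall

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 8))  # stand-in: embeddings of references
gen = rng.normal(0.5, 1.0, size=(200, 8))   # stand-in: embeddings of generations
print(precision_recall(real, gen))          # the shift lowers both scores
```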
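For the in-context-learning evaluation entry, the sketch below shows how such an evaluator's prompt can be assembled: a few scored (source, summary) demonstrations followed by the instance to rate. The demonstration, the dimensions, and the scoring format are hypothetical placeholders, not the paper's actual prompt.

```python
# Hypothetical sketch of an in-context-learning evaluator prompt for
# multi-dimensional summarization evaluation. Send the resulting prompt to
# an LLM of your choice; the demonstration below is made up.

FEW_SHOT = [
    {
        "source": "The council approved the new park budget on Monday.",
        "summary": "Council approves park budget.",
        "scores": "coherence: 5, consistency: 5, fluency: 5, relevance: 5",
    },
]

def build_prompt(source: str, summary: str) -> str:
    parts = ["Rate each summary for coherence, consistency, fluency, and relevance (1-5)."]
    for ex in FEW_SHOT:
        parts.append(f"Source: {ex['source']}\nSummary: {ex['summary']}\nScores: {ex['scores']}")
    parts.append(f"Source: {source}\nSummary: {summary}\nScores:")
    return "\n\n".join(parts)

print(build_prompt(
    source="Heavy rain flooded the city center overnight, closing two subway lines.",
    summary="Flooding shut parts of the subway.",
))
```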
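Finally, for the faithfulness survey entry, one widely used family of evaluation methods in this area checks entailment between the source and each generated sentence. A minimal sketch follows, assuming the Hugging Face transformers library and a public MNLI checkpoint (any NLI model can be swapped in); it flags generated sentences the source does not entail.

```python
# Minimal sketch of an NLI-based faithfulness check: treat the source as the
# premise and each generated sentence as the hypothesis, then flag sentences
# that are not entailed. Assumes the `transformers` library and an MNLI model.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def unsupported_sentences(source: str, sentences: list[str]) -> list[str]:
    """Return generated sentences whose top NLI label is not ENTAILMENT."""
    inputs = [{"text": source, "text_pair": s} for s in sentences]
    return [s for s, r in zip(sentences, nli(inputs)) if r["label"] != "ENTAILMENT"]

source = "The company reported revenue of 3.2 billion dollars in 2021."
generated = [
    "Revenue reached 3.2 billion dollars in 2021.",
    "Profits doubled compared to 2020.",  # not stated in the source
]
print(unsupported_sentences(source, generated))  # likely flags the second sentence
```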