A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations
- URL: http://arxiv.org/abs/2506.10019v1
- Date: Fri, 06 Jun 2025 11:09:46 GMT
- Title: A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations
- Authors: Tian Lan, Yang-Hao Zhou, Zi-Ao Ma, Fanshu Sun, Rui-Qing Sun, Junyu Luo, Rong-Cheng Tu, Heyan Huang, Chen Xu, Zhijing Wu, Xian-Ling Mao
- Abstract summary: We present a comprehensive review and a unified taxonomy of automatic evaluation methods for generated content across all three modalities. Our analysis begins by examining evaluation methods for text generation, where techniques are most mature. We then extend this framework to image and audio generation, demonstrating its broad applicability.
- Score: 58.105900601078595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep learning have significantly enhanced generative AI capabilities across text, images, and audio. However, automatically evaluating the quality of these generated outputs presents ongoing challenges. Although numerous automatic evaluation methods exist, current research lacks a systematic framework that comprehensively organizes these methods across text, visual, and audio modalities. To address this issue, we present a comprehensive review and a unified taxonomy of automatic evaluation methods for generated content across all three modalities. We identify five fundamental paradigms that characterize existing evaluation approaches across these domains. Our analysis begins by examining evaluation methods for text generation, where techniques are most mature. We then extend this framework to image and audio generation, demonstrating its broad applicability. Finally, we discuss promising directions for future research in cross-modal evaluation methodologies.
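To ground the discussion, here is a minimal sketch of one classic instance of reference-based text evaluation (BLEU, via the sacrebleu library). The hypothesis and reference strings are illustrative placeholders, and the survey's five paradigms cover far more than n-gram matching.

```python
# Minimal sketch: reference-based evaluation of generated text with BLEU.
# The strings below are hypothetical; BLEU is one classic instance of the
# reference-based paradigm, not the survey's full taxonomy.
import sacrebleu

hypotheses = ["the cat sat on the mat"]           # system outputs
references = [["the cat is sitting on the mat"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```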
Related papers
- Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning [63.531262595858]
A divide-and-conquer approach breaks the comprehensive evaluation task into localized scoring tasks, followed by a final global assessment. We introduce a hybrid in-context learning approach that leverages human annotations to enhance the performance of both local and global evaluations. Finally, we develop an uncertainty-based active learning algorithm that efficiently selects data samples for human annotation (a minimal sketch follows this entry).
arXiv Detail & Related papers (2025-05-26T16:39:41Z)
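The summary above names an uncertainty-based active learning step without detailing it. The following is a hypothetical sketch of one common form, maximum-entropy sample selection; it is not Monocle's actual algorithm, and all names and data are illustrative.

```python
# Hypothetical sketch of uncertainty-based active learning: pick the samples
# whose predicted rating distribution is most uncertain (highest entropy)
# and route those to human annotators. Not Monocle's actual algorithm.
import numpy as np

def select_for_annotation(probs: np.ndarray, budget: int) -> np.ndarray:
    """probs: (n_samples, n_classes), rows summing to 1; returns indices
    of the `budget` most uncertain samples."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(entropy)[-budget:]

probs = np.array([
    [0.90, 0.05, 0.05],   # confident prediction -> low entropy
    [0.34, 0.33, 0.33],   # near-uniform -> high entropy
    [0.60, 0.30, 0.10],
])
print(select_for_annotation(probs, budget=1))  # -> [1], the most uncertain
```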
- Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework [0.1979158763744267]
Open-ended text generation has become a prominent task in natural language processing. However, evaluating the quality of these models and the employed decoding strategies remains challenging. This paper proposes novel methods for both relative and absolute rankings of decoding methods.
arXiv Detail & Related papers (2024-10-24T11:32:01Z)
- What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation [57.550045763103334]
Evaluating a story can be more challenging than other generation evaluation tasks.
We first summarize existing storytelling tasks, including text-to-text, visual-to-text, and text-to-visual.
We propose a taxonomy to organize evaluation metrics that have been developed or can be adopted for story evaluation.
arXiv Detail & Related papers (2024-08-26T20:35:42Z)
- Generative AI-Based Text Generation Methods Using Pre-Trained GPT-2 Model [2.6320841968362645]
This work explores automatic text generation, covering techniques that range from traditional deterministic approaches to more modern sampling-based methods.
Through analysis of greedy search, beam search, top-k sampling, top-p sampling, contrastive search, and locally typical sampling, this work provides valuable insights into the strengths, weaknesses, and potential applications of each method (see the decoding sketch after this entry).
arXiv Detail & Related papers (2024-04-02T09:49:53Z)
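As a hedged illustration of the decoding strategies this paper compares, the sketch below invokes each one through the Hugging Face transformers `generate` API. The model, prompt, and hyperparameter values are placeholders, not the paper's experimental settings.

```python
# Illustrative sketch of common decoding strategies via Hugging Face
# transformers; prompt and hyperparameter values are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("The future of AI is", return_tensors="pt")

# Greedy search: always take the single most probable next token.
greedy = model.generate(**inputs, max_new_tokens=30, do_sample=False)
# Beam search: keep the num_beams best partial sequences at each step.
beam = model.generate(**inputs, max_new_tokens=30, num_beams=5)
# Top-k sampling: sample from the k most probable next tokens.
topk = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=50)
# Top-p (nucleus) sampling: sample from the smallest set of tokens whose
# cumulative probability exceeds p.
topp = model.generate(**inputs, max_new_tokens=30, do_sample=True,
                      top_p=0.95, top_k=0)
# Contrastive search: balance model confidence against a degeneration penalty.
contrastive = model.generate(**inputs, max_new_tokens=30,
                             penalty_alpha=0.6, top_k=4)
# Locally typical sampling: sample tokens whose information content is close
# to the expected information content of the distribution.
typical = model.generate(**inputs, max_new_tokens=30, do_sample=True,
                         typical_p=0.95)

print(tok.decode(topp[0], skip_special_tokens=True))
```

Note that `top_k=0` in the nucleus-sampling call disables the top-k filter so that only the top-p constraint applies.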
- A Systematic Review of Data-to-Text NLG [2.4769539696439677]
Methods for producing high-quality text are explored, addressing the challenge of hallucinations in data-to-text generation.
Despite advancements in text quality, the review emphasizes the importance of research in low-resource languages.
arXiv Detail & Related papers (2024-02-13T14:51:45Z)
- Automatic assessment of text-based responses in post-secondary education: A systematic review [0.0]
There is immense potential to automate rapid assessment and feedback of text-based responses in education.
To understand how text-based automatic assessment systems have been developed and applied in education in recent years, three research questions are considered.
This systematic review provides an overview of recent educational applications of text-based assessment systems.
arXiv Detail & Related papers (2023-08-30T17:16:45Z)
- Automated Audio Captioning: An Overview of Recent Progress and New Challenges [56.98522404673527]
Automated audio captioning is a cross-modal translation task that aims to generate natural language descriptions for given audio clips.
We present a comprehensive review of the published contributions in automated audio captioning, from a variety of existing approaches to evaluation metrics and datasets.
arXiv Detail & Related papers (2022-05-12T08:36:35Z)
- How to Evaluate Your Dialogue Models: A Review of Approaches [2.7834038784275403]
We are the first to divide the evaluation methods into three classes: automatic evaluation, human-involved evaluation, and user-simulator-based evaluation.
Existing benchmarks suitable for the evaluation of dialogue techniques are also discussed in detail.
arXiv Detail & Related papers (2021-08-03T08:52:33Z)
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in text classification due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
- PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems [48.99561874529323]
There are three kinds of automatic methods for evaluating open-domain generative dialogue systems. Due to the lack of systematic comparison, it is not clear which kind of metric is most effective.
We propose a novel and feasible learning-based metric that significantly improves correlation with human judgments (a meta-evaluation sketch follows this entry).
arXiv Detail & Related papers (2020-04-06T04:36:33Z)
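The PONE summary above appeals to correlation with human judgments. The following hypothetical sketch shows the standard meta-evaluation step of correlating an automatic metric's scores with human ratings using SciPy; all scores are made-up placeholders.

```python
# Hypothetical sketch of metric meta-evaluation: correlate an automatic
# metric's scores with human ratings over the same set of responses.
from scipy.stats import pearsonr, spearmanr

human = [4.5, 2.0, 3.5, 1.0, 5.0]        # human quality ratings per response
metric = [0.82, 0.31, 0.64, 0.12, 0.91]  # automatic metric scores

r, _ = pearsonr(human, metric)
rho, _ = spearmanr(human, metric)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```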