Evaluation in Neural Style Transfer: A Review
- URL: http://arxiv.org/abs/2401.17109v1
- Date: Tue, 30 Jan 2024 15:45:30 GMT
- Title: Evaluation in Neural Style Transfer: A Review
- Authors: Eleftherios Ioannou and Steve Maddock
- Abstract summary: We provide an in-depth analysis of existing evaluation techniques, identify the inconsistencies and limitations of current evaluation methods, and give recommendations for standardized evaluation practices.
We believe that the development of a robust evaluation framework will not only enable more meaningful and fairer comparisons but will also enhance the comprehension and interpretation of research findings in the field.
- Score: 0.7614628596146599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The field of Neural Style Transfer (NST) has witnessed remarkable progress in
the past few years, with approaches being able to synthesize artistic and
photorealistic images and videos of exceptional quality. To evaluate such
results, a diverse landscape of evaluation methods and metrics is used,
including authors' opinions based on side-by-side comparisons, human evaluation
studies that quantify the subjective judgements of participants, and a
multitude of quantitative computational metrics which objectively assess the
different aspects of an algorithm's performance. However, there is no consensus
regarding the most suitable and effective evaluation procedure that can
guarantee the reliability of the results. In this review, we provide an
in-depth analysis of existing evaluation techniques, identify the
inconsistencies and limitations of current evaluation methods, and give
recommendations for standardized evaluation practices. We believe that the
development of a robust evaluation framework will not only enable more
meaningful and fairer comparisons among NST methods but will also enhance the
comprehension and interpretation of research findings in the field.
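To make the "quantitative computational metrics" category concrete, here is a minimal sketch of one measure that recurs throughout the NST literature: the Gram-matrix style distance between a stylized output and its style reference, computed from pretrained VGG-19 features following Gatys et al. The layer selection, 512x512 resize, and equal layer weighting are assumptions of this sketch, not recommendations from the review.

```python
# Minimal sketch: Gram-matrix style distance from pretrained VGG-19 features.
# Layer indices correspond to conv1_1..conv5_1; the choice and the equal
# weighting are assumptions of this sketch, not a standard fixed by the review.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

STYLE_LAYERS = [0, 5, 10, 19, 28]  # conv1_1, conv2_1, conv3_1, conv4_1, conv5_1

vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()

preprocess = T.Compose([
    T.Resize((512, 512)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def gram(feat: torch.Tensor) -> torch.Tensor:
    # (1, C, H, W) -> normalized C x C Gram matrix of channel correlations.
    _, c, h, w = feat.shape
    f = feat.view(c, h * w)
    return (f @ f.t()) / (c * h * w)

@torch.no_grad()
def style_grams(path: str) -> list:
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    grams = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            grams.append(gram(x))
        if i == max(STYLE_LAYERS):
            break
    return grams

@torch.no_grad()
def style_distance(stylized_path: str, style_path: str) -> float:
    # Lower means the output's feature statistics sit closer to the style
    # image's; the scale is arbitrary, so only compare across NST methods.
    pairs = zip(style_grams(stylized_path), style_grams(style_path))
    return sum(torch.mean((a - b) ** 2).item() for a, b in pairs)
```

Content preservation is typically measured separately (e.g., feature distances at a deeper layer, or perceptual metrics such as LPIPS), which is exactly the kind of metric fragmentation the review sets out to analyze.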
Related papers
- Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition [70.60872754129832]
The first NeurIPS competition on unlearning sought to stimulate the development of novel algorithms.
Nearly 1,200 teams from across the world participated.
We analyze top solutions and delve into discussions on benchmarking unlearning.
arXiv Detail & Related papers (2024-06-13T12:58:00Z)
- Hierarchical Evaluation Framework: Best Practices for Human Evaluation [17.91641890651225]
The absence of widely accepted human evaluation metrics in NLP hampers fair comparisons among different systems and the establishment of universal assessment standards.
We develop our own hierarchical evaluation framework to provide a more comprehensive representation of an NLP system's performance.
In future work, we will investigate the potential time-saving benefits of our proposed framework for evaluators assessing NLP systems.
arXiv Detail & Related papers (2023-10-03T09:46:02Z)
- Towards a Comprehensive Human-Centred Evaluation Framework for Explainable AI [1.7222662622390634]
We propose to adapt the User-Centric Evaluation Framework used in recommender systems.
We integrate explanation aspects, summarise explanation properties, indicate relations between them, and categorise metrics that measure these properties.
arXiv Detail & Related papers (2023-07-31T09:20:16Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation (see the illustrative sketch after this entry).
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
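As background for the entry above, the core psychometric building block is compact enough to sketch: a two-parameter logistic (2PL) item response model with maximum-information item selection, which is how adaptive tests pick the next item. This is a standard textbook construction; the item bank values below are illustrative assumptions, not parameters from the paper.

```python
# 2PL item response model with max-information item selection, sketched
# for illustration only; item parameters are hypothetical.
import math

def p_correct(theta: float, a: float, b: float) -> float:
    # Probability that a system with ability theta answers an item with
    # discrimination a and difficulty b correctly.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    # Fisher information of the item at ability theta; adaptive testing
    # administers the most informative remaining item.
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical item bank: (discrimination a, difficulty b) per benchmark item.
bank = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.5)]
theta_hat = 0.0  # current ability estimate for the system under test
next_item = max(range(len(bank)),
                key=lambda i: item_information(theta_hat, *bank[i]))
print(f"administer item {next_item} next")  # item 2 is most informative near theta=0
```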
- Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study the strengths and shortcomings of some widely used attribution methods across a wide range of models (a generic faithfulness sketch follows this entry).
arXiv Detail & Related papers (2023-03-21T14:24:58Z)
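The three schemes themselves are not spelled out in this summary, so the sketch below instead shows a generic deletion-style faithfulness check common in this literature: occlude the most-attributed pixels first and track how quickly the model's class probability falls. The `model` interface, input shape, and ten-step schedule are assumptions of this sketch.

```python
# Generic deletion-style faithfulness check (not the paper's specific schemes).
import torch

@torch.no_grad()
def deletion_curve(model, image, attribution, target, steps=10):
    """image: (1, C, H, W) tensor; attribution: (H, W) relevance map.
    Returns the target-class probability after each deletion step; a faster
    drop suggests the attribution ranked truly influential pixels first."""
    order = attribution.flatten().argsort(descending=True)
    x = image.clone()
    flat = x.view(x.shape[1], -1)            # (C, H*W) view over x's storage
    chunk = max(1, order.numel() // steps)
    probs = []
    for s in range(steps):
        flat[:, order[s * chunk:(s + 1) * chunk]] = 0.0  # occlude next chunk
        probs.append(torch.softmax(model(x), dim=1)[0, target].item())
    return probs
```

Comparing the area under this curve across attribution methods yields a simple faithfulness ranking.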
- On The Coherence of Quantitative Evaluation of Visual Explanations [0.7212939068975619]
Evaluation methods have been proposed to assess the "goodness" of visual explanations.
On a subset of the ImageNet-1k validation set, we evaluate a number of commonly used explanation methods.
Our results suggest a lack of coherence in the gradings provided by some of the evaluation methods considered.
arXiv Detail & Related papers (2023-02-14T13:41:57Z)
- Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation [136.16507050034755]
Existing human evaluation studies for summarization either exhibit a low inter-annotator agreement or have insufficient scale.
We propose a modified summarization salience protocol, Atomic Content Units (ACUs), which is based on fine-grained semantic units (a toy scoring sketch follows this entry).
We curate the Robust Summarization Evaluation (RoSE) benchmark, a large human evaluation dataset consisting of 22,000 summary-level annotations over 28 top-performing systems.
arXiv Detail & Related papers (2022-12-15T17:26:05Z)
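A toy version of the ACU scoring step may help make the protocol concrete. In RoSE, the presence of each unit in a system summary is judged by human annotators; the substring check below is a crude, clearly hypothetical stand-in for that judgment.

```python
# Toy ACU-style scoring: the score is the fraction of a reference's atomic
# content units judged present in the system summary. Substring containment
# stands in for the human annotation used in the actual protocol.
def acu_score(system_summary: str, acus: list[str]) -> float:
    if not acus:
        return 0.0
    text = system_summary.lower()
    return sum(unit.lower() in text for unit in acus) / len(acus)

units = ["passed the bill", "62-30 vote"]  # hypothetical units for one reference
print(acu_score("The Senate passed the bill in a 62-30 vote.", units))  # 1.0
```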
- Towards Better Understanding Attribution Methods [77.1487219861185]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We also propose a post-processing smoothing step that significantly improves the performance of some attribution methods.
arXiv Detail & Related papers (2022-05-20T20:50:17Z)
- Counterfactually Evaluating Explanations in Recommender Systems [14.938252589829673]
We propose an offline evaluation method that can be computed without human involvement.
We show that, compared to conventional methods, our method produces evaluation scores that correlate more strongly with real human judgments.
arXiv Detail & Related papers (2022-03-02T18:55:29Z)
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions [48.91284724066349]
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education.
Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding.
We develop a method that could serve as a hybrid human-AI system, enabling human experts to analyze the validity of policy evaluation estimates (a generic importance-sampling sketch follows this entry).
arXiv Detail & Related papers (2020-02-10T00:26:43Z)
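The entry does not detail how influential transitions are identified, so the sketch below shows only the generic estimator such interpretability work scrutinizes: per-trajectory importance sampling, whose high variance under noise and limited data is precisely what motivates human-in-the-loop validity checks.

```python
# Ordinary importance-sampling off-policy value estimate (a textbook
# building block, not the paper's method).
def is_estimate(trajectories, pi_new, pi_behavior, gamma=0.99):
    """trajectories: list of [(state, action, reward), ...] episodes collected
    under the behavior policy. pi_new / pi_behavior: callables returning each
    policy's probability of taking the given action in the given state."""
    total = 0.0
    for episode in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(episode):
            weight *= pi_new(s, a) / pi_behavior(s, a)  # likelihood ratio
            ret += (gamma ** t) * r                     # discounted return
        total += weight * ret                           # weighted return
    return total / len(trajectories)
```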