EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement
- URL: http://arxiv.org/abs/2502.14260v1
- Date: Thu, 20 Feb 2025 04:56:03 GMT
- Title: EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement
- Authors: Wenhui Zhu, Xuanzhao Dong, Xin Li, Yujian Xiong, Xiwen Chen, Peijie Qiu, Vamsi Krishna Vasa, Zhangsihao Yang, Yi Su, Oana Dumitrascu, Yalin Wang,
- Abstract summary: generative models have achieved significant success in enhancement fundus images.
The evaluation of these models still presents a considerable challenge.
We propose a novel comprehensive benchmark, EyeBench, to provide insights that align enhancement models with clinical needs.
- Score: 14.724629346280402
- License:
- Abstract: Over the past decade, generative models have achieved significant success in enhancement fundus images.However, the evaluation of these models still presents a considerable challenge. A comprehensive evaluation benchmark for fundus image enhancement is indispensable for three main reasons: 1) The existing denoising metrics (e.g., PSNR, SSIM) are hardly to extend to downstream real-world clinical research (e.g., Vessel morphology consistency). 2) There is a lack of comprehensive evaluation for both paired and unpaired enhancement methods, along with the need for expert protocols to accurately assess clinical value. 3) An ideal evaluation system should provide insights to inform future developments of fundus image enhancement. To this end, we propose a novel comprehensive benchmark, EyeBench, to provide insights that align enhancement models with clinical needs, offering a foundation for future work to improve the clinical relevance and applicability of generative models for fundus image enhancement. EyeBench has three appealing properties: 1) multi-dimensional clinical alignment downstream evaluation: In addition to evaluating the enhancement task, we provide several clinically significant downstream tasks for fundus images, including vessel segmentation, DR grading, denoising generalization, and lesion segmentation. 2) Medical expert-guided evaluation design: We introduce a novel dataset that promote comprehensive and fair comparisons between paired and unpaired methods and includes a manual evaluation protocol by medical experts. 3) Valuable insights: Our benchmark study provides a comprehensive and rigorous evaluation of existing methods across different downstream tasks, assisting medical experts in making informed choices. Additionally, we offer further analysis of the challenges faced by existing methods. The code is available at \url{https://github.com/Retinal-Research/EyeBench}
Related papers
- The Skin Game: Revolutionizing Standards for AI Dermatology Model Comparison [0.6144680854063939]
Deep Learning approaches in dermatological image classification have shown promising results, yet the field faces significant methodological challenges that impede proper evaluation.
This paper presents a systematic analysis of current methodological practices in skin disease classification research, revealing substantial inconsistencies in data preparation, augmentation strategies, and performance reporting.
We propose comprehensive methodological recommendations for model development, evaluation, and clinical deployment, emphasizing rigorous data preparation, systematic error analysis, and specialized protocols for different image types.
arXiv Detail & Related papers (2025-02-04T17:15:36Z) - Clinical Evaluation of Medical Image Synthesis: A Case Study in Wireless Capsule Endoscopy [63.39037092484374]
This study focuses on the clinical evaluation of medical Synthetic Data Generation using Artificial Intelligence (AI) models.
The paper contributes by a) presenting a protocol for the systematic evaluation of synthetic images by medical experts and b) applying it to assess TIDE-II, a novel variational autoencoder-based model for high-resolution WCE image synthesis.
The results show that TIDE-II generates clinically relevant WCE images, helping to address data scarcity and enhance diagnostic tools.
arXiv Detail & Related papers (2024-10-31T19:48:50Z) - RIDGE: Reproducibility, Integrity, Dependability, Generalizability, and Efficiency Assessment of Medical Image Segmentation Models [1.4675465116143782]
This paper introduces the RIDGE checklist to assess the Reproducibility, Integrity, Dependability, Generalizability, and Efficiency of deep learning-based medical image segmentation models.
The RIDGE checklist is not just a tool for evaluation but also a guideline for researchers striving to improve the quality and transparency of their work.
arXiv Detail & Related papers (2024-01-16T21:45:08Z) - Aligning Synthetic Medical Images with Clinical Knowledge using Human
Feedback [22.390670838355295]
This paper introduces a pathologist-in-the-loop framework for generating clinically-plausible synthetic medical images.
We show that human feedback significantly improves the quality of synthetic images in terms of fidelity, diversity, utility in downstream applications, and plausibility as evaluated by experts.
arXiv Detail & Related papers (2023-06-16T21:54:20Z) - Rethinking Semi-Supervised Medical Image Segmentation: A
Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z) - A Knowledge-based Learning Framework for Self-supervised Pre-training
Towards Enhanced Recognition of Medical Images [14.304996977665212]
This study proposes a knowledge-based learning framework towards enhanced recognition of medical images.
It works in three phases by synergizing contrastive learning and generative learning models.
The proposed framework statistically excels in self-supervised benchmarks, achieving 2.08, 1.23, 1.12, 0.76 and 1.38 percentage points improvements over SimCLR in AUC/Dice.
arXiv Detail & Related papers (2022-11-27T03:58:58Z) - Harmonizing Pathological and Normal Pixels for Pseudo-healthy Synthesis [68.5287824124996]
We present a new type of discriminator, the segmentor, to accurately locate the lesions and improve the visual quality of pseudo-healthy images.
We apply the generated images into medical image enhancement and utilize the enhanced results to cope with the low contrast problem.
Comprehensive experiments on the T2 modality of BraTS demonstrate that the proposed method substantially outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T08:41:17Z) - Towards a Guideline for Evaluation Metrics in Medical Image Segmentation [0.0]
This work provides an overview and interpretation guide on the following metrics for medical image segmentation evaluation in binary.
As a summary, we propose a guideline for standardized medical image segmentation evaluation.
arXiv Detail & Related papers (2022-02-10T13:38:05Z) - Deep AUC Maximization for Medical Image Classification: Challenges and
Opportunities [60.079782224958414]
We will present and discuss opportunities and challenges brought by a new deep learning method by AUC (aka underlinebf Deep underlinebf AUC classification)
arXiv Detail & Related papers (2021-11-01T15:31:32Z) - Malignancy Prediction and Lesion Identification from Clinical
Dermatological Images [65.1629311281062]
We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images.
We first identify all lesions present in the image regardless of sub-type or likelihood of malignancy, then it estimates their likelihood of malignancy, and through aggregation, it also generates an image-level likelihood of malignancy.
arXiv Detail & Related papers (2021-04-02T20:52:05Z) - Explaining Clinical Decision Support Systems in Medical Imaging using
Cycle-Consistent Activation Maximization [112.2628296775395]
Clinical decision support using deep neural networks has become a topic of steadily growing interest.
clinicians are often hesitant to adopt the technology because its underlying decision-making process is considered to be intransparent and difficult to comprehend.
We propose a novel decision explanation scheme based on CycleGAN activation which generates high-quality visualizations of classifier decisions even in smaller data sets.
arXiv Detail & Related papers (2020-10-09T14:39:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.