Related papers: A Preliminary Study for GPT-4o on Image Restoration

A Preliminary Study for GPT-4o on Image Restoration

URL: http://arxiv.org/abs/2505.05621v2
Date: Sat, 17 May 2025 20:18:19 GMT
Title: A Preliminary Study for GPT-4o on Image Restoration
Authors: Hao Yang, Yan Yang, Ruikun Zhang, Liyuan Pan,
Abstract summary: OpenAI's GPT-4o model has demonstrated unprecedented performance in image generation.<n>We present the first systematic evaluation of GPT-4o across diverse restoration tasks.
Score: 7.784948465884567
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: OpenAI's GPT-4o model, integrating multi-modal inputs and outputs within an autoregressive architecture, has demonstrated unprecedented performance in image generation. In this work, we investigate its potential impact on the image restoration community. We present the first systematic evaluation of GPT-4o across diverse restoration tasks. Our experiments reveal that, although restoration outputs from GPT-4o are visually appealing, they often suffer from pixel-level structural fidelity when compared to ground-truth images. Common issues are variations in image proportions, shifts in object positions and quantities, and changes in viewpoint. To address it, taking image dehazing, derainning, and low-light enhancement as representative case studies, we show that GPT-4o's outputs can serve as powerful visual priors, substantially enhancing the performance of existing dehazing networks. It offers practical guidelines and a baseline framework to facilitate the integration of GPT-4o into future image restoration pipelines. We hope the study on GPT-4o image restoration will accelerate innovation in the broader field of image generation areas. To support further research, we will release GPT-4o-restored images.

Related papers

Can ChatGPT Perform Image Splicing Detection? A Preliminary Study [0.0]
Multimodal Large Language Models (MLLMs) like GPT-4V are capable of reasoning across text and image modalities.<n>We evaluate GPT-4V using three prompting strategies: Zero-Shot (ZS), Few-Shot (FS), and Chain-of-Thought (CoT)<n>Our results show that GPT-4V achieves competitive detection performance in zero-shot settings (more than 85% accuracy)
arXiv Detail & Related papers (2025-05-22T13:53:53Z)
Preliminary Explorations with GPT-4o(mni) Native Image Generation [7.700772640399941]
Recently, the visual generation ability by GPT-4o(mni) has been unlocked by OpenAI.<n>In this paper, we aim to explore the capabilities of GPT-4o across various tasks.
arXiv Detail & Related papers (2025-05-06T19:35:29Z)
An Empirical Study of GPT-4o Image Generation Capabilities [40.86026243294732]
We conduct an empirical study of GPT-4o's image generation capabilities, benchmarking it against leading open-source and commercial models.<n>Our analysis highlights the strengths and limitations of GPT-4o under various settings, and situates it within the broader evolution of generative modeling.
arXiv Detail & Related papers (2025-04-08T12:34:36Z)
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation [28.235805447825896]
OpenAI's GPT4o model has demonstrated surprisingly good capabilities in image generation and editing.<n>This report presents the first-look evaluation benchmark (named GPT-ImgEval)<n>We show GPT-4o's performance across three critical dimensions: generation quality, (2) editing proficiency, and (3) world knowledge-informed synthesis.
arXiv Detail & Related papers (2025-04-03T17:23:16Z)
Boosting Image Restoration via Priors from Pre-trained Models [54.83907596825985]
We learn an additional lightweight module called Pre-Train-Guided Refinement Module (PTG-RM) to refine restoration results of a target restoration network with OSF. PTG-RM effectively enhances restoration performance of various models across different tasks, including low-light enhancement, deraining, deblurring, and denoising.
arXiv Detail & Related papers (2024-03-11T15:11:57Z)
Exploiting GPT-4 Vision for Zero-shot Point Cloud Understanding [114.4754255143887]
We tackle the challenge of classifying the object category in point clouds. We employ GPT-4 Vision (GPT-4V) to overcome these challenges. We set a new benchmark in zero-shot point cloud classification.
arXiv Detail & Related papers (2024-01-15T10:16:44Z)
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? [82.40761196684524]
This paper centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks. We conduct extensive experiments to evaluate GPT-4's performance across images, videos, and point clouds. Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition.
arXiv Detail & Related papers (2023-11-27T11:29:10Z)
Holistic Evaluation of GPT-4V for Biomedical Imaging [113.46226609088194]
GPT-4V represents a breakthrough in artificial general intelligence for computer vision. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization.
arXiv Detail & Related papers (2023-11-10T18:40:44Z)
An Early Evaluation of GPT-4V(ision) [40.866323649060696]
We evaluate different abilities of GPT-4V including visual understanding, language understanding, visual puzzle solving, and understanding of other modalities such as depth, thermal, video, and audio. To estimate GPT-4V's performance, we manually construct 656 test instances and carefully evaluate the results of GPT-4V.
arXiv Detail & Related papers (2023-10-25T10:33:17Z)
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [121.42924593374127]
We analyze the latest model, GPT-4V, to deepen the understanding of LMMs. GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs makes it a powerful multimodal generalist system. GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods.
arXiv Detail & Related papers (2023-09-29T17:34:51Z)
GPT-4 Technical Report [116.90398195245983]
GPT-4 is a large-scale, multimodal model which can accept image and text inputs and produce text outputs. It exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers.
arXiv Detail & Related papers (2023-03-15T17:15:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.