4KAgent: Agentic Any Image to 4K Super-Resolution
- URL: http://arxiv.org/abs/2507.07105v1
- Date: Wed, 09 Jul 2025 17:59:19 GMT
- Title: 4KAgent: Agentic Any Image to 4K Super-Resolution
- Authors: Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang, Yide Zhang, Gengchen Mai, Lihong V. Wang, James Zou, Xiaoyu Wang, Ming-Hsuan Yang, Zhengzhong Tu,
- Abstract summary: We present 4KAgent, a super-resolution generalist system designed to upscale any image to 4K resolution (and even higher, if applied iteratively)<n>4KAgent comprises three core components: (1) Profiling, a module that customizes the 4KAgent pipeline based on bespoke use cases; (2) A Perception Agent, which leverages vision-language models alongside image quality assessment experts to analyze the input image and make a tailored restoration plan; and (3) A Restoration Agent, which executes the plan, following a quality-driven mixture-of-expert policy to select the optimal output for each step.<n>We rigorously evaluate our 4KAgent
- Score: 62.99433518118836
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present 4KAgent, a unified agentic super-resolution generalist system designed to universally upscale any image to 4K resolution (and even higher, if applied iteratively). Our system can transform images from extremely low resolutions with severe degradations, for example, highly distorted inputs at 256x256, into crystal-clear, photorealistic 4K outputs. 4KAgent comprises three core components: (1) Profiling, a module that customizes the 4KAgent pipeline based on bespoke use cases; (2) A Perception Agent, which leverages vision-language models alongside image quality assessment experts to analyze the input image and make a tailored restoration plan; and (3) A Restoration Agent, which executes the plan, following a recursive execution-reflection paradigm, guided by a quality-driven mixture-of-expert policy to select the optimal output for each step. Additionally, 4KAgent embeds a specialized face restoration pipeline, significantly enhancing facial details in portrait and selfie photos. We rigorously evaluate our 4KAgent across 11 distinct task categories encompassing a total of 26 diverse benchmarks, setting new state-of-the-art on a broad spectrum of imaging domains. Our evaluations cover natural images, portrait photos, AI-generated content, satellite imagery, fluorescence microscopy, and medical imaging like fundoscopy, ultrasound, and X-ray, demonstrating superior performance in terms of both perceptual (e.g., NIQE, MUSIQ) and fidelity (e.g., PSNR) metrics. By establishing a novel agentic paradigm for low-level vision tasks, we aim to catalyze broader interest and innovation within vision-centric autonomous agents across diverse research communities. We will release all the code, models, and results at: https://4kagent.github.io.
Related papers
- SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures [11.016055846317293]
SurgiSR4K is the first publicly accessible surgical imaging and video dataset captured at a native 4K resolution.<n>This dataset opens up possibilities for a broad range of computer vision tasks that might benefit from high resolution data.
arXiv Detail & Related papers (2025-06-30T19:23:57Z) - Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields [56.184278668305076]
We introduce Feature4X, a universal framework to extend functionality from 2D vision foundation model into the 4D realm.<n>The framework is first to distill and lift the features of video foundation models into an explicit 4D feature field using Splatting.<n>Our experiments showcase novel view segment anything, geometric and appearance scene editing, and free-form VQA across all time steps.
arXiv Detail & Related papers (2025-03-26T17:56:16Z) - CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI [58.35348718345307]
Current efforts to distinguish between real and AI-generated images may lack generalization.<n>We propose a novel framework, Co-Spy, that first enhances existing semantic features.<n>We also create Co-Spy-Bench, a comprehensive dataset comprising 5 real image datasets and 22 state-of-the-art generative models.
arXiv Detail & Related papers (2025-03-24T01:59:29Z) - Assessing UHD Image Quality from Aesthetics, Distortions, and Saliency [51.36674160287799]
We design a multi-branch deep neural network (DNN) to assess the quality of UHD images from three perspectives.
aesthetic features are extracted from low-resolution images downsampled from the UHD ones.
Technical distortions are measured using a fragment image composed of mini-patches cropped from UHD images.
The salient content of UHD images is detected and cropped to extract quality-aware features from the salient regions.
arXiv Detail & Related papers (2024-09-01T15:26:11Z) - HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization [21.1691961979094]
In digital pathology, the traditional method for deep learning-based image segmentation typically involves a two-stage process.
We propose the holistic histopathology (HoloHisto) segmentation method to achieve end-to-end segmentation on gigapixel WSIs.
Under the HoloHisto platform, we unveil a random 4K sampler that transcends ultra-high resolution, delivering 31 and 10 times more pixels than standard 2D and 3D patches.
arXiv Detail & Related papers (2024-07-03T17:49:31Z) - ClusterFormer: Clustering As A Universal Visual Learner [80.79669078819562]
CLUSTERFORMER is a universal vision model based on the CLUSTERing paradigm with TransFORMER.
It is capable of tackling heterogeneous vision tasks with varying levels of clustering granularity.
For its efficacy, we hope our work can catalyze a paradigm shift in universal models in computer vision.
arXiv Detail & Related papers (2023-09-22T22:12:30Z) - 4K-HAZE: A Dehazing Benchmark with 4K Resolution Hazy and Haze-Free
Images [12.402054374952485]
We develop a novel method to simulate 4K hazy images from clear images, which first estimates the scene depth, simulates the light rays and object reflectance, then migrates the synthetic images to real domains by using a GAN.
We wrap these synthesized images into a benchmark called the 4K-HAZE dataset.
The most appealing aspect of our approach is the capability to run a 4K image on a single GPU with 24G RAM in real-time (33fps)
arXiv Detail & Related papers (2023-03-28T09:39:29Z) - Towards Efficient and Scale-Robust Ultra-High-Definition Image
Demoireing [71.62289021118983]
We present an efficient baseline model ESDNet for tackling 4K moire images, wherein we build a semantic-aligned scale-aware module to address the scale variation of moire patterns.
Our approach outperforms state-of-the-art methods by a large margin while being much more lightweight.
arXiv Detail & Related papers (2022-07-20T14:20:52Z) - Deep Neural Network for Blind Visual Quality Assessment of 4K Content [37.70643043547502]
Existing blind image quality assessment (BIQA) methods are not suitable for the original and upscaled 4K contents.
We propose a deep learning-based BIQA model for 4K content, which on one hand can recognize true and pseudo 4K content and on the other hand can evaluate their perceptual visual quality.
The proposed model is trained through the multi-task learning manner and we introduce an uncertainty principle to balance the losses of the classification and regression tasks.
arXiv Detail & Related papers (2022-06-09T09:10:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.