Oneta: Multi-Style Image Enhancement Using Eigentransformation Functions
- URL: http://arxiv.org/abs/2506.23547v1
- Date: Mon, 30 Jun 2025 06:36:11 GMT
- Title: Oneta: Multi-Style Image Enhancement Using Eigentransformation Functions
- Authors: Jiwon Kim, Soohyun Hwang, Dong-O Kim, Changsu Han, Min Kyu Park, Chang-Su Kim
- Abstract summary: The first algorithm, called Oneta, for a novel task of multi-style image enhancement is proposed. Oneta uses two point operators sequentially: intensity enhancement with a transformation function (TF) and color correction with a color correction matrix (CCM).
- Score: 20.459710796298722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The first algorithm, called Oneta, for a novel task of multi-style image enhancement is proposed in this work. Oneta uses two point operators sequentially: intensity enhancement with a transformation function (TF) and color correction with a color correction matrix (CCM). This two-step enhancement model, though simple, achieves a high performance upper bound. Also, we introduce eigentransformation function (eigenTF) to represent TF compactly. The Oneta network comprises Y-Net and C-Net to predict eigenTF and CCM parameters, respectively. To support $K$ styles, Oneta employs $K$ learnable tokens. During training, each style token is learned using image pairs from the corresponding dataset. In testing, Oneta selects one of the $K$ style tokens to enhance an image accordingly. Extensive experiments show that the single Oneta network can effectively undertake six enhancement tasks -- retouching, image signal processing, low-light image enhancement, dehazing, underwater image enhancement, and white balancing -- across 30 datasets.
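The abstract describes the pipeline concretely enough to sketch. Below is a minimal NumPy sketch of the two-step enhancement model, under stated assumptions: that eigenTF is a PCA-style basis over discretized TFs (suggested by the "eigen" prefix, though the paper defines the details), that the TF is applied per channel as a 256-entry lookup table, and that the CCM is a standard 3x3 matrix. All names are illustrative, not from the paper.

```python
import numpy as np

def apply_oneta_step(img, eigen_basis, mean_tf, coeffs, ccm):
    """Hedged sketch of Oneta's two sequential point operators.

    img         : H x W x 3 float array in [0, 1]
    eigen_basis : M x 256 array (assumption: PCA-style eigenTF basis
                  over discretized transformation functions)
    mean_tf     : length-256 mean transformation function
    coeffs      : length-M eigenTF coefficients (predicted by Y-Net)
    ccm         : 3 x 3 color correction matrix (predicted by C-Net)
    """
    # Step 1: intensity enhancement. Reconstruct the TF from eigenTF
    # coefficients and apply it as a per-pixel lookup table.
    tf = np.clip(mean_tf + coeffs @ eigen_basis, 0.0, 1.0)
    idx = np.clip((img * 255).astype(int), 0, 255)
    enhanced = tf[idx]                       # point operator on intensities

    # Step 2: color correction. Apply the 3x3 CCM to every pixel.
    corrected = enhanced.reshape(-1, 3) @ ccm.T
    return np.clip(corrected, 0.0, 1.0).reshape(img.shape)
```

Per the abstract, Y-Net and C-Net would supply `coeffs` and `ccm` for each input image and the selected style token; the sketch covers only the two point operations themselves.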
Related papers
- Text to Image for Multi-Label Image Recognition with Joint Prompt-Adapter Learning [69.33115351856785]
We present a novel method, called T2I-PAL, to tackle the modality gap issue when using only text captions for PEFT.
The core design of T2I-PAL is to leverage pre-trained text-to-image generation models to generate photo-realistic and diverse images from text captions.
Extensive experiments on multiple benchmarks, including MS-COCO, VOC2007, and NUS-WIDE, show that our T2I-PAL can boost recognition performance by 3.47% on average.
arXiv Detail & Related papers (2025-06-12T11:09:49Z)
- RefineStyle: Dynamic Convolution Refinement for StyleGAN [15.230430037135017]
In StyleGAN, convolution kernels are shaped by both static parameters shared across images and dynamic modulation factors specific to each image.
The $\mathcal{W}^+$ space is often used for image inversion and editing.
This paper proposes an efficient refining strategy for dynamic kernels.
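For background, the static-plus-dynamic kernel structure the summary refers to is the standard StyleGAN2 weight modulation; a minimal sketch follows. RefineStyle's refinement strategy for the dynamic kernels is the paper's contribution and is not reproduced here.

```python
import torch

def modulated_conv2d_weight(weight, style):
    """StyleGAN2-style weight modulation (background sketch).

    weight : (out_c, in_c, k, k) static kernel shared across images
    style  : (batch, in_c) per-image modulation factors from W+ space
    Returns per-image dynamic kernels of shape (batch, out_c, in_c, k, k).
    """
    # Scale each input channel of the static kernel by the style factors.
    w = weight.unsqueeze(0) * style[:, None, :, None, None]
    # Demodulate so output feature magnitudes stay roughly unit-scale.
    demod = torch.rsqrt((w ** 2).sum(dim=[2, 3, 4]) + 1e-8)
    return w * demod[:, :, None, None, None]
```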
arXiv Detail & Related papers (2024-10-08T15:01:30Z)
- Pedestrian Attribute Recognition via CLIP based Prompt Vision-Language Fusion [23.62010759076202]
We formulate PAR as a vision-language fusion problem and fully exploit the relations between pedestrian images and attribute labels.
Our proposed PAR algorithm adjusts only 0.75% of the learnable parameters compared with the full fine-tuning strategy.
arXiv Detail & Related papers (2023-12-17T11:59:14Z)
- MIMT: Multi-Illuminant Color Constancy via Multi-Task Local Surface and Light Color Learning [42.72878256074646]
We introduce a multi-task learning method to discount multiple light colors in a single input image.
To obtain better cues about local surface and light colors under multi-illuminant conditions, we design a novel multi-task learning framework.
Our model achieves 47.1% improvement compared to a state-of-the-art multi-illuminant color constancy method on a multi-illuminant dataset.
arXiv Detail & Related papers (2022-11-16T09:00:20Z)
- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation [95.02406834386814]
Parti treats text-to-image generation as a sequence-to-sequence modeling problem.
Parti uses a Transformer-based image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens.
PartiPrompts (P2) is a new holistic benchmark of over 1600 English prompts.
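The sequence-to-sequence formulation reduces image synthesis to next-token prediction over the tokenizer's discrete vocabulary. A generic, hedged sampling loop is sketched below; `decoder` is a hypothetical stand-in for the Parti model, and the ViT-VQGAN encode/decode steps are omitted.

```python
import torch

@torch.no_grad()
def sample_image_tokens(decoder, text_tokens, seq_len, device="cpu"):
    """Generic autoregressive sampling over discrete image tokens, in
    the spirit of Parti's seq2seq formulation. `decoder` is hypothetical:
    it takes (text_tokens, image_tokens_so_far) and returns next-token
    logits of shape (1, vocab_size).
    """
    tokens = torch.empty(1, 0, dtype=torch.long, device=device)
    for _ in range(seq_len):
        logits = decoder(text_tokens, tokens)     # (1, vocab_size)
        probs = torch.softmax(logits, dim=-1)
        next_tok = torch.multinomial(probs, 1)    # sample one token
        tokens = torch.cat([tokens, next_tok], dim=1)
    return tokens  # decode to pixels with the image tokenizer's decoder
```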
arXiv Detail & Related papers (2022-06-22T01:11:29Z)
- Controllable Image Enhancement [66.18525728881711]
We present a semiautomatic image enhancement algorithm that can generate high-quality images with multiple styles by controlling a few parameters.
An encoder-decoder framework encodes the retouching skills into latent codes and decodes them into the parameters of image signal processing functions.
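As a rough illustration of decoding latent codes into ISP function parameters, here is a hypothetical decoder over three simple point operations (gamma, per-channel gains, saturation); the paper's actual ISP parameterization may differ.

```python
import torch
import torch.nn as nn

class ISPParamDecoder(nn.Module):
    """Hypothetical decoder mapping a retouching latent code to the
    parameters of simple ISP point operations. Illustrative only; the
    paper defines its own ISP functions and parameter ranges.
    """
    def __init__(self, latent_dim=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                 nn.Linear(32, 5))  # gamma, 3 gains, sat

    def forward(self, z, img):  # img: (B, 3, H, W) in [0, 1]
        p = self.mlp(z)
        gamma = torch.sigmoid(p[:, 0]) * 2 + 0.5   # in [0.5, 2.5]
        gains = torch.sigmoid(p[:, 1:4]) + 0.5     # in [0.5, 1.5]
        sat   = torch.sigmoid(p[:, 4]) * 2         # in [0, 2]
        out = img.clamp(1e-6) ** gamma[:, None, None, None]
        out = out * gains[:, :, None, None]
        gray = out.mean(dim=1, keepdim=True)       # saturation pivot
        out = gray + sat[:, None, None, None] * (out - gray)
        return out.clamp(0, 1)
```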
arXiv Detail & Related papers (2022-06-16T23:54:53Z)
- SCSNet: An Efficient Paradigm for Learning Simultaneously Image Colorization and Super-Resolution [39.77987463287673]
We present an efficient paradigm to perform Simultaneous Image Colorization and Super-resolution (SCS).
The colorization branch, one of the method's two parts, learns color information using the proposed plug-and-play Pyramid Valve Cross Attention (PVCAttn) module.
Our SCSNet supports both automatic and referential modes, which makes it more flexible for practical applications.
arXiv Detail & Related papers (2022-01-12T08:59:12Z)
- Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling an asymmetric encoder-decoder design with a high masking ratio enables us to train large models efficiently and effectively.
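The masking step is simple enough to sketch directly; this mirrors the random patch masking the summary describes, with shapes and the commonly used 75% ratio as assumptions.

```python
import torch

def random_masking(patches, mask_ratio=0.75):
    """Random patch masking in the MAE style: keep a random subset of
    patch tokens for the encoder; a decoder later reconstructs the rest.
    patches: (batch, num_patches, dim)
    """
    b, n, d = patches.shape
    n_keep = int(n * (1 - mask_ratio))
    noise = torch.rand(b, n)                     # one score per patch
    keep_idx = noise.argsort(dim=1)[:, :n_keep]  # lowest scores survive
    visible = torch.gather(
        patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))
    return visible, keep_idx  # the encoder sees only `visible`
```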
arXiv Detail & Related papers (2021-11-11T18:46:40Z)
- Transformer-Based Deep Image Matching for Generalizable Person Re-identification [114.56752624945142]
We investigate the possibility of applying Transformers for image matching and metric learning given pairs of images.
We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention.
We propose a new simplified decoder, which drops the full attention implementation with the softmax weighting, keeping only the query-key similarity.
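A hedged sketch of the "query-key similarity only" idea: compute raw scaled dot products between the two images' tokens and pool them into a matching score, with no softmax weighting and no value aggregation. The pooling used here is an assumption, not necessarily the paper's exact decoder design.

```python
import torch

def qk_matching_score(q, k):
    """Query-key similarities without softmax, pooled to a match score.

    q : (batch, n_query, dim) tokens from one image
    k : (batch, n_key, dim)   tokens from the other image
    Returns one similarity score per image pair.
    """
    sim = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # pairwise map
    # Assumed pooling: best match per query token, averaged over queries.
    return sim.max(dim=-1).values.mean(dim=-1)
```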
arXiv Detail & Related papers (2021-05-30T05:38:33Z)
- CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification [17.709880544501758]
We propose a dual-branch transformer to combine image patches of different sizes to produce stronger image features.
Our approach processes small-patch and large-patch tokens with two separate branches of different computational complexity.
Our proposed cross-attention requires only linear computational and memory complexity, instead of the quadratic cost of full attention.
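The linear cost follows from using a single CLS token as the only query into the other branch's patch tokens, so the attention map is 1 x n rather than n x n. A minimal sketch, with plain matrices standing in for the projection layers:

```python
import torch

def cls_cross_attention(cls_a, tokens_b, w_q, w_k, w_v):
    """CrossViT-style cross-attention sketch: one branch's CLS token
    attends to the other branch's patch tokens.

    cls_a    : (batch, 1, dim)  CLS token from branch A (sole query)
    tokens_b : (batch, n, dim)  patch tokens from branch B
    w_q/k/v  : (dim, dim)       stand-in projection matrices
    """
    q = cls_a @ w_q                    # (batch, 1, dim)
    k = tokens_b @ w_k                 # (batch, n, dim)
    v = tokens_b @ w_v
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5,
                         dim=-1)       # (batch, 1, n): linear in n
    return attn @ v                    # (batch, 1, dim) fused CLS token
```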
arXiv Detail & Related papers (2021-03-27T13:03:17Z)
- Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [128.96032932640364]
We propose a new Tokens-To-Token Vision Transformer (T2T-ViT) to solve vision tasks.
T2T-ViT reduces the parameter count and MACs of vanilla ViT by half, while achieving more than 2.5% improvement when trained from scratch on ImageNet.
For example, a T2T-ViT comparable in size to ResNet50 can achieve 80.7% top-1 accuracy on ImageNet.
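The tokens-to-token idea merges overlapping neighborhoods of tokens into fewer, richer tokens. A minimal sketch of one such "soft split" using torch.nn.Unfold, with illustrative hyperparameters rather than the paper's exact schedule:

```python
import torch
import torch.nn as nn

def tokens_to_token(tokens, h, w, kernel=3, stride=2):
    """One T2T-style soft split: reshape tokens to a 2D grid, then
    unfold overlapping neighborhoods so adjacent tokens are merged.

    tokens: (batch, h * w, dim)
    Returns (batch, n_new, dim * kernel**2) with n_new < h * w.
    """
    b, n, d = tokens.shape
    grid = tokens.transpose(1, 2).reshape(b, d, h, w)
    unfold = nn.Unfold(kernel_size=kernel, stride=stride, padding=1)
    merged = unfold(grid)              # (batch, d * kernel^2, n_new)
    return merged.transpose(1, 2)      # fewer, higher-dim tokens
```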
arXiv Detail & Related papers (2021-01-28T13:25:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.