SPatchGAN: A Statistical Feature Based Discriminator for Unsupervised
Image-to-Image Translation
- URL: http://arxiv.org/abs/2103.16219v1
- Date: Tue, 30 Mar 2021 10:03:07 GMT
- Title: SPatchGAN: A Statistical Feature Based Discriminator for Unsupervised
Image-to-Image Translation
- Authors: Xuning Shao, Weidong Zhang
- Abstract summary: For unsupervised image-to-image translation, we propose a discriminator architecture which focuses on the statistical features instead of individual patches.
We show that the proposed method outperforms the existing state-of-the-art models in various challenging applications including selfie-to-anime, male-to-female and glasses removal.
- Score: 4.466402706561989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For unsupervised image-to-image translation, we propose a discriminator
architecture which focuses on the statistical features instead of individual
patches. The network is stabilized by distribution matching of key statistical
features at multiple scales. Unlike the existing methods which impose more and
more constraints on the generator, our method facilitates the shape deformation
and enhances the fine details with a greatly simplified framework. We show that
the proposed method outperforms the existing state-of-the-art models in various
challenging applications including selfie-to-anime, male-to-female and glasses
removal. The code will be made publicly available.
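The abstract's core idea, a discriminator that matches key statistical features of patch responses at multiple scales rather than classifying individual patches, can be illustrated with a minimal sketch. This is not the paper's released code; the choice of statistics (mean, standard deviation, max) and the 2x average-pooling between scales are assumptions for illustration only.

```python
import numpy as np

def patch_statistics(feat):
    """Reduce an (H, W, C) feature map to per-channel statistics.

    Mean, standard deviation, and max over spatial positions -- the kind
    of statistical features such a discriminator could match, though the
    exact statistics used by SPatchGAN may differ.
    """
    flat = feat.reshape(-1, feat.shape[-1])          # (H*W, C)
    return np.concatenate([flat.mean(axis=0),
                           flat.std(axis=0),
                           flat.max(axis=0)])        # (3*C,)

def multi_scale_statistics(feat, num_scales=3):
    """Collect statistics at each scale, average-pooling by 2x in between."""
    stats = []
    for _ in range(num_scales):
        stats.append(patch_statistics(feat))
        h, w, c = feat.shape
        h2, w2 = h // 2, w // 2
        feat = feat[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2, c).mean(axis=(1, 3))
    return np.concatenate(stats)

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 16, 8))   # stand-in for a conv feature map
vec = multi_scale_statistics(feat)
print(vec.shape)  # (72,) = 3 scales * 3 statistics * 8 channels
```

A real discriminator would compute such statistics from learned convolutional features and compare their distributions between real and translated images, rather than scoring every patch independently.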
Related papers
- Diversity over Uniformity: Rethinking Representation in Generated Image Detection [22.020742109848317]

We argue that reliable generated-image detection should not depend on a single decision path but should preserve multiple judgment perspectives.
We propose an anti-feature-collapse learning framework that filters task-irrelevant components and suppresses excessive overlap among different forgery cues in the representation space.
This design maintains diverse and complementary evidence within the model, reduces reliance on a small set of salient cues, and enhances robustness under unseen generative settings.
arXiv Detail & Related papers (2026-02-28T15:42:12Z)
- GloTok: Global Perspective Tokenizer for Image Reconstruction and Generation [51.95701097588426]
We introduce a Global Perspective Tokenizer (GloTok) to model a more uniform semantic distribution of tokenized features.
A residual learning module is proposed to recover the fine-grained details and minimize the reconstruction error caused by quantization.
Experiments on the standard ImageNet-1k benchmark clearly show that our proposed method achieves state-of-the-art reconstruction performance and generation quality.
arXiv Detail & Related papers (2025-11-18T06:40:26Z)
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
- MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization-based methods, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR achieves superior performance compared with other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z)
- CLIP Adaptation by Intra-modal Overlap Reduction [1.2277343096128712]
We analyse the intra-modal overlap in image space in terms of embedding representation.
We train a lightweight adapter on a generic set of samples from the Google Open Images dataset.
arXiv Detail & Related papers (2024-09-17T16:40:58Z)
- Unsupervised Representation Learning by Balanced Self Attention Matching [2.3020018305241337]
We present a self-supervised method for embedding image features called BAM.
We obtain rich representations and avoid feature collapse by minimizing a loss that matches these distributions to their globally balanced and entropy regularized version.
We show competitive performance with leading methods on both semi-supervised and transfer-learning benchmarks.
arXiv Detail & Related papers (2024-08-04T12:52:44Z)
- Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search [66.95134080902717]
We propose a novel one-step framework, named Self-similarity driven Scale-invariant Learning (SSL).
We introduce a Multi-scale Exemplar Branch to guide the network in concentrating on the foreground and learning scale-invariant features.
Experiments on PRW and CUHK-SYSU databases demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-02-25T04:48:11Z)
- Deblurring via Stochastic Refinement [85.42730934561101]
We present an alternative framework for blind deblurring based on conditional diffusion models.
Our method is competitive in terms of distortion metrics such as PSNR.
arXiv Detail & Related papers (2021-12-05T04:36:09Z)
- Understanding Gender and Racial Disparities in Image Recognition Models [0.0]
We investigate a multi-label softmax loss with cross-entropy as the loss function instead of a binary cross-entropy on a multi-label classification problem.
We use the MR2 dataset to evaluate the fairness in the model outcomes and try to interpret the mistakes by looking at model activations and suggest possible fixes.
arXiv Detail & Related papers (2021-07-20T01:05:31Z)
- A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image.
The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z)
- Encoding Robustness to Image Style via Adversarial Feature Perturbations [72.81911076841408]
We adapt adversarial training by directly perturbing feature statistics, rather than image pixels, to produce robust models.
Our proposed method, Adversarial Batch Normalization (AdvBN), is a single network layer that generates worst-case feature perturbations during training.
arXiv Detail & Related papers (2020-09-18T17:52:34Z)
- TSIT: A Simple and Versatile Framework for Image-to-Image Translation [103.92203013154403]
We introduce a simple and versatile framework for image-to-image translation.
We provide a carefully designed two-stream generative model with newly proposed feature transformations.
This allows multi-scale semantic structure information and style representation to be effectively captured and fused by the network.
A systematic study compares the proposed method with several state-of-the-art task-specific baselines, verifying its effectiveness in both perceptual quality and quantitative evaluations.
arXiv Detail & Related papers (2020-07-23T15:34:06Z)
- Transformation Consistency Regularization - A Semi-Supervised Paradigm for Image-to-Image Translation [18.870983535180457]
We propose Transformation Consistency Regularization, which addresses the more challenging semi-supervised setting of image-to-image translation.
We evaluate the efficacy of our algorithm on three different applications: image colorization, denoising and super-resolution.
Our method is significantly data-efficient, requiring only around 10-20% of labeled samples to achieve image reconstructions similar to those of its fully supervised counterpart.
arXiv Detail & Related papers (2020-07-15T17:41:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.