Does resistance to style-transfer equal Global Shape Bias? Measuring network sensitivity to global shape configuration
- URL: http://arxiv.org/abs/2310.07555v3
- Date: Thu, 29 Feb 2024 15:53:39 GMT
- Title: Does resistance to style-transfer equal Global Shape Bias? Measuring network sensitivity to global shape configuration
- Authors: Ziqi Wen, Tianqin Li, Zhi Jing, Tai Sing Lee
- Abstract summary: The current benchmark for evaluating a model's global shape bias is a set of style-transferred images.
We show that networks trained with style-transferred images indeed learn to ignore style, but their shape bias arises primarily from local detail.
- Score: 6.047146237332764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models are known to exhibit a strong texture bias, while humans tend to rely heavily on global shape structure for object recognition. The current benchmark for evaluating a model's global shape bias is a set of style-transferred images, under the assumption that resistance to the attack of style transfer reflects the development of global structure sensitivity in the model. In this work, we show that networks trained with style-transferred images indeed learn to ignore style, but their shape bias arises primarily from local detail. We provide the Disrupted Structure Testbench (DiST) as a direct measurement of global structure sensitivity. Our test includes 2400 original images from ImageNet-1K, each accompanied by two images in which the global shape of the original is disrupted while its texture is preserved via a texture synthesis program. We found that (1) models that performed well on the previous cue-conflict dataset do not fare well on the proposed DiST; (2) supervised Vision Transformers (ViTs) lose the global spatial information carried by their positional embeddings and show no significant advantage over Convolutional Neural Networks (CNNs) on DiST, whereas self-supervised learning methods, especially the masked autoencoder, significantly improve the global structure sensitivity of ViTs; and (3) improving global structure sensitivity is orthogonal to resistance to style transfer, indicating that global shape structure and local texture detail are not an either/or proposition. Training with DiST images and training with style-transferred images are complementary and can be combined to enhance a network's global shape sensitivity and the robustness of its local features. Our code will be hosted on GitHub:
https://github.com/leelabcnbc/DiST
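
To make the dataset construction concrete: DiST pairs each original image with two variants whose global configuration is destroyed but whose texture is kept. The sketch below uses simple patch shuffling as a crude stand-in for the texture synthesis program the paper actually uses; the function name and patch size are illustrative, not taken from the DiST code.

```python
# Crude, illustrative proxy for "global shape disrupted, texture preserved":
# shuffle non-overlapping patches of an image. DiST itself uses a texture
# synthesis program (see the paper/repo); this stand-in only conveys the
# idea of destroying the global configuration.
import torch

def shuffle_patches(img: torch.Tensor, patch: int = 56) -> torch.Tensor:
    """img: (C, H, W) tensor with H and W divisible by `patch`."""
    c, h, w = img.shape
    gh, gw = h // patch, w // patch
    # (C, H, W) -> (gh*gw, C, patch, patch)
    patches = (img.view(c, gh, patch, gw, patch)
                  .permute(1, 3, 0, 2, 4)
                  .reshape(gh * gw, c, patch, patch))
    patches = patches[torch.randperm(gh * gw)]  # destroy the global layout
    # back to (C, H, W)
    return (patches.reshape(gh, gw, c, patch, patch)
                   .permute(2, 0, 3, 1, 4)
                   .reshape(c, h, w))
```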
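Given such triplets, one plausible way to score global structure sensitivity is an embedding-space odd-one-out test: a shape-sensitive model should place the original far from both disrupted variants relative to how close the variants are to each other. This is a minimal sketch under that assumption; the paper's actual DiST metric may differ, and `model` stands for any feature extractor that returns one embedding per image.

```python
# Hypothetical odd-one-out evaluation over DiST-style triplets
# (original, disrupted_a, disrupted_b); protocol details are assumptions.
import torch
import torch.nn.functional as F

def odd_one_out_accuracy(model, triplets, device="cpu"):
    """triplets: iterable of (orig, dis_a, dis_b), each a preprocessed
    (3, H, W) tensor; `model` maps a batch to (N, D) embeddings."""
    model.eval().to(device)
    correct, total = 0, 0
    with torch.no_grad():
        for orig, dis_a, dis_b in triplets:
            batch = torch.stack([orig, dis_a, dis_b]).to(device)
            feats = F.normalize(model(batch), dim=-1)  # (3, D) unit embeddings
            sim = feats @ feats.T                      # pairwise cosine sims
            # The odd one out is the image least similar to the other two.
            off_diag = sim.sum(dim=1) - sim.diag()
            pred = off_diag.argmin().item()
            correct += int(pred == 0)                  # index 0 = original
            total += 1
    return correct / total
```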
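Finding (3) suggests a practical recipe: feed style-transferred and shape-disrupted images into a single training stream. Below is a minimal sketch of such mixing, assuming three (image, label) datasets over the same label space; the paper's actual training procedure may weight or stage these sources differently.

```python
# Minimal sketch of mixing data sources; `imagenet_ds`, `stylized_ds`, and
# `disrupted_ds` are assumed (image, label) datasets sharing one label space.
from torch.utils.data import ConcatDataset, DataLoader

def combined_loader(imagenet_ds, stylized_ds, disrupted_ds, batch_size=256):
    mixed = ConcatDataset([imagenet_ds, stylized_ds, disrupted_ds])
    return DataLoader(mixed, batch_size=batch_size, shuffle=True, num_workers=8)
```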
Related papers
- T-Pixel2Mesh: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image [84.08705684778666]
We propose a novel Transformer-boosted architecture, named T-Pixel2Mesh, inspired by the coarse-to-fine approach of P2M.
Specifically, we use a global Transformer to control the holistic shape and a local Transformer to refine the local geometry details.
Our experiments on ShapeNet demonstrate state-of-the-art performance, while results on real-world data show the generalization capability.
arXiv Detail & Related papers (2024-03-20T15:14:22Z)
- DeblurDiNAT: A Generalizable Transformer for Perceptual Image Deblurring [1.5124439914522694]
DeblurDiNAT is a generalizable and efficient encoder-decoder Transformer which restores clean images visually close to the ground truth.
We present a linear feed-forward network and a non-linear dual-stage feature fusion module for faster feature propagation across the network.
arXiv Detail & Related papers (2024-03-19T21:31:31Z)
- Latents2Semantics: Leveraging the Latent Space of Generative Models for Localized Style Manipulation of Face Images [25.82631308991067]
We introduce the Latents2Semantics Autoencoder (L2SAE), a Generative Autoencoder model that facilitates localized editing of style attributes of several Regions of Interest in face images.
The L2SAE learns separate latent representations for encoded images' structure and style information, allowing for structure-preserving style editing of the chosen ROIs.
We provide qualitative and quantitative results across multiple applications, such as selective style editing and swapping, using test images sampled from several datasets.
arXiv Detail & Related papers (2023-12-22T20:06:53Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Spectral Normalization and Dual Contrastive Regularization for Image-to-Image Translation [9.029227024451506]
We propose a new unpaired I2I translation framework based on dual contrastive regularization and spectral normalization.
We conduct comprehensive experiments to evaluate the effectiveness of SN-DCR, and the results prove that our method achieves SOTA in multiple tasks.
arXiv Detail & Related papers (2023-04-22T05:22:24Z)
- Arbitrary Style Transfer with Structure Enhancement by Combining the Global and Local Loss [51.309905690367835]
We introduce a novel arbitrary style transfer method with structure enhancement by combining the global and local loss.
Experimental results demonstrate that our method can generate higher-quality images with impressive visual effects.
arXiv Detail & Related papers (2022-07-23T07:02:57Z)
- Low Light Image Enhancement via Global and Local Context Modeling [164.85287246243956]
We introduce a context-aware deep network for low-light image enhancement.
First, it features a global context module that models spatial correlations to find complementary cues over full spatial domain.
Second, it introduces a dense residual block that captures local context with a relatively large receptive field.
arXiv Detail & Related papers (2021-01-04T09:40:54Z)
- Informative Dropout for Robust Representation Learning: A Shape-bias Perspective [84.30946377024297]
We propose a light-weight model-agnostic method, namely Informative Dropout (InfoDrop), to improve interpretability and reduce texture bias.
Specifically, we discriminate texture from shape based on local self-information in an image, and adopt a Dropout-like algorithm to decorrelate the model output from the local texture.
arXiv Detail & Related papers (2020-08-10T16:52:24Z)
- A U-Net Based Discriminator for Generative Adversarial Networks [86.67102929147592]
We propose an alternative U-Net based discriminator architecture for generative adversarial networks (GANs).
The proposed architecture provides detailed per-pixel feedback to the generator while maintaining the global coherence of synthesized images.
The novel discriminator improves over the state of the art in terms of the standard distribution and image quality metrics.
arXiv Detail & Related papers (2020-02-28T11:16:54Z)