Stylizing ViT: Anatomy-Preserving Instance Style Transfer for Domain Generalization
- URL: http://arxiv.org/abs/2601.17586v1
- Date: Sat, 24 Jan 2026 20:53:02 GMT
- Title: Stylizing ViT: Anatomy-Preserving Instance Style Transfer for Domain Generalization
- Authors: Sebastian Doerrich, Francesco Di Salvo, Jonas Alle, Christian Ledig
- Abstract summary: Stylizing ViT is a novel Vision Transformer encoder that utilizes weight-shared attention blocks for both self- and cross-attention. We show that Stylizing ViT is effective beyond training, achieving a 17% performance improvement during inference when used for test-time augmentation.
- Score: 1.8747639074211104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models in medical image analysis often struggle with generalizability across domains and demographic groups due to data heterogeneity and scarcity. Traditional augmentation improves robustness but fails under substantial domain shifts. Recent advances in stylistic augmentation enhance domain generalization by varying image styles, but either lack style diversity or introduce artifacts into the generated images. To address these limitations, we propose Stylizing ViT, a novel Vision Transformer encoder that utilizes weight-shared attention blocks for both self- and cross-attention. This design allows the same attention block to maintain anatomical consistency through self-attention while performing style transfer via cross-attention. We assess the effectiveness of our method for domain generalization by employing it for data augmentation on three distinct image classification tasks in the context of histopathology and dermatology. Results demonstrate improved robustness (up to +13% accuracy) over the state of the art while generating perceptually convincing images without artifacts. Additionally, we show that Stylizing ViT is effective beyond training, achieving a 17% performance improvement during inference when used for test-time augmentation. The source code is available at https://github.com/sdoerrich97/stylizing-vit .
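The weight-sharing idea described in the abstract, where a single attention block serves both self-attention (content) and cross-attention (style), can be sketched as follows. This is a hypothetical single-head numpy sketch with illustrative dimensions, not the authors' implementation: the same projection weights are applied in both modes, and only the source of the key/value tokens changes.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SharedAttention:
    """One attention block whose projection weights are reused for both
    self-attention (anatomy/content) and cross-attention (style).
    Names, dimensions, and initialization are illustrative only."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.Wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.dim = dim

    def __call__(self, query_tokens, context_tokens):
        # context == query  -> self-attention (anatomical consistency)
        # context == style  -> cross-attention (style injection), same weights
        q = query_tokens @ self.Wq
        k = context_tokens @ self.Wk
        v = context_tokens @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(self.dim))
        return attn @ v

block = SharedAttention(dim=16)
content = np.random.default_rng(1).standard_normal((8, 16))  # content tokens
style = np.random.default_rng(2).standard_normal((4, 16))    # style tokens

self_out = block(content, content)  # self-attention pass
cross_out = block(content, style)   # cross-attention pass, identical weights
```

Because both passes share one set of weights, the block has no separate "style branch" to train; the same learned representation is reused for both roles, which is the mechanism the abstract credits with preserving anatomy while transferring style.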
Related papers
- GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing [60.101097709212716]
This paper introduces GenMix, a generalizable prompt-guided generative data augmentation approach.
Our technique leverages image editing to generate augmented images based on custom conditional prompts.
Our approach mitigates unrealistic images and label ambiguity, improving the performance and adversarial robustness of the resulting models.
arXiv Detail & Related papers (2024-12-03T10:45:34Z)
- Self-supervised Vision Transformer are Scalable Generative Models for Domain Generalization [0.13108652488669734]
We propose a novel generative method for domain generalization in histopathology images.
Our method employs a generative, self-supervised Vision Transformer to dynamically extract characteristics of image patches.
Experiments conducted on two distinct histopathology datasets demonstrate the effectiveness of our proposed approach.
arXiv Detail & Related papers (2024-07-03T08:20:27Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- MoreStyle: Relax Low-frequency Constraint of Fourier-based Image Reconstruction in Generalizable Medical Image Segmentation [53.24011398381715]
We introduce a Plug-and-Play module for data augmentation called MoreStyle.
MoreStyle diversifies image styles by relaxing low-frequency constraints in Fourier space.
With the help of adversarial learning, MoreStyle pinpoints the most intricate style combinations within latent features.
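Fourier-space style augmentation of the kind MoreStyle relaxes typically perturbs the low-frequency amplitude spectrum, which carries appearance/style, while keeping the phase, which carries structure. The sketch below shows a generic low-frequency amplitude blend between two images; the function name and the `radius`/`beta` parameters are illustrative, and this is not MoreStyle's actual method.

```python
import numpy as np

def fourier_style_mix(content, style, radius=0.1, beta=1.0):
    """Blend the low-frequency FFT amplitude of `content` toward that of
    `style`, keeping content's phase (structure). Illustrative sketch of
    generic Fourier-space style augmentation; parameters are hypothetical."""
    f_c = np.fft.fftshift(np.fft.fft2(content))
    f_s = np.fft.fftshift(np.fft.fft2(style))
    amp_c, phase_c = np.abs(f_c), np.angle(f_c)
    amp_s = np.abs(f_s)

    h, w = content.shape
    cy, cx = h // 2, w // 2
    ry, rx = int(radius * h), int(radius * w)

    # Replace only the central (low-frequency) band of the amplitude.
    mixed = amp_c.copy()
    mixed[cy - ry:cy + ry, cx - rx:cx + rx] = (
        (1 - beta) * amp_c[cy - ry:cy + ry, cx - rx:cx + rx]
        + beta * amp_s[cy - ry:cy + ry, cx - rx:cx + rx]
    )

    # Recombine mixed amplitude with the original phase and invert.
    f_mixed = mixed * np.exp(1j * phase_c)
    return np.fft.ifft2(np.fft.ifftshift(f_mixed)).real

rng = np.random.default_rng(0)
content_img = rng.standard_normal((32, 32))
style_img = rng.standard_normal((32, 32))
mixed_img = fourier_style_mix(content_img, style_img, radius=0.1, beta=1.0)
```

Keeping the phase spectrum fixed is what makes this kind of augmentation structure-preserving: only the band inside `radius` changes, so edges and shapes survive while global appearance shifts.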
arXiv Detail & Related papers (2024-03-18T11:38:47Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Dual Contrastive Loss and Attention for GANs [82.713118646294]
We propose a novel dual contrastive loss and show that, with this loss, the discriminator learns more generalized and distinguishable representations to incentivize generation.
We find that attention remains an important module for successful image generation, even though it was not used in recent state-of-the-art models.
By combining the strengths of these remedies, we improve the compelling state-of-the-art Fréchet Inception Distance (FID) by at least 17.5% on several benchmark datasets.
arXiv Detail & Related papers (2021-03-31T01:10:26Z)
- Learning domain-agnostic visual representation for computational pathology using medically-irrelevant style transfer augmentation [4.538771844947821]
STRAP (Style TRansfer Augmentation for histoPathology) is a form of data augmentation based on random style transfer from artistic paintings.
Style transfer replaces the low-level texture content of images with the uninformative style of randomly selected artistic paintings.
We demonstrate that STRAP leads to state-of-the-art performance, particularly in the presence of domain shifts.
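Style transfer of the kind STRAP builds on is often reduced, at its core, to swapping feature statistics between a content and a style input. The sketch below shows one common such core operation, adaptive instance normalization (AdaIN), on a generic channels-first feature map; it is an illustrative building block, not STRAP's full painting-based pipeline.

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalization: normalize content features per
    channel, then re-scale/shift with the style features' channel-wise
    std/mean. Inputs are (C, H, W) arrays; a generic style-transfer core."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True)
    # Content structure is kept; channel statistics (the "style") are swapped.
    return (content_feat - c_mean) / c_std * s_std + s_mean

rng = np.random.default_rng(0)
content_feats = rng.standard_normal((3, 16, 16))  # (C, H, W) content features
style_feats = rng.standard_normal((3, 16, 16))    # (C, H, W) style features
stylized = adain(content_feats, style_feats)
```

After the operation, each channel of the output carries the style input's mean and (approximately) its standard deviation, while the spatial arrangement, the content, is untouched; this is why statistic-swapping style transfer discards low-level texture without destroying structure.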
arXiv Detail & Related papers (2021-02-02T18:50:16Z)
- Style-invariant Cardiac Image Segmentation with Test-time Augmentation [10.234493507401618]
Deep models often suffer from severe performance drops due to appearance shifts in real clinical settings.
In this paper, we propose a novel style-invariant method for cardiac image segmentation.
arXiv Detail & Related papers (2020-09-24T08:27:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.