Semantic Style Transfer for Enhancing Animal Facial Landmark Detection
- URL: http://arxiv.org/abs/2505.05640v1
- Date: Thu, 08 May 2025 20:48:15 GMT
- Title: Semantic Style Transfer for Enhancing Animal Facial Landmark Detection
- Authors: Anadil Hussein, Anna Zamansky, George Martvel
- Abstract summary: Style transfer is a technique for applying the visual characteristics of one image onto another while preserving structural content. This study investigates the use of this technique for enhancing the training of animal facial landmark detectors. Applying style transfer to cropped facial images rather than full-body images enhances structural consistency. Supervised Style Transfer (SST), which selects style sources based on landmark accuracy, retained up to 98% of baseline accuracy.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Style Transfer (NST) is a technique for applying the visual characteristics of one image onto another while preserving structural content. Traditionally used for artistic transformations, NST has recently been adapted, e.g., for domain adaptation and data augmentation. This study investigates the use of this technique for enhancing the training of animal facial landmark detectors. As a case study, we use a recently introduced Ensemble Landmark Detector for 48 anatomical cat facial landmarks and the CatFLW dataset it was trained on, making three main contributions. First, we demonstrate that applying style transfer to cropped facial images rather than full-body images enhances structural consistency, improving the quality of generated images. Second, replacing training images with style-transferred versions raised challenges of annotation misalignment, but Supervised Style Transfer (SST) - which selects style sources based on landmark accuracy - retained up to 98% of baseline accuracy. Finally, augmenting the dataset with style-transferred images further improved robustness, outperforming traditional augmentation methods. These findings establish semantic style transfer as an effective augmentation strategy for enhancing the performance of facial landmark detection models for animals and beyond. While this study focuses on cat facial landmarks, the proposed method can be generalized to other species and landmark detection models.
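The SST selection step described in the abstract (choosing, per content image, the style source whose stylized output best preserves the annotated landmarks) can be sketched as below. This is a minimal illustration, not the paper's implementation: `stylize` and `landmark_error` are hypothetical stand-ins for a style-transfer model and the ensemble landmark detector.

```python
# Supervised Style Transfer (SST) selection sketch: for each content image,
# try several candidate style sources and keep the stylized result whose
# predicted landmarks stay closest to the original annotations.

def select_style_source(content, landmarks, style_candidates,
                        stylize, landmark_error):
    """Return the stylized image and error that best preserve landmarks.

    stylize(content, style)        -> stylized image (hypothetical model)
    landmark_error(img, landmarks) -> mean landmark distance (hypothetical)
    """
    best_img, best_err = None, float("inf")
    for style in style_candidates:
        candidate = stylize(content, style)
        err = landmark_error(candidate, landmarks)
        if err < best_err:
            best_img, best_err = candidate, err
    return best_img, best_err


if __name__ == "__main__":
    # Toy stand-ins: "images" are plain numbers, "stylization" is addition,
    # and "detector error" is distance to the annotation value.
    stylize = lambda c, s: c + s
    landmark_error = lambda img, lm: abs(img - lm)
    img, err = select_style_source(10, 12, [1, 2, 5], stylize, landmark_error)
    print(img, err)  # style offset 2 wins: output 12, error 0
```

Accepted stylized images (or, in the augmentation variant, the union of originals and stylized copies) then feed the landmark detector's training set; the same selection loop applies regardless of the underlying style-transfer model.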
Related papers
- Fine-Grained Cat Breed Recognition with Global Context Vision Transformer [1.2554129265335305]
We present a deep learning-based approach for classifying cat breeds using a subset of the Oxford-IIIT Pet dataset. We employed the Global Context Vision Transformer (GCViT) architecture-tiny for cat breed recognition.
arXiv Detail & Related papers (2026-02-07T13:13:47Z) - Stylizing ViT: Anatomy-Preserving Instance Style Transfer for Domain Generalization [1.8747639074211104]
Stylizing ViT is a novel Vision Transformer encoder that utilizes weight-shared attention blocks for both self- and cross-attention. We show that Stylizing ViT is effective beyond training, achieving a 17% performance improvement during inference when used for test-time augmentation.
arXiv Detail & Related papers (2026-01-24T20:53:02Z) - Transferring Styles for Reduced Texture Bias and Improved Robustness in Semantic Segmentation Networks [4.2875024530011085]
Recent research has investigated the shape and texture biases of deep neural networks (DNNs) in image classification. We show that training with stylized images reduces texture biases in image classification and improves robustness with respect to image corruptions. In our experiments, it turns out that in semantic segmentation, style transfer augmentation reduces texture bias and strongly increases robustness with respect to common image corruptions as well as adversarial attacks.
arXiv Detail & Related papers (2025-07-14T13:02:19Z) - Optimal-Landmark-Guided Image Blending for Face Morphing Attacks [8.024953195407502]
We propose a novel approach for conducting face morphing attacks, which utilizes optimal-landmark-guided image blending.
Our proposed method overcomes the limitations of previous approaches by optimizing the morphing landmarks and using Graph Convolutional Networks (GCNs) to combine landmark and appearance features.
arXiv Detail & Related papers (2024-01-30T03:45:06Z) - DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer [27.39248034592382]
We propose using a new class of models to perform style transfer while enabling deformable style transfer.
We show how leveraging the priors of these models can expose new artistic controls at inference time.
arXiv Detail & Related papers (2023-07-09T12:13:43Z) - Multi-Domain Norm-referenced Encoding Enables Data Efficient Transfer Learning of Facial Expression Recognition [62.997667081978825]
We propose a biologically-inspired mechanism for transfer learning in facial expression recognition.
Our proposed architecture provides an explanation for how the human brain might innately recognize facial expressions on varying head shapes.
Our model achieves a classification accuracy of 92.15% on the FERG dataset with extreme data efficiency.
arXiv Detail & Related papers (2023-04-05T09:06:30Z) - MorphGANFormer: Transformer-based Face Morphing and De-Morphing [55.211984079735196]
StyleGAN-based approaches to face morphing are among the leading techniques.
We propose a transformer-based alternative to face morphing and demonstrate its superiority to StyleGAN-based methods.
arXiv Detail & Related papers (2023-02-18T19:09:11Z) - Fine-Grained Image Style Transfer with Visual Transformers [59.85619519384446]
We propose a novel STyle TRansformer (STTR) network which breaks both content and style images into visual tokens to achieve a fine-grained style transformation.
To compare STTR with existing approaches, we conduct user studies on Amazon Mechanical Turk.
arXiv Detail & Related papers (2022-10-11T06:26:00Z) - Towards Fine-grained Image Classification with Generative Adversarial Networks and Facial Landmark Detection [0.0]
We use GAN-based data augmentation to generate extra dataset instances.
We validated our work by evaluating the accuracy of fine-grained image classification on the recent Vision Transformer (ViT) Model.
arXiv Detail & Related papers (2021-08-28T06:32:42Z) - Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z) - Ensembling with Deep Generative Views [72.70801582346344]
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
arXiv Detail & Related papers (2021-04-29T17:58:35Z) - Learning domain-agnostic visual representation for computational pathology using medically-irrelevant style transfer augmentation [4.538771844947821]
STRAP (Style TRansfer Augmentation for histoPathology) is a form of data augmentation based on random style transfer from artistic paintings.
Style transfer replaces the low-level texture content of images with the uninformative style of randomly selected artistic paintings.
We demonstrate that STRAP leads to state-of-the-art performance, particularly in the presence of domain shifts.
arXiv Detail & Related papers (2021-02-02T18:50:16Z) - Combining Deep Learning with Geometric Features for Image based Localization in the Gastrointestinal Tract [8.510792628268824]
We propose a novel approach to combine Deep Learning method with traditional feature based approach to achieve better localization with small training data.
Our method fully exploits the best of both worlds by introducing a Siamese network structure to perform few-shot classification to the closest zone in the segmented training image set.
The accuracy is improved by 28.94% (Position) and 10.97% (Orientation) with respect to the state-of-the-art method.
arXiv Detail & Related papers (2020-05-11T23:04:00Z) - Transferring Dense Pose to Proximal Animal Classes [83.84439508978126]
We show that it is possible to transfer the knowledge existing in dense pose recognition for humans, as well as in more general object detectors and segmenters, to the problem of dense pose recognition in other classes.
We do this by establishing a DensePose model for the new animal which is also geometrically aligned to humans.
We also introduce two benchmark datasets labelled in the manner of DensePose for the class chimpanzee and use them to evaluate our approach.
arXiv Detail & Related papers (2020-02-28T21:43:53Z)