Enhancing Human Pose Estimation in Ancient Vase Paintings via
Perceptually-grounded Style Transfer Learning
- URL: http://arxiv.org/abs/2012.05616v2
- Date: Sun, 25 Feb 2024 21:07:14 GMT
- Title: Enhancing Human Pose Estimation in Ancient Vase Paintings via
Perceptually-grounded Style Transfer Learning
- Authors: Prathmesh Madhu, Angel Villar-Corrales, Ronak Kosti, Torsten
Bendschus, Corinna Reinhardt, Peter Bell, Andreas Maier, Vincent Christlein
- Abstract summary: We show how to adapt a dataset of natural images with known person and pose annotations to the style of Greek vase paintings by means of image style-transfer.
We show that using style-transfer learning significantly improves the SOTA performance on unlabelled data by more than 6% mean average precision (mAP) and mean average recall (mAR).
In a thorough ablation study, we give a targeted analysis of the influence of style intensities, revealing that the model learns generic domain styles.
- Score: 15.888271913164969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human pose estimation (HPE) is a central part of understanding the visual
narration and body movements of characters depicted in artwork collections,
such as Greek vase paintings. Unfortunately, existing HPE methods do not
generalise well across domains, resulting in poorly recognized poses. Therefore,
we propose a two-step approach: (1) adapting a dataset of natural images with
known person and pose annotations to the style of Greek vase paintings by means
of image style-transfer. We introduce a perceptually-grounded style transfer
training to enforce perceptual consistency. Then, we fine-tune the base model
with this newly created dataset. We show that using style-transfer learning
significantly improves the SOTA performance on unlabelled data by more than 6%
mean average precision (mAP) as well as mean average recall (mAR). (2) To
improve the already strong results further, we created a small dataset
(ClassArch) consisting of ancient Greek vase paintings from the 6th-5th century
BCE with person and pose annotations. We show that fine-tuning on this data
with a style-transferred model improves the performance further. In a thorough
ablation study, we give a targeted analysis of the influence of style
intensities, revealing that the model learns generic domain styles.
Additionally, we provide a pose-based image retrieval experiment to demonstrate the
effectiveness of our method.
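As a concrete illustration of step (1), the perceptual-consistency term can be sketched as a small loss module. The following is a minimal sketch, assuming torchvision's ImageNet-pretrained VGG-16 as the feature extractor; the layer selection and the L1 feature distance are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a perceptual-consistency term for style transfer.
# Assumption: torchvision's ImageNet-pretrained VGG-16 as the feature
# extractor; inputs are ImageNet-normalised image batches.
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class PerceptualConsistency(nn.Module):
    """Compares VGG-16 features of the stylised image with those of the
    original content image, so person geometry survives stylisation."""

    def __init__(self, layers=(3, 8, 15, 22)):  # relu1_2, relu2_2, relu3_3, relu4_3
        super().__init__()
        feats = vgg16(weights="IMAGENET1K_V1").features.eval()
        for p in feats.parameters():
            p.requires_grad_(False)  # frozen: VGG only provides features
        self.feats = feats
        self.layers = set(layers)

    def forward(self, stylised, content):
        loss, x, y = 0.0, stylised, content
        for i, layer in enumerate(self.feats):
            x, y = layer(x), layer(y)
            if i in self.layers:
                loss = loss + F.l1_loss(x, y)  # match features at several depths
            if i == max(self.layers):
                break  # no need to run deeper layers
        return loss
```

During style-transfer training, a term like this would be added to the usual content and style objectives so that the body geometry needed for pose annotation survives stylisation.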
Related papers
- PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference [62.72779589895124]
We make the first attempt to align diffusion models for image inpainting with human aesthetic standards via a reinforcement learning framework.
We train a reward model with a dataset we construct, consisting of nearly 51,000 images annotated with human preferences.
Experiments on inpainting comparison and downstream tasks, such as image extension and 3D reconstruction, demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-29T11:49:39Z)
- GRPose: Learning Graph Relations for Human Image Generation with Pose Priors [21.91374799527015]
We propose a framework that delves into the graph relations of pose priors to provide control information for human image generation.
The main idea is to establish a graph topological structure between the pose priors and latent representation of diffusion models.
A pose perception loss, computed by a pretrained pose estimation network, is introduced to minimize pose differences (a sketch follows after this list).
arXiv Detail & Related papers (2024-08-29T13:58:34Z)
- Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval [85.39613457282107]
The cross-domain nature of sketch-based image retrieval makes the task challenging.
We present an effective "Adapt and Align" approach to address the key challenges.
Inspired by recent advances in image-text foundation models (e.g., CLIP) on zero-shot scenarios, we explicitly align the learned image embedding with a more semantic text embedding to achieve the desired knowledge transfer from seen to unseen classes.
arXiv Detail & Related papers (2023-05-09T03:10:15Z)
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature (see the sketch after this list).
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
- StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning [89.86971464234533]
Cross-Domain Few-Shot Learning (CD-FSL) is a recently emerging task that tackles few-shot learning across different domains.
We propose a novel model-agnostic meta Style Adversarial training (StyleAdv) method together with a novel style adversarial attack method.
Our method gradually becomes robust to varied visual styles, boosting generalization to novel target datasets.
arXiv Detail & Related papers (2023-02-18T11:54:37Z)
- Adversarial Style Augmentation for Domain Generalized Urban-Scene Segmentation [120.96012935286913]
We propose a novel adversarial style augmentation approach, which can generate hard stylized images during training.
Experiments on two synthetic-to-real semantic segmentation benchmarks demonstrate that AdvStyle can significantly improve the model performance on unseen real domains.
arXiv Detail & Related papers (2022-07-11T14:01:25Z)
- Semi-supervised Human Pose Estimation in Art-historical Images [9.633949256082763]
We propose a novel approach to estimate human poses in art-historical images.
Our approach achieves significantly better results than methods that use pre-trained models or style transfer.
arXiv Detail & Related papers (2022-07-06T21:20:58Z)
- Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z)
- What Can Style Transfer and Paintings Do For Model Robustness? [12.543035508615896]
A common strategy for improving model robustness is through data augmentations.
Recent work has shown that arbitrary style transfer can be used as a form of data augmentation.
We show that learning from paintings as a form of perceptual data augmentation can improve model robustness (see the fine-tuning sketch after this list).
arXiv Detail & Related papers (2020-11-30T00:25:04Z)
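Some of the techniques summarised above admit short sketches. The pose perception loss from the GRPose entry, for instance, plausibly amounts to comparing keypoint heatmaps produced by a frozen pose estimator; `pose_net` below is a placeholder for any pretrained heatmap-based pose network, and the mean-squared form is an assumption, not GRPose's released code.

```python
import torch.nn as nn
import torch.nn.functional as F

class PosePerceptionLoss(nn.Module):
    """Penalise pose differences between a generated image and its
    pose-conditioned reference, measured by a frozen pose estimator.
    `pose_net` is assumed to map an image batch to keypoint heatmaps
    of shape (B, K, H, W); the exact backbone is a placeholder."""

    def __init__(self, pose_net: nn.Module):
        super().__init__()
        self.pose_net = pose_net.eval()
        for p in self.pose_net.parameters():
            p.requires_grad_(False)  # the estimator only scores, it is never trained

    def forward(self, generated, reference):
        return F.mse_loss(self.pose_net(generated), self.pose_net(reference))
```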
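The input-dependent temperature from the UCAST entry can likewise be sketched as an InfoNCE loss whose softmax temperature is predicted per style code by a small head; the head architecture and the temperature range are guesses at the idea in the abstract, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveTemperatureContrast(nn.Module):
    """InfoNCE over style codes with a per-sample temperature.
    Row i of `positives` is the positive for row i of `anchors`;
    every other row in the batch acts as a negative."""

    def __init__(self, dim=128, t_min=0.05, t_max=1.0):
        super().__init__()
        self.temp_head = nn.Sequential(            # predicts tau from the anchor code
            nn.Linear(dim, dim // 2), nn.ReLU(),
            nn.Linear(dim // 2, 1), nn.Sigmoid())
        self.t_min, self.t_max = t_min, t_max

    def forward(self, anchors, positives):
        a = F.normalize(anchors, dim=1)
        p = F.normalize(positives, dim=1)
        tau = self.t_min + (self.t_max - self.t_min) * self.temp_head(anchors)
        logits = (a @ p.t()) / tau                 # (B, B); each row scaled by its own tau
        labels = torch.arange(a.size(0), device=a.device)
        return F.cross_entropy(logits, labels)
```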
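Finally, the thread connecting the main paper's step (1) with the style-augmentation entries above is fine-tuning on stylised copies of annotated photographs. A minimal loop might look as follows; `stylise` stands in for any pretrained style-transfer network, and `model` is assumed to return its training loss directly when given images and keypoints.

```python
import random
import torch

def finetune_on_stylised_data(model, optimiser, loader, stylise, p=0.5, epochs=1):
    """Fine-tune a pose estimator on photographs re-rendered in the
    target style. The keypoint annotations remain valid because style
    transfer alters texture, not geometry."""
    model.train()
    for _ in range(epochs):
        for images, keypoints in loader:
            if random.random() < p:          # mix natural and stylised batches
                with torch.no_grad():
                    images = stylise(images)
            loss = model(images, keypoints)  # assumed loss-returning interface
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
```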