Towards Fine-grained Human Pose Transfer with Detail Replenishing Network
- URL: http://arxiv.org/abs/2005.12494v2
- Date: Fri, 7 May 2021 04:39:39 GMT
- Title: Towards Fine-grained Human Pose Transfer with Detail Replenishing Network
- Authors: Lingbo Yang, Pan Wang, Chang Liu, Zhanning Gao, Peiran Ren, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Xiansheng Hua, Wen Gao
- Abstract summary: Human pose transfer (HPT) is an emerging research topic with huge potential in fashion design, media production, online advertising and virtual reality.
Existing HPT methods often suffer from three fundamental issues: detail deficiency, content ambiguity and style inconsistency.
We develop a more challenging yet practical HPT setting, termed Fine-grained Human Pose Transfer (FHPT), with a higher focus on semantic fidelity and detail replenishment.
- Score: 96.54367984986898
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Human pose transfer (HPT) is an emerging research topic with huge potential
in fashion design, media production, online advertising and virtual reality.
For these applications, the visual realism of fine-grained appearance details
is crucial for production quality and user engagement. However, existing HPT
methods often suffer from three fundamental issues: detail deficiency, content
ambiguity and style inconsistency, which severely degrade the visual quality
and realism of generated images. Aiming towards real-world applications, we
develop a more challenging yet practical HPT setting, termed Fine-grained
Human Pose Transfer (FHPT), with a higher focus on semantic fidelity and detail
replenishment. Concretely, we analyze the potential design flaws of existing
methods via an illustrative example, and establish the core FHPT methodology by
combining the ideas of content synthesis and feature transfer in a
mutually-guided fashion. Thereafter, we substantiate the proposed methodology
with a Detail Replenishing Network (DRN) and a corresponding coarse-to-fine
model training scheme. Moreover, we build up a complete suite of fine-grained
evaluation protocols to address the challenges of FHPT in a comprehensive
manner, including semantic analysis, structural detection and perceptual
quality assessment. Extensive experiments on the DeepFashion benchmark dataset
have verified the power of the proposed approach against state-of-the-art works,
with a 12%-14% gain in top-10 retrieval recall, 5% higher joint localization
accuracy, and a nearly 40% gain in face identity preservation. Moreover, the
evaluation results offer further insights into the subject matter, which could
inspire promising future work in this direction.
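Code illustration (not part of the paper): the abstract's core idea is to combine content synthesis and feature transfer in a mutually guided fashion. The sketch below shows one plausible way such a mutually guided block could look in PyTorch; all module names, channel sizes, and the sigmoid-gating scheme are illustrative assumptions, not the actual DRN architecture.

```python
# Minimal sketch (assumed, not the authors' code) of a mutually guided block:
# a content-synthesis branch and a feature-transfer branch re-weight each other
# before their outputs are fused into a single feature map.
import torch
import torch.nn as nn


class MutuallyGuidedBlock(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        # Content-synthesis branch: hallucinates plausible appearance from pose features.
        self.synth = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                   nn.ReLU(inplace=True))
        # Feature-transfer branch: carries appearance details from the source image.
        self.transfer = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                      nn.ReLU(inplace=True))
        # Gates computed from one branch modulate (guide) the other branch.
        self.gate_from_transfer = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_from_synth = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        # Fusion of the two mutually refined feature maps.
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, pose_feat: torch.Tensor, appearance_feat: torch.Tensor) -> torch.Tensor:
        synthesized = self.synth(pose_feat)            # coarse content for the target pose
        transferred = self.transfer(appearance_feat)   # fine details from the source image
        # Mutual guidance: each branch is re-weighted by a gate from the other branch.
        synthesized = synthesized * self.gate_from_transfer(transferred)
        transferred = transferred * self.gate_from_synth(synthesized)
        return self.fuse(torch.cat([synthesized, transferred], dim=1))


if __name__ == "__main__":
    block = MutuallyGuidedBlock(64)
    pose_feat = torch.randn(1, 64, 32, 32)        # features from the target pose
    appearance_feat = torch.randn(1, 64, 32, 32)  # features from the source appearance
    print(block(pose_feat, appearance_feat).shape)  # torch.Size([1, 64, 32, 32])
```

In the coarse-to-fine scheme described in the abstract, a block of this kind would sit on top of a coarse pose-guided generator and be trained to replenish fine appearance details; the actual DRN layers, losses, and training schedule are specified in the paper, not here.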
Related papers
- Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
arXiv Detail & Related papers (2024-07-24T06:42:46Z) - Deep Content Understanding Toward Entity and Aspect Target Sentiment Analysis on Foundation Models [0.8602553195689513]
Entity-Aspect Sentiment Triplet Extraction (EASTE) is a novel Aspect-Based Sentiment Analysis task.
Our research aims to achieve high performance on the EASTE task and investigates the impact of model size, type, and adaptation techniques on task performance.
Ultimately, we provide detailed insights and achieve state-of-the-art results in complex sentiment analysis.
arXiv Detail & Related papers (2024-07-04T16:48:14Z) - Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities [16.69453837626083]
We propose a Correlation-decoupled Knowledge Distillation (CorrKD) framework for the Multimodal Sentiment Analysis (MSA) task under uncertain missing modalities.
We present a sample-level contrastive distillation mechanism that transfers comprehensive knowledge containing cross-sample correlations to reconstruct missing semantics.
We design a response-disentangled consistency distillation strategy to optimize the sentiment decision boundaries of the student network.
arXiv Detail & Related papers (2024-04-25T09:35:09Z) - QUASAR: QUality and Aesthetics Scoring with Advanced Representations [20.194917729936357]
This paper introduces a new data-driven, non-parametric method for image quality and aesthetics assessment.
We eliminate the need for expressive textual embeddings by proposing efficient image anchors in the data.
arXiv Detail & Related papers (2024-03-11T16:21:50Z) - Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z) - DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal
Category-level Pose Estimation [20.676510832922016]
We propose a probabilistic model that relies on diffusion to estimate dense canonical maps crucial for recovering partial object shapes.
We introduce critical components to enhance performance by leveraging the strength of the diffusion models with multi-modal input representations.
Despite being trained solely on our generated synthetic data, our approach achieves state-of-the-art performance and unprecedented generalization qualities.
arXiv Detail & Related papers (2024-02-20T01:48:33Z) - Human as Points: Explicit Point-based 3D Human Reconstruction from
Single-view RGB Images [78.56114271538061]
We introduce an explicit point-based human reconstruction framework called HaP.
Our approach is featured by fully-explicit point cloud estimation, manipulation, generation, and refinement in the 3D geometric space.
Our results may indicate a paradigm rollback to the fully-explicit and geometry-centric algorithm design.
arXiv Detail & Related papers (2023-11-06T05:52:29Z) - Physics Inspired Hybrid Attention for SAR Target Recognition [61.01086031364307]
We propose a physics inspired hybrid attention (PIHA) mechanism and the once-for-all (OFA) evaluation protocol to address the issues.
PIHA leverages the high-level semantics of physical information to activate and guide feature groups that are aware of the local semantics of the target.
Our method outperforms other state-of-the-art approaches in 12 test scenarios with the same ASC parameters.
arXiv Detail & Related papers (2023-09-27T14:39:41Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - ZFlow: Gated Appearance Flow-based Virtual Try-on with 3D Priors [13.977100716044104]
Image-based virtual try-on involves synthesizing convincing images of a model wearing a particular garment.
Recent methods involve a two-stage process: (i) warping the garment to align with the model, and (ii) fusing the warped garment with the model image to produce the try-on output.
The lack of geometric information about the model or the garment often results in improper rendering of granular details.
We propose ZFlow, an end-to-end framework, which seeks to alleviate these concerns.
arXiv Detail & Related papers (2021-09-14T22:41:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.