Self-supervised Correlation Mining Network for Person Image Generation
- URL: http://arxiv.org/abs/2111.13307v2
- Date: Mon, 29 Nov 2021 08:25:03 GMT
- Title: Self-supervised Correlation Mining Network for Person Image Generation
- Authors: Zijian Wang, Xingqun Qi, Kun Yuan, Muyi Sun
- Abstract summary: Person image generation aims to perform non-rigid deformation on source images.
We propose a Self-supervised Correlation Mining Network (SCM-Net) to rearrange the source images in the feature space.
To improve the fidelity of cross-scale pose transformation, we propose a graph-based Body Structure Retaining Loss.
- Score: 9.505343361614928
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Person image generation aims to perform non-rigid deformation on source
images, which generally requires unaligned data pairs for training. Recently,
self-supervised methods show great promise in this task by merging the
disentangled representations for self-reconstruction. However, such methods
fail to exploit the spatial correlation between the disentangled features. In
this paper, we propose a Self-supervised Correlation Mining Network (SCM-Net)
to rearrange the source images in the feature space. It integrates two
collaborative modules: a Decomposed Style Encoder (DSE) and a Correlation Mining
Module (CMM). Specifically, the DSE first creates unaligned pairs at the
feature level. Then, the CMM establishes the spatial correlation field for
feature rearrangement. Finally, a translation module transforms the
rearranged features into realistic results. Meanwhile, to improve the fidelity
of cross-scale pose transformation, we propose a graph-based Body Structure
Retaining Loss (BSR Loss) to preserve reasonable body structures in half-body
to full-body generation. Extensive experiments on the DeepFashion dataset
demonstrate the superiority of our method compared with other supervised and
unsupervised approaches. Furthermore, satisfactory results on face generation
show the versatility of our method in other deformation tasks.
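The abstract does not give the equations behind the CMM, so the following is only a minimal sketch of one common way to build a spatial correlation field and use it to rearrange features: cosine similarity between positions, turned into soft matching weights by a softmax. The function names (`rearrange`, `l2_normalize`, `softmax`), the query/key/value split, and the `temperature` parameter are assumptions made for illustration and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def softmax(scores, temperature):
    """Numerically stable softmax; a small temperature gives sharper matches."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rearrange(query_feats, key_feats, value_feats, temperature=0.01):
    """Rearrange value_feats to follow the spatial layout of query_feats.

    query_feats: one feature vector per target position (hypothetical input).
    key_feats:   one feature vector per source position, same space as queries.
    value_feats: one feature vector per source position, to be rearranged.
    """
    queries = [l2_normalize(v) for v in query_feats]
    keys = [l2_normalize(v) for v in key_feats]
    dim = len(value_feats[0])
    output = []
    for q in queries:
        # One row of the correlation field: cosine similarity to every source position.
        row = [sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(row, temperature)
        # Weighted sum of source values gives the feature at this target position.
        output.append(
            [sum(w * val[d] for w, val in zip(weights, value_feats)) for d in range(dim)]
        )
    return output


# Toy example: the two target positions match the source positions in swapped order.
query = [[0.0, 1.0], [1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0], [20.0]]
rearranged = rearrange(query, keys, values)
```

With the sharp default temperature, `rearranged` is approximately `[[20.0], [10.0]]`: each target position picks up the value of the source position it correlates with most strongly. A larger temperature would blend several source positions instead.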
Related papers
- FissionVAE: Federated Non-IID Image Generation with Latent Space and Decoder Decomposition [9.059664504170287]
Federated learning enables decentralized clients to collaboratively learn a shared model while keeping all the training data local.
We introduce a novel approach, FissionVAE, which decomposes the latent space and constructs decoder branches tailored to individual client groups.
To evaluate our approach, we assemble two composite datasets: the first combines MNIST and FashionMNIST; the second comprises RGB datasets of cartoon and human faces, wild animals, marine vessels, and remote sensing images of Earth.
arXiv Detail & Related papers (2024-08-30T08:22:30Z)
- A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z)
- Double-Shot 3D Shape Measurement with a Dual-Branch Network [14.749887303860717]
We propose a dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) to process different structured light (SL) modalities.
Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.
We show that our method can reduce fringe order ambiguity while producing high-accuracy results on a self-made dataset.
arXiv Detail & Related papers (2024-07-19T10:49:26Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Self-supervised Semantic Segmentation: Consistency over Transformation [3.485615723221064]
We propose a novel self-supervised algorithm, S$^3$-Net, which integrates a robust framework based on the proposed Inception Large Kernel Attention (I-LKA) modules.
We leverage deformable convolution as an integral component to effectively capture and delineate lesion deformations for superior object boundary definition.
Our experimental results on skin lesion and lung organ segmentation tasks show the superior performance of our method compared to the SOTA approaches.
arXiv Detail & Related papers (2023-08-31T21:28:46Z)
- Reconstruction-driven Dynamic Refinement based Unsupervised Domain Adaptation for Joint Optic Disc and Cup Segmentation [25.750583118977833]
Glaucoma is one of the leading causes of irreversible blindness.
It remains challenging to train an OD/OC segmentation model that could be deployed successfully to different healthcare centers.
We propose a novel unsupervised domain adaptation (UDA) method called Reconstruction-driven Dynamic Refinement Network (RDR-Net).
arXiv Detail & Related papers (2023-04-10T13:33:13Z)
- Learning Detail-Structure Alternative Optimization for Blind Super-Resolution [69.11604249813304]
We propose an effective, kernel-free network, DSSR, which performs recurrent detail-structure alternative optimization for blind SR without incorporating a blur kernel prior.
In our DSSR, a detail-structure modulation module (DSMM) is built to exploit the interaction and collaboration of image details and structures.
Our method achieves state-of-the-art performance compared with existing methods.
arXiv Detail & Related papers (2022-12-03T14:44:17Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that combines the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- MOGAN: Morphologic-structure-aware Generative Learning from a Single Image [59.59698650663925]
Recently proposed generative models can be trained on only a single image.
We introduce a MOrphologic-structure-aware Generative Adversarial Network named MOGAN that produces random samples with diverse appearances.
Our approach focuses on internal features including the maintenance of rational structures and variation on appearance.
arXiv Detail & Related papers (2021-03-04T12:45:23Z)
- CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
- Joint Self-Attention and Scale-Aggregation for Self-Calibrated Deraining Network [13.628218953897946]
In this paper, we propose an effective algorithm, called JDNet, to solve the single image deraining problem.
By combining carefully designed Scale-Aggregation and Self-Attention modules with Self-Calibrated convolution, the proposed model achieves better deraining results.
arXiv Detail & Related papers (2020-08-06T17:04:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.