Generative Image Inpainting with Segmentation Confusion Adversarial
Training and Contrastive Learning
- URL: http://arxiv.org/abs/2303.13133v1
- Date: Thu, 23 Mar 2023 09:34:17 GMT
- Title: Generative Image Inpainting with Segmentation Confusion Adversarial
Training and Contrastive Learning
- Authors: Zhiwen Zuo, Lei Zhao, Ailin Li, Zhizhong Wang, Zhanjie Zhang, Jiafu
Chen, Wei Xing, Dongming Lu
- Abstract summary: We present a new adversarial training framework for image inpainting with segmentation confusion adversarial training (SCAT) and contrastive learning.
SCAT plays an adversarial game between an inpainting generator and a segmentation network, which provides pixel-level local training signals.
We conduct extensive experiments on two benchmark datasets, demonstrating our model's effectiveness and superiority both qualitatively and quantitatively.
- Score: 14.358417509144523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a new adversarial training framework for image inpainting
with segmentation confusion adversarial training (SCAT) and contrastive
learning. SCAT plays an adversarial game between an inpainting generator and a
segmentation network, which provides pixel-level local training signals and can
adapt to images with free-form holes. By combining SCAT with standard global
adversarial training, the new adversarial training framework exhibits the
following three advantages simultaneously: (1) the global consistency of the
repaired image, (2) the local fine texture details of the repaired image, and
(3) the flexibility of handling images with free-form holes. Moreover, we
propose the textural and semantic contrastive learning losses to stabilize and
improve our inpainting model's training by exploiting the feature
representation space of the discriminator, in which the inpainted images are
pulled closer to the ground truth images but pushed farther from the corrupted
images. The proposed contrastive losses better guide the repaired images to
move from the corrupted image data points to the real image data points in the
feature representation space, resulting in more realistic completed images. We
conduct extensive experiments on two benchmark datasets, demonstrating our
model's effectiveness and superiority both qualitatively and quantitatively.
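To make the SCAT signal concrete, the following is a minimal PyTorch sketch of the pixel-level game the abstract describes: a segmentation network learns to label each pixel of the completed image as real or generated, supervised by the free-form hole mask, while the generator is trained to confuse it into predicting "real" everywhere. This is an illustration inferred from the abstract, not the authors' implementation; the `seg_net` interface, the all-real generator target, and the per-pixel BCE form are assumptions.

```python
import torch
import torch.nn.functional as F

def scat_step(seg_net, completed, real, mask):
    """One SCAT step (sketch, not the authors' code). Shapes:
      completed, real : (B, 3, H, W) inpainted output / ground truth
      mask            : (B, 1, H, W) 1 = hole (generated), 0 = valid pixel
    seg_net maps an image to per-pixel real/fake logits of shape (B, 1, H, W).
    """
    # Segmentation network: recover the hole mask from the completed image,
    # and predict "all real" (zeros) on the ground-truth image.
    pred_fake = seg_net(completed.detach())
    pred_real = seg_net(real)
    loss_seg = (F.binary_cross_entropy_with_logits(pred_fake, mask) +
                F.binary_cross_entropy_with_logits(pred_real, torch.zeros_like(mask)))

    # Generator: confuse the segmentation network so it cannot locate the
    # hole, i.e. push its prediction toward "real" at every pixel.
    loss_conf = F.binary_cross_entropy_with_logits(
        seg_net(completed), torch.zeros_like(mask))
    return loss_seg, loss_conf
```

In a full training loop, `loss_seg` would update only the segmentation network and `loss_conf` only the generator, combined with the standard global adversarial loss to preserve global consistency.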
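The textural and semantic contrastive losses can likewise be sketched as an InfoNCE-style objective over the discriminator's features, with the ground truth as the positive and the corrupted image as the negative, matching the pull/push behavior the abstract describes. The single-negative form, the choice of feature layer, and the temperature `tau` are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def feature_contrastive_loss(feat_out, feat_gt, feat_corrupt, tau=0.1):
    """InfoNCE-style contrastive loss over discriminator features (sketch).
    feat_* : (B, C, H, W) features of the inpainted image, the ground
             truth, and the corrupted input, taken from the same layer."""
    f_out = F.normalize(feat_out.flatten(1), dim=1)      # anchor
    f_pos = F.normalize(feat_gt.flatten(1), dim=1)       # positive: pull closer
    f_neg = F.normalize(feat_corrupt.flatten(1), dim=1)  # negative: push away
    pos = torch.exp((f_out * f_pos).sum(dim=1) / tau)
    neg = torch.exp((f_out * f_neg).sum(dim=1) / tau)
    return -torch.log(pos / (pos + neg)).mean()
```

Applying this term at shallow versus deep discriminator layers would plausibly yield the textural and semantic variants, respectively, though the abstract does not specify the layers.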
Related papers
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data [40.88256210436378]
We present a novel weakly supervised pre-training method for vision models on web-scale image-text data.
The proposed method reframes pre-training on image-text data as a classification task.
It achieves a remarkable $2.7\times$ acceleration in training speed compared to contrastive learning on web-scale data.
arXiv Detail & Related papers (2024-04-24T05:13:28Z)
- BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM.
Extensive experiments demonstrate BrushNet's superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z)
- Training-free Zero-shot Composed Image Retrieval with Local Concept Reranking [34.31345844296072]
Composed image retrieval attempts to retrieve an image of interest from gallery images through a composed query of a reference image and its corresponding modified text.
Most current composed image retrieval methods follow a supervised learning approach, training on a costly triplet dataset composed of a reference image, modified text, and a corresponding target image.
We present a new training-free zero-shot composed image retrieval method which translates the query into explicit human-understandable text.
arXiv Detail & Related papers (2023-12-14T13:31:01Z)
- Scaling Laws of Synthetic Images for Model Training ... for Now [54.43596959598466]
We study the scaling laws of synthetic images generated by state-of-the-art text-to-image models.
We observe that synthetic images demonstrate a scaling trend similar to, but slightly less effective than, real images in CLIP training.
arXiv Detail & Related papers (2023-12-07T18:59:59Z)
- Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval [85.39613457282107]
The cross-domain nature of sketch-based image retrieval makes it challenging.
We present an effective "Adapt and Align" approach to address the key challenges.
Inspired by recent advances in image-text foundation models (e.g., CLIP) on zero-shot scenarios, we explicitly align the learned image embedding with a more semantic text embedding to achieve the desired knowledge transfer from seen to unseen classes.
arXiv Detail & Related papers (2023-05-09T03:10:15Z)
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts model performance, with a 10-34% relative improvement across various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
- ExCon: Explanation-driven Supervised Contrastive Learning for Image Classification [12.109442912963969]
We propose to leverage saliency-based explanation methods to create content-preserving masked augmentations for contrastive learning.
Our novel explanation-driven supervised contrastive learning (ExCon) methodology critically serves the dual goals of encouraging nearby image embeddings to have similar content and explanation.
We demonstrate that ExCon outperforms vanilla supervised contrastive learning in classification, explanation quality, and adversarial robustness, as well as in calibration of the model's probabilistic predictions under distributional shift.
arXiv Detail & Related papers (2021-11-28T23:15:26Z)
- Cooperative Training and Latent Space Data Augmentation for Robust Medical Image Segmentation [13.017279828963444]
Deep learning-based segmentation methods are vulnerable to unforeseen data distribution shifts during deployment.
We present a cooperative framework for training image segmentation models and a latent space augmentation method for generating hard examples.
arXiv Detail & Related papers (2021-07-02T13:39:13Z)
- Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning-based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.