When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability
- URL: http://arxiv.org/abs/2403.00467v3
- Date: Tue, 15 Oct 2024 01:42:21 GMT
- Title: When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability
- Authors: Wenjie Xuan, Yufei Xu, Shanshan Zhao, Chaoyue Wang, Juhua Liu, Bo Du, Dacheng Tao
- Abstract summary: ControlNet excels at creating content that closely matches precise contours in user-provided masks.
When these masks contain noise, as frequently happens with non-expert users, the output can include unwanted artifacts.
Through in-depth analysis, this paper first highlights the crucial role of controlling the impact of inexplicit masks with diverse deterioration levels.
An advanced Shape-aware ControlNet consisting of a deterioration estimator and a shape-prior modulation block is devised.
- Score: 93.15085958220024
- Abstract: ControlNet excels at creating content that closely matches precise contours in user-provided masks. However, when these masks contain noise, as frequently happens with non-expert users, the output can include unwanted artifacts. This paper first highlights, through in-depth analysis, the crucial role of controlling the impact of inexplicit masks with diverse deterioration levels. Subsequently, to enhance controllability with inexplicit masks, an advanced Shape-aware ControlNet consisting of a deterioration estimator and a shape-prior modulation block is devised. The deterioration estimator assesses the deterioration factor of the provided masks. This factor is then utilized in the modulation block to adaptively modulate the model's contour-following ability, helping it dismiss the noisy parts of inexplicit masks. Extensive experiments prove its effectiveness in encouraging ControlNet to interpret inaccurate spatial conditions robustly rather than blindly following the given contours, making it suitable for diverse kinds of conditions. We showcase application scenarios such as modifying shape priors and composable shape-controllable generation. Code is available on GitHub.
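The core mechanism described in the abstract, a scalar deterioration score used to scale down the contour-following control signal, can be sketched roughly as follows. This is a minimal illustrative sketch in NumPy, not the paper's implementation: the perimeter-based heuristic and the simple residual scaling are assumptions standing in for the learned deterioration estimator and shape-prior modulation block.

```python
import numpy as np

def estimate_deterioration(mask: np.ndarray) -> float:
    """Toy stand-in for the paper's learned deterioration estimator:
    rougher mask boundaries yield a higher score in [0, 1]."""
    # Count boundary transitions via neighbor differences (a crude perimeter).
    diff_x = np.abs(np.diff(mask.astype(float), axis=1)).sum()
    diff_y = np.abs(np.diff(mask.astype(float), axis=0)).sum()
    perimeter = diff_x + diff_y
    area = mask.astype(float).sum() + 1e-8
    # A clean square has perimeter ~= 4 * sqrt(area); the excess maps to [0, 1].
    return float(np.clip(perimeter / (4.0 * np.sqrt(area)) - 1.0, 0.0, 1.0))

def modulate_control(base_features: np.ndarray,
                     control_residual: np.ndarray,
                     deterioration: float) -> np.ndarray:
    """Simplified shape-prior modulation: the higher the estimated
    deterioration, the weaker the contour-following control signal."""
    strength = 1.0 - deterioration
    return base_features + strength * control_residual
```

With a deterioration score of 1 the control residual is fully suppressed and generation falls back to the unconditioned features; with a score of 0 the given contours are followed strictly.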
Related papers
- ControlFace: Harnessing Facial Parametric Control for Face Rigging [31.765503860508378]
We introduce ControlFace, a novel face rigging method conditioned on 3DMM renderings that enables flexible, high-fidelity control.
We employ a dual-branch U-Net architecture: one branch, referred to as FaceNet, captures identity and fine details, while the other focuses on generation.
By training on a facial video dataset, we fully utilize FaceNet's rich representations while ensuring control adherence.
arXiv Detail & Related papers (2024-12-02T06:00:27Z) - Enhancing Prompt Following with Visual Control Through Training-Free Mask-Guided Diffusion [27.61734719689046]
We propose a training-free approach named Mask-guided Prompt Following (MGPF) to enhance prompt following with visual control.
The efficacy and superiority of MGPF are validated through comprehensive quantitative and qualitative experiments.
arXiv Detail & Related papers (2024-04-23T06:10:43Z) - ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback [20.910939141948123]
ControlNet++ is a novel approach that improves controllable generation by explicitly optimizing pixel-level cycle consistency between generated images and conditional controls.
It achieves improvements over ControlNet of 11.1% mIoU, 13.4% SSIM, and 7.6% RMSE for segmentation mask, line-art edge, and depth conditions, respectively.
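The cycle-consistency idea behind ControlNet++ can be illustrated with a toy sketch: re-extract the condition from the generated image and penalize pixel-level disagreement with the input condition. The threshold-based `extract_condition` below is a hypothetical stand-in for the frozen discriminative model (e.g. a segmentation network) that the actual method would use.

```python
import numpy as np

def extract_condition(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a frozen condition extractor
    (e.g. a segmentation network): a toy intensity threshold."""
    return (image > 0.5).astype(float)

def cycle_consistency_loss(input_condition: np.ndarray,
                           generated_image: np.ndarray) -> float:
    """Schematic pixel-level cycle consistency: the condition recovered
    from the generated image should match the condition it was given."""
    re_extracted = extract_condition(generated_image)
    return float(np.mean((input_condition - re_extracted) ** 2))
```

A generated image whose re-extracted condition matches the input incurs zero loss; disagreement is penalized pixel by pixel, giving the generator an explicit controllability signal.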
arXiv Detail & Related papers (2024-04-11T17:59:09Z) - Fine-grained Controllable Video Generation via Object Appearance and Context [74.23066823064575]
We propose fine-grained controllable video generation (FACTOR) to achieve detailed control.
FACTOR aims to control objects' appearances and context, including their location and category.
Our method achieves controllability of object appearances without finetuning, which reduces the per-subject optimization efforts for the users.
arXiv Detail & Related papers (2023-12-05T17:47:33Z) - Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where [63.61248884015162]
We aim to alleviate the burden of incorporating the masking operation into the contrastive-learning framework for convolutional neural networks.
We propose to explicitly take the saliency constraint into consideration in which the masked regions are more evenly distributed among the foreground and background.
arXiv Detail & Related papers (2023-09-22T09:58:38Z) - Towards Improved Input Masking for Convolutional Neural Networks [66.99060157800403]
We propose a new masking method for CNNs we call layer masking.
We show that our method is able to eliminate or minimize the influence of the mask shape or color on the output of the model.
We also demonstrate how the shape of the mask may leak information about the class, thus affecting estimates of model reliance on class-relevant features.
arXiv Detail & Related papers (2022-11-26T19:31:49Z) - Calibrated Hyperspectral Image Reconstruction via Graph-based Self-Tuning Network [40.71031760929464]
Hyperspectral imaging (HSI) has attracted increasing research attention, especially for methods based on a coded aperture snapshot spectral imaging (CASSI) system.
Existing deep HSI reconstruction models are generally trained on paired data to retrieve original signals upon 2D compressed measurements given by a particular optical hardware mask in CASSI.
This mask-specific training style leads to a hardware miscalibration issue, which sets up barriers to deploying deep HSI models across different hardware and noisy environments.
We propose a novel Graph-based Self-Tuning (GST) network to reason about uncertainties, adapting to the varying spatial structures of masks among different hardware.
arXiv Detail & Related papers (2021-12-31T09:39:13Z) - Image Inpainting by End-to-End Cascaded Refinement with Mask Awareness [66.55719330810547]
Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial.
We propose a novel mask-aware inpainting solution that learns multi-scale features for missing regions in the encoding phase.
Our framework is validated both quantitatively and qualitatively via extensive experiments on three public datasets.
arXiv Detail & Related papers (2021-04-28T13:17:47Z) - MagGAN: High-Resolution Face Attribute Editing with Mask-Guided Generative Adversarial Network [145.4591079418917]
MagGAN learns to only edit the facial parts that are relevant to the desired attribute changes.
A novel mask-guided conditioning strategy is introduced to incorporate the influence region of each attribute change into the generator.
A multi-level patch-wise discriminator structure is proposed to scale our model for high-resolution ($1024 \times 1024$) face editing.
arXiv Detail & Related papers (2020-10-03T20:56:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.