Shadow Generation for Composite Image Using Diffusion Model
- URL: http://arxiv.org/abs/2403.15234v1
- Date: Fri, 22 Mar 2024 14:27:58 GMT
- Title: Shadow Generation for Composite Image Using Diffusion Model
- Authors: Qingyang Liu, Junqi You, Jianting Wang, Xinhao Tao, Bo Zhang, Li Niu
- Abstract summary: We resort to a foundation model with rich prior knowledge of natural shadow images.
We first adapt ControlNet to our task and then propose intensity modulation modules to improve the shadow intensity.
Experimental results on both the DESOBA and DESOBAv2 datasets, as well as real composite images, demonstrate the superior capability of our model for the shadow generation task.
- Score: 16.316311264197324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the realm of image composition, generating a realistic shadow for the inserted foreground remains a formidable challenge. Previous works have developed image-to-image translation models that are trained on paired training data. However, they struggle to generate shadows with accurate shapes and intensities, hindered by data scarcity and inherent task complexity. In this paper, we resort to a foundation model with rich prior knowledge of natural shadow images. Specifically, we first adapt ControlNet to our task and then propose intensity modulation modules to improve the shadow intensity. Moreover, we extend the small-scale DESOBA dataset to DESOBAv2 using a novel data acquisition pipeline. Experimental results on both the DESOBA and DESOBAv2 datasets, as well as real composite images, demonstrate the superior capability of our model for the shadow generation task. The dataset, code, and model are released at https://github.com/bcmi/Object-Shadow-Generation-Dataset-DESOBAv2.
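To make the ControlNet adaptation concrete, below is a minimal sketch of conditioning a ControlNet-style diffusion model on a shadow-free composite image using the Hugging Face diffusers library. It only illustrates the general conditioning idea under stated assumptions and is not the authors' released implementation (see the GitHub repository above): the ControlNet checkpoint path is hypothetical, and the proposed intensity modulation modules are omitted.

```python
# Minimal sketch (not the authors' released code): conditioning a ControlNet-style
# diffusion model on a shadow-free composite image. The ControlNet checkpoint path
# below is hypothetical, and the paper's intensity modulation modules are omitted.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "path/to/shadow-generation-controlnet",  # hypothetical fine-tuned checkpoint
    torch_dtype=torch.float16,
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",        # any SD 1.5 base weights would do here
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The shadow-free composite (foreground already pasted in) serves as the control
# image; the paper additionally conditions on the foreground object mask.
composite = Image.open("composite.png").convert("RGB")

result = pipe(
    prompt="a photo with a realistic shadow cast by the inserted object",
    image=composite,
    num_inference_steps=30,
).images[0]
result.save("composite_with_shadow.png")
```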
Related papers
- Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation [58.09421301921607]
We construct the first large-scale dataset for subject-driven image editing and generation.
Our dataset is 5 times the size of the previous largest dataset, yet our cost is tens of thousands of GPU hours lower.
arXiv Detail & Related papers (2024-06-13T16:40:39Z)
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data [87.61900472933523]
This work presents Depth Anything, a highly practical solution for robust monocular depth estimation.
We scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data.
We evaluate its zero-shot capabilities extensively, including six public datasets and randomly captured photos.
arXiv Detail & Related papers (2024-01-19T18:59:52Z)
- Exposure Bracketing is All You Need for Unifying Image Restoration and Enhancement Tasks [50.822601495422916]
We propose to utilize exposure bracketing photography to unify image restoration and enhancement tasks.
Due to the difficulty in collecting real-world pairs, we suggest a solution that first pre-trains the model with synthetic paired data.
In particular, a temporally modulated recurrent network (TMRNet) and self-supervised adaptation method are proposed.
arXiv Detail & Related papers (2024-01-01T14:14:35Z)
- Deshadow-Anything: When Segment Anything Model Meets Zero-shot shadow removal [8.555176637147648]
We develop Deshadow-Anything, which exploits the generalization ability of large-scale datasets, to achieve image shadow removal.
The diffusion model can diffuse along the edges and textures of an image, helping to remove shadows while preserving the details of the image.
Experiments on shadow removal tasks demonstrate that these methods can effectively improve image restoration performance.
arXiv Detail & Related papers (2023-09-21T01:35:13Z)
- DESOBAv2: Towards Large-scale Real-world Dataset for Shadow Generation [19.376935979734714]
In this work, we focus on generating plausible shadow for the inserted foreground object to make the composite image more realistic.
To supplement the existing small-scale dataset DESOBA, we create a large-scale dataset called DESOBAv2.
arXiv Detail & Related papers (2023-08-19T10:21:23Z)
- DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models [61.906934570771256]
We present a generic dataset generation model that can produce diverse synthetic images and perception annotations.
Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation.
We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module.
arXiv Detail & Related papers (2023-08-11T14:38:11Z)
- Shadow Generation with Decomposed Mask Prediction and Attentive Shadow Filling [26.780859992812186]
We focus on generating plausible shadows for the inserted foreground object to make the composite image more realistic.
To supplement the existing small-scale dataset, we create a large-scale dataset called RdSOBA with rendering techniques.
We design a two-stage network named DMASNet with mask prediction and attentive shadow filling.
arXiv Detail & Related papers (2023-06-30T01:32:16Z)
- ClipCrop: Conditioned Cropping Driven by Vision-Language Model [90.95403416150724]
We take advantage of vision-language models as a foundation for creating robust and user-intentional cropping algorithms.
We develop a method to perform cropping with a text or image query that reflects the user's intention as guidance.
Our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small dataset.
arXiv Detail & Related papers (2022-11-21T14:27:07Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Learning from Synthetic Shadows for Shadow Detection and Removal [43.53464469097872]
Recent shadow removal approaches all train convolutional neural networks (CNNs) on real paired shadow/shadow-free or shadow/shadow-free/mask image datasets.
We present SynShadow, a novel large-scale synthetic shadow/shadow-free/matte image triplet dataset and a pipeline to synthesize it; a simple compositing sketch follows this list.
arXiv Detail & Related papers (2021-01-05T18:56:34Z)
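As a side note on the shadow/shadow-free/matte triplet format mentioned in the SynShadow entry above, the following sketch shows one simple way a shadow image can be composited from a shadow-free image and a shadow matte. The uniform darkening model is an illustrative assumption, not the actual synthesis pipeline used by SynShadow or by the paper above.

```python
# Minimal sketch (an assumption, not SynShadow's actual pipeline): composite a shadow
# onto a shadow-free image by darkening it where the shadow matte is active.
import numpy as np
from PIL import Image

def composite_shadow(shadow_free, matte, darkening=0.5):
    """Darken the shadow-free image wherever the matte is active.

    shadow_free: H x W x 3 float array in [0, 1]
    matte:       H x W     float array in [0, 1], where 1 means full shadow
    darkening:   fraction of light removed inside the shadow (illustrative constant;
                 real pipelines vary it and may apply it per colour channel)
    """
    attenuation = 1.0 - darkening * matte[..., None]
    return np.clip(shadow_free * attenuation, 0.0, 1.0)

shadow_free = np.asarray(Image.open("shadow_free.png").convert("RGB"), dtype=np.float32) / 255.0
matte = np.asarray(Image.open("shadow_matte.png").convert("L"), dtype=np.float32) / 255.0

shadowed = composite_shadow(shadow_free, matte)
Image.fromarray((shadowed * 255).astype(np.uint8)).save("shadowed.png")
```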