Text2Relight: Creative Portrait Relighting with Text Guidance
- URL: http://arxiv.org/abs/2412.13734v1
- Date: Wed, 18 Dec 2024 11:12:10 GMT
- Title: Text2Relight: Creative Portrait Relighting with Text Guidance
- Authors: Junuk Cha, Mengwei Ren, Krishna Kumar Singh, He Zhang, Yannick Hold-Geoffroy, Seunghyun Yoon, HyunJoon Jung, Jae Shin Yoon, Seungryul Baek
- Abstract summary: We present a lighting-aware image editing pipeline that, given a portrait image and a text prompt, performs single image relighting. Our model modifies the lighting and color of both the foreground and background to align with the provided text description.
- Score: 26.75526739002697
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present a lighting-aware image editing pipeline that, given a portrait image and a text prompt, performs single image relighting. Our model modifies the lighting and color of both the foreground and background to align with the provided text description. The unbounded creativity of text allows us to describe the lighting of a scene with any sensory feature, including temperature, emotion, smell, time, and so on. However, modeling such a mapping between unbounded text and lighting is extremely challenging due to the lack of data: no scalable dataset provides large sets of paired text and relit images, and therefore current text-driven image editing models do not generalize to lighting-specific use cases. We overcome this problem by introducing a novel data synthesis pipeline: First, diverse and creative text prompts that describe scenes under various lighting are automatically generated under a crafted hierarchy using a large language model (*e.g.,* ChatGPT). A text-guided image generation model then creates a lighting image that best matches each text. Conditioned on these lighting images, we perform image-based relighting of both the foreground and the background, using a single portrait image or a set of OLAT (One-Light-at-A-Time) images captured with a light stage system. For background relighting in particular, we represent the lighting image as a set of point lights and transfer them to other background images. A generative diffusion model is trained on the synthesized large-scale data with auxiliary task augmentation (*e.g.,* portrait delighting and light positioning) to correlate the latent text and lighting distributions for text-guided portrait relighting.
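As a rough illustration of the OLAT-based relighting step described in the abstract, the sketch below (not the authors' code; all function and variable names such as `relight_olat` and `sample_lighting` are hypothetical) shows the standard image-based relighting idea: the relit portrait is approximated as a weighted sum of OLAT frames, where each weight is read from the target lighting image at the corresponding light direction.

```python
# Minimal sketch of linear OLAT relighting, assuming an equirectangular
# lighting image and known unit directions for each light-stage light.
# This is an illustration of the general technique, not the paper's code.

import numpy as np

def sample_lighting(lighting_image: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Read an RGB weight from an equirectangular lighting image for a unit light direction."""
    h, w, _ = lighting_image.shape
    theta = np.arccos(np.clip(direction[1], -1.0, 1.0))   # polar angle from +Y
    phi = np.arctan2(direction[2], direction[0])          # azimuth
    u = int((phi / (2 * np.pi) + 0.5) * (w - 1))          # column in the lighting image
    v = int((theta / np.pi) * (h - 1))                    # row in the lighting image
    return lighting_image[v, u]                           # per-channel RGB weight

def relight_olat(olat_images: np.ndarray,       # (N, H, W, 3), one image per light
                 light_directions: np.ndarray,  # (N, 3), unit direction of each OLAT light
                 lighting_image: np.ndarray     # (Hl, Wl, 3), text-to-image lighting condition
                 ) -> np.ndarray:
    """Relight a portrait as a weighted sum of its OLAT images."""
    relit = np.zeros_like(olat_images[0], dtype=np.float64)
    for olat, direction in zip(olat_images, light_directions):
        weight = sample_lighting(lighting_image, direction)
        relit += olat.astype(np.float64) * weight         # accumulate this light's contribution
    # Crude normalization so brightness stays in a sensible display range.
    return np.clip(relit / len(olat_images), 0.0, 1.0)
```

In the paper's pipeline, the same text-generated lighting image also drives background relighting, where it is represented as a set of point lights; the sketch above covers only the linear OLAT combination for the foreground.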
Related papers
- DreamLight: Towards Harmonious and Consistent Image Relighting [41.90032795389507]
We introduce a model named DreamLight for universal image relighting. It can seamlessly composite subjects into a new background while maintaining aesthetic uniformity in terms of lighting and color tone.
arXiv Detail & Related papers (2025-06-17T14:05:24Z)
- ScribbleLight: Single Image Indoor Relighting with Scribbles [3.6902409965263474]
We introduce ScribbleLight, a generative model that supports local fine-grained control of lighting effects through scribbles.
Our key technical novelty is an Albedo-conditioned Stable Image Diffusion model that preserves the intrinsic color and texture of the original image after relighting.
We demonstrate ScribbleLight's ability to create different lighting effects (e.g., turning lights on/off, adding highlights, cast shadows, or indirect lighting from unseen lights) from sparse scribble annotations.
arXiv Detail & Related papers (2024-11-26T18:59:11Z)
- First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending [5.3798706094384725]
We propose a new visual text blending paradigm including both creating backgrounds and rendering texts.
Specifically, a background generator is developed to produce high-fidelity and text-free natural images.
We also explore several downstream applications based on our method, including scene text dataset synthesis for boosting scene text detectors.
arXiv Detail & Related papers (2024-10-14T05:23:43Z)
- LightIt: Illumination Modeling and Control for Diffusion Models [61.80461416451116]
We introduce LightIt, a method for explicit illumination control for image generation.
Recent generative methods lack lighting control, which is crucial to numerous artistic aspects of image generation.
Our method is the first that enables the generation of images with controllable, consistent lighting.
arXiv Detail & Related papers (2024-03-15T18:26:33Z)
- DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation [16.080481761005203]
We present a novel method for exerting fine-grained lighting control during text-driven image generation.
Our key observation is that we only need to guide the diffusion process, hence exact radiance hints are not necessary.
We demonstrate and validate our lighting controlled diffusion model on a variety of text prompts and lighting conditions.
arXiv Detail & Related papers (2024-02-19T08:17:21Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- TextDiffuser: Diffusion Models as Text Painters [118.30923824681642]
We introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds.
We contribute the first large-scale text images dataset with OCR annotations, MARIO-10M, containing 10 million image-text pairs.
We show that TextDiffuser is flexible and controllable to create high-quality text images using text prompts alone or together with text template images, and conduct text inpainting to reconstruct incomplete images with text.
arXiv Detail & Related papers (2023-05-18T10:16:19Z)
- Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace text with the desired content while preserving the background and the style of the original text.
Previous methods of editing the whole image have to learn different translation rules of background and text regions simultaneously.
We propose a novel network that MOdifies Scene Text images at the strokE Level (MOSTEL).
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
- Physically-Based Editing of Indoor Scene Lighting from a Single Image [106.60252793395104]
We present a method to edit complex indoor lighting from a single image with its predicted depth and light source segmentation masks.
We tackle this problem using two novel components: 1) a holistic scene reconstruction method that estimates scene reflectance and parametric 3D lighting, and 2) a neural rendering framework that re-renders the scene from our predictions.
arXiv Detail & Related papers (2022-05-19T06:44:37Z)
- SILT: Self-supervised Lighting Transfer Using Implicit Image Decomposition [27.72518108918135]
The solution operates as a two-branch network that first aims to map input images of any arbitrary lighting style to a unified domain.
We then remap this unified input domain using a discriminator that is presented with the generated outputs and the style reference.
Our method is shown to outperform supervised relighting solutions across two different datasets without requiring lighting supervision.
arXiv Detail & Related papers (2021-10-25T12:52:53Z)
- Scene relighting with illumination estimation in the latent space on an encoder-decoder scheme [68.8204255655161]
In this report, we present the methods we explored to achieve this goal.
Our models are trained on a rendered dataset of artificial locations with varied scene content, light source location and color temperature.
With this dataset, we used a network with illumination estimation component aiming to infer and replace light conditions in the latent space representation of the concerned scenes.
arXiv Detail & Related papers (2020-06-03T15:25:11Z)