Loomis Painter: Reconstructing the Painting Process
- URL: http://arxiv.org/abs/2511.17344v2
- Date: Thu, 27 Nov 2025 10:23:27 GMT
- Title: Loomis Painter: Reconstructing the Painting Process
- Authors: Markus Pobitzer, Chang Liu, Chenyi Zhuang, Teng Long, Bin Ren, Nicu Sebe,
- Abstract summary: Step-by-step painting tutorials are vital for learning artistic techniques, but existing video resources lack interactivity and personalization. We propose a unified framework for multi-media painting process generation with a semantics-driven style control mechanism. We also build a large-scale dataset of real painting processes and evaluate cross-media consistency, temporal coherence, and final-image fidelity.
- Score: 56.713812157283805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Step-by-step painting tutorials are vital for learning artistic techniques, but existing video resources (e.g., YouTube) lack interactivity and personalization. While recent generative models have advanced artistic image synthesis, they struggle to generalize across media and often show temporal or structural inconsistencies, hindering faithful reproduction of human creative workflows. To address this, we propose a unified framework for multi-media painting process generation with a semantics-driven style control mechanism that embeds multiple media into a diffusion model's conditional space and uses cross-medium style augmentation. This enables consistent texture evolution and process transfer across styles. A reverse-painting training strategy further ensures smooth, human-aligned generation. We also build a large-scale dataset of real painting processes and evaluate cross-media consistency, temporal coherence, and final-image fidelity, achieving strong results on LPIPS, DINO, and CLIP metrics. Finally, our Perceptual Distance Profile (PDP) curve quantitatively models the creative sequence, i.e., composition, color blocking, and detail refinement, mirroring human artistic progression.
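The abstract's Perceptual Distance Profile can be illustrated as a curve of perceptual distance from each intermediate frame to the finished painting. A minimal sketch follows; the paper measures distance with LPIPS, while this stand-in uses pixel-space mean squared error so it runs without model weights, and the toy "painting process" (linearly fading in the final image) is an assumption for illustration only:

```python
import numpy as np

def perceptual_distance_profile(frames, final):
    """Distance of each intermediate frame to the final image.

    Stand-in for the paper's PDP curve: pixel-space MSE replaces the
    LPIPS metric the paper uses, purely to keep the sketch dependency-free.
    """
    return [float(np.mean((f - final) ** 2)) for f in frames]

# Toy process: a blank canvas converging linearly to the final image.
rng = np.random.default_rng(0)
final = rng.random((32, 32, 3))
frames = [final * t for t in (0.0, 0.25, 0.5, 0.75, 1.0)]

pdp = perceptual_distance_profile(frames, final)
```

On a real painting sequence, the shape of this curve (a steep early drop during composition, a plateau during color blocking, a slow tail during detail refinement) is what the paper reads as the signature of human artistic progression.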
Related papers
- VideoSketcher: Video Models Prior Enable Versatile Sequential Sketch Generation [73.23035143627598]
Most generative models treat sketches as static images, overlooking the temporal structure that underlies creative drawing. We present a data-efficient approach for sequential sketch generation that adapts pretrained text-to-video diffusion models. Our method generates high-quality sketches that closely follow text-specified orderings while exhibiting rich visual detail.
arXiv Detail & Related papers (2026-02-17T18:55:03Z)
- PaintFlow: A Unified Framework for Interactive Oil Paintings Editing and Generation [47.72342715926692]
Oil painting is a high-level medium that blends human abstract thinking with artistic expression. Existing generation and editing techniques are often constrained by the distribution of training data. We introduce a unified multimodal framework for oil painting generation and editing.
arXiv Detail & Related papers (2025-12-09T12:31:00Z)
- Birth of a Painting: Differentiable Brushstroke Reconstruction [25.61763988336406]
Painting embodies a unique form of visual storytelling, where the creation process is as significant as the final artwork. Our approach produces realistic and stylized appearances, offering a unified model for digital painting.
arXiv Detail & Related papers (2025-11-17T09:55:53Z)
- Unified Autoregressive Visual Generation and Understanding with Continuous Tokens [52.21981295470491]
We present UniFluid, a unified autoregressive framework for joint visual generation and understanding. Our unified autoregressive architecture processes multimodal image and text inputs, generating discrete tokens for text and continuous tokens for images. We find that, although there is an inherent trade-off between the image generation and understanding tasks, a carefully tuned training recipe enables them to improve each other.
arXiv Detail & Related papers (2025-03-17T17:58:30Z)
- Draw Like an Artist: Complex Scene Generation with Diffusion Model via Composition, Painting, and Retouching [16.98431990178662]
We provide a precise definition of complex scenes and introduce a set of Complex Decomposition Criteria (CDC) based on this definition.
Inspired by the artist's painting process, we propose a training-free diffusion framework called Complex Diffusion (CxD), which divides the process into three stages: composition, painting, and retouching.
arXiv Detail & Related papers (2024-08-25T15:05:32Z)
- Artistic Intelligence: A Diffusion-Based Framework for High-Fidelity Landscape Painting Synthesis [2.205829309604458]
LPGen is a novel diffusion-based model specifically designed for landscape painting generation.
LPGen introduces a decoupled cross-attention mechanism that independently processes structural and stylistic features.
The model is pre-trained on a curated dataset of high-resolution landscape images, categorized by distinct artistic styles, and then fine-tuned to ensure detailed and consistent output.
arXiv Detail & Related papers (2024-07-24T12:32:24Z)
- ProcessPainter: Learn Painting Process from Sequence Data [27.9875429986135]
The painting process of artists is inherently stepwise and varies significantly among different painters and styles.
Traditional stroke-based rendering methods break down images into sequences of brushstrokes, yet they fall short of replicating the authentic processes of artists.
We introduce ProcessPainter, a text-to-video model that is initially pre-trained on synthetic data and subsequently fine-tuned with a select set of artists' painting sequences.
arXiv Detail & Related papers (2024-06-10T07:18:41Z)
- BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM.
BrushNet achieves superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z)
- CreativeSynth: Cross-Art-Attention for Artistic Image Synthesis with Multimodal Diffusion [73.08710648258985]
Key painting attributes including layout, perspective, shape, and semantics often cannot be conveyed and expressed through style transfer. Large-scale pretrained text-to-image generation models have demonstrated their capability to synthesize a vast amount of high-quality images. Our main novel idea is to integrate multimodal semantic information as a synthesis guide into artworks, rather than transferring style to the real world.
arXiv Detail & Related papers (2024-01-25T10:42:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.