Alfie: Democratising RGBA Image Generation With No $$$
- URL: http://arxiv.org/abs/2408.14826v1
- Date: Tue, 27 Aug 2024 07:13:44 GMT
- Title: Alfie: Democratising RGBA Image Generation With No $$$
- Authors: Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara,
- Abstract summary: We propose a fully-automated approach for obtaining RGBA illustrations by modifying the inference-time behavior of a pre-trained Diffusion Transformer model.
We force the generation of entire subjects without sharp croppings, whose background is easily removed for seamless integration into design projects or artistic scenes.
- Score: 33.334956022229846
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Designs and artworks are ubiquitous across various creative fields, requiring graphic design skills and dedicated software to create compositions that include many graphical elements, such as logos, icons, symbols, and art scenes, which are integral to visual storytelling. Automating the generation of such visual elements improves graphic designers' productivity, democratizes and innovates the creative industry, and helps generate more realistic synthetic data for related tasks. These illustration elements are mostly RGBA images with irregular shapes and cutouts, facilitating blending and scene composition. However, most image generation models are incapable of generating such images and achieving this capability requires expensive computational resources, specific training recipes, or post-processing solutions. In this work, we propose a fully-automated approach for obtaining RGBA illustrations by modifying the inference-time behavior of a pre-trained Diffusion Transformer model, exploiting the prompt-guided controllability and visual quality offered by such models with no additional computational cost. We force the generation of entire subjects without sharp croppings, whose background is easily removed for seamless integration into design projects or artistic scenes. We show with a user study that, in most cases, users prefer our solution over generating and then matting an image, and we show that our generated illustrations yield good results when used as inputs for composite scene generation pipelines. We release the code at https://github.com/aimagelab/Alfie.
Related papers
- Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation [44.457347230146404]
We leverage the scene graph, a powerful structured representation, for complex image generation.
We employ the generative capabilities of variational autoencoders and diffusion models in a generalizable manner.
Our method outperforms recent competitors based on text, layout, or scene graph.
arXiv Detail & Related papers (2024-10-01T07:02:46Z) - AltCanvas: A Tile-Based Image Editor with Generative AI for Blind or Visually Impaired People [4.41462357579624]
People with visual impairments often struggle to create content that relies heavily on visual elements.
Existing accessible drawing tools, which construct images line by line, are suitable for simple tasks like math but not for more expressive artwork.
Our work integrates generative AI with a constructive approach that provides users with enhanced control and editing capabilities.
arXiv Detail & Related papers (2024-08-05T01:47:36Z) - BlenderAlchemy: Editing 3D Graphics with Vision-Language Models [4.852796482609347]
A vision-based edit generator and state evaluator work together to find the correct sequence of actions to achieve the goal.
Inspired by the role of visual imagination in the human design process, we supplement the visual reasoning capabilities of Vision-Language Models with "imagined" reference images.
arXiv Detail & Related papers (2024-04-26T19:37:13Z) - Re-Thinking Inverse Graphics With Large Language Models [51.333105116400205]
Inverse graphics -- inverting an image into physical variables that, when rendered, enable reproduction of the observed scene -- is a fundamental challenge in computer vision and graphics.
We propose the Inverse-Graphics Large Language Model (IG-LLM), an inversegraphics framework centered around an LLM.
We incorporate a frozen pre-trained visual encoder and a continuous numeric head to enable end-to-end training.
arXiv Detail & Related papers (2024-04-23T16:59:02Z) - MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation [54.64194935409982]
We introduce MuLAn: a novel dataset comprising over 44K MUlti-Layer-wise RGBA decompositions.
MuLAn is the first photorealistic resource providing instance decomposition and spatial information for high quality images.
We aim to encourage the development of novel generation and editing technology, in particular layer-wise solutions.
arXiv Detail & Related papers (2024-04-03T14:58:00Z) - CLIP-CLOP: CLIP-Guided Collage and Photomontage [16.460669517251084]
We design a gradient-based generator to produce collages.
It requires the human artist to curate libraries of image patches and to describe (with prompts) the whole image composition.
We explore the aesthetic potentials of high-resolution collages, and provide an open-source Google Colab as an artistic tool.
arXiv Detail & Related papers (2022-05-06T11:33:49Z) - A Shared Representation for Photorealistic Driving Simulators [83.5985178314263]
We propose to improve the quality of generated images by rethinking the discriminator architecture.
The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses.
We aim to learn a shared latent representation that encodes enough information to jointly do semantic segmentation, content reconstruction, along with a coarse-to-fine grained adversarial reasoning.
arXiv Detail & Related papers (2021-12-09T18:59:21Z) - Font Completion and Manipulation by Cycling Between Multi-Modality
Representations [113.26243126754704]
We innovate to explore the generation of font glyphs as 2D graphic objects with the graph as an intermediate representation.
We formulate a cross-modality cycled image-to-image structure with a graph between an image encoder and an image.
Our model generates improved results than both image-to-image baseline and previous state-of-the-art methods for glyph completion.
arXiv Detail & Related papers (2021-08-30T02:43:29Z) - Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image
Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z) - State of the Art on Neural Rendering [141.22760314536438]
We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs.
This report is focused on the many important use cases for the described algorithms such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence.
arXiv Detail & Related papers (2020-04-08T04:36:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.