Inspiration Seeds: Learning Non-Literal Visual Combinations for Generative Exploration
- URL: http://arxiv.org/abs/2602.08615v2
- Date: Thu, 12 Feb 2026 14:10:05 GMT
- Title: Inspiration Seeds: Learning Non-Literal Visual Combinations for Generative Exploration
- Authors: Kfir Goldberg, Elad Richardson, Yael Vinker
- Abstract summary: We propose Inspiration Seeds, a generative framework that shifts image generation from final execution to exploratory ideation. We use CLIP Sparse Autoencoders to extract editing directions in CLIP latent space and isolate concept pairs.
- Score: 13.00602873238112
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While generative models have become powerful tools for image synthesis, they are typically optimized for executing carefully crafted textual prompts, offering limited support for the open-ended visual exploration that often precedes idea formation. In contrast, designers frequently draw inspiration from loosely connected visual references, seeking emergent connections that spark new ideas. We propose Inspiration Seeds, a generative framework that shifts image generation from final execution to exploratory ideation. Given two input images, our model produces diverse, visually coherent compositions that reveal latent relationships between inputs, without relying on user-specified text prompts. Our approach is feed-forward, trained on synthetic triplets of decomposed visual aspects derived entirely through visual means: we use CLIP Sparse Autoencoders to extract editing directions in CLIP latent space and isolate concept pairs. By removing the reliance on language and enabling fast, intuitive recombination, our method supports visual ideation at the early and ambiguous stages of creative work.
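The abstract describes extracting editing directions with CLIP Sparse Autoencoders and recombining concepts from two input images. The sketch below is a hedged illustration of that general idea only, not the authors' released code: the SAE architecture and sizes, the helper names (`concept_direction`, `apply_direction`), and the reading of a decoder column as an editing direction are all assumptions made for illustration.

```python
# Minimal sketch (assumptions, not the paper's implementation): isolating a
# candidate concept direction in CLIP latent space with a sparse autoencoder
# (SAE) and shifting an image embedding along it.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """SAE over CLIP image embeddings (hypothetical dimensions)."""

    def __init__(self, d_clip: int = 768, d_latent: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_clip, d_latent)
        self.decoder = nn.Linear(d_latent, d_clip)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        z = torch.relu(self.encoder(x))        # sparse codes
        return self.decoder(z), z


def concept_direction(sae: SparseAutoencoder, unit: int) -> torch.Tensor:
    """Read one decoder column as a candidate editing direction in CLIP space."""
    return sae.decoder.weight[:, unit].detach()


def apply_direction(clip_embed: torch.Tensor, direction: torch.Tensor,
                    strength: float = 1.0) -> torch.Tensor:
    """Shift a CLIP embedding along an isolated concept direction."""
    d = direction / direction.norm()
    return clip_embed + strength * d


if __name__ == "__main__":
    sae = SparseAutoencoder()
    img_embed = torch.randn(768)               # stand-in for a CLIP image embedding
    direction = concept_direction(sae, unit=42)
    edited = apply_direction(img_embed, direction, strength=0.5)
    print(edited.shape)
```

In the paper's pipeline such directions would presumably feed the synthetic triplets of decomposed visual aspects used for feed-forward training; this example only shifts a single embedding to show the mechanics.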
Related papers
- Chatting with Images for Introspective Visual Thinking [50.7747647794877]
"Chatting with images" is a new framework that reframes visual manipulation as language-guided feature modulation. Under the guidance of expressive language prompts, the model dynamically performs joint re-encoding over multiple image regions. ViLaVT achieves strong and consistent improvements on complex multi-image and video-based spatial reasoning tasks.
arXiv Detail & Related papers (2026-02-11T17:42:37Z) - VLM-Guided Adaptive Negative Prompting for Creative Generation [21.534474554320823]
Creative generation is the synthesis of new, surprising, and valuable samples that reflect user intent yet cannot be envisioned in advance. We propose VLM-Guided Adaptive Negative-Prompting, a training-free, inference-time method that promotes creative image generation. We show consistent gains in creative novelty with negligible computational overhead.
arXiv Detail & Related papers (2025-10-12T17:34:59Z) - ThematicPlane: Bridging Tacit User Intent and Latent Spaces for Image Generation [49.805992099208595]
We introduce ThematicPlane, a system that enables users to navigate and manipulate high-level semantic concepts. This interface bridges the gap between tacit creative intent and system control.
arXiv Detail & Related papers (2025-08-08T06:57:14Z) - RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning [88.14234949860105]
RePrompt is a novel reprompting framework that introduces explicit reasoning into the prompt enhancement process via reinforcement learning. Our approach enables end-to-end training without human-annotated data.
arXiv Detail & Related papers (2025-05-23T06:44:26Z) - Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning [58.73625654718187]
Generalized zero-shot learning aims to recognize both seen and unseen classes with the help of semantic information that is shared among different classes. Existing approaches fine-tune the visual backbone on seen-class data to obtain semantic-related visual features. This paper proposes a novel visual and semantic prompt collaboration framework, which utilizes prompt tuning techniques for efficient feature adaptation.
arXiv Detail & Related papers (2025-03-29T10:17:57Z) - Piece it Together: Part-Based Concepting with IP-Priors [52.01640707131325]
We introduce a generative framework that seamlessly integrates a partial set of user-provided visual components into a coherent composition. Our approach builds on a strong and underexplored representation space, extracted from IP-Adapter+. We also present a LoRA-based fine-tuning strategy that significantly improves prompt adherence in IP-Adapter+ for a given task.
arXiv Detail & Related papers (2025-03-13T13:46:10Z) - IP-Composer: Semantic Composition of Visual Concepts [49.18472621931207]
We present IP-Composer, a training-free approach for compositional image generation. Our method builds on IP-Adapter, which synthesizes novel images conditioned on an input image's CLIP embedding. We extend this approach to multiple visual inputs by crafting composite embeddings, stitched from the projections of multiple input images onto concept-specific CLIP-subspaces identified through text (a hedged sketch of this projection-and-stitch idea appears after this list).
arXiv Detail & Related papers (2025-02-19T18:49:31Z) - Concept Decomposition for Visual Exploration and Inspiration [53.06983340652571]
We propose a method to decompose a visual concept into different visual aspects encoded in a hierarchical tree structure.
We utilize large vision-language models and their rich latent space for concept decomposition and generation.
arXiv Detail & Related papers (2023-05-29T16:56:56Z) - Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models [24.456117679941816]
Contrastive Reading Model (Cream) is a novel neural architecture designed to enhance the language-image understanding capability of Large Language Models (LLMs).
Our approach bridges the gap between vision and language understanding, paving the way for the development of more sophisticated Document Intelligence Assistants.
arXiv Detail & Related papers (2023-05-24T11:59:13Z)
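As referenced in the IP-Composer entry above, the following is a minimal, hedged sketch of composing CLIP embeddings by projecting images onto concept-specific subspaces spanned by text embeddings. The SVD-based subspace construction, the chosen rank, and all names here are illustrative assumptions rather than IP-Composer's actual implementation.

```python
# Hedged sketch (assumptions, not IP-Composer's released code): stitch a
# composite CLIP embedding by replacing only the concept captured by a
# text-derived subspace with a second image's projection onto it.
import torch


def concept_subspace(text_embeds: torch.Tensor, rank: int = 10) -> torch.Tensor:
    """Top-`rank` right singular vectors of concept text embeddings span the subspace."""
    # text_embeds: (n_prompts, d) CLIP text embeddings describing one concept axis
    _, _, vh = torch.linalg.svd(text_embeds, full_matrices=False)
    return vh[:rank]                            # (rank, d) orthonormal basis


def project(embed: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Orthogonal projection of a CLIP embedding onto the subspace spanned by `basis`."""
    return basis.T @ (basis @ embed)


def composite(embed_a: torch.Tensor, embed_b: torch.Tensor,
              basis_b: torch.Tensor) -> torch.Tensor:
    """Keep image A everywhere except the concept captured by `basis_b`,
    which is swapped for image B's projection onto that subspace."""
    return embed_a - project(embed_a, basis_b) + project(embed_b, basis_b)


if __name__ == "__main__":
    d = 768
    texts = torch.randn(32, d)                  # stand-in text embeddings for one concept
    basis = concept_subspace(texts, rank=10)
    img_a, img_b = torch.randn(d), torch.randn(d)
    mixed = composite(img_a, img_b, basis)
    print(mixed.shape)
```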