ShapeShift: Towards Text-to-Shape Arrangement Synthesis with Content-Aware Geometric Constraints
- URL: http://arxiv.org/abs/2503.14720v1
- Date: Tue, 18 Mar 2025 20:48:58 GMT
- Title: ShapeShift: Towards Text-to-Shape Arrangement Synthesis with Content-Aware Geometric Constraints
- Authors: Vihaan Misra, Peter Schaldenbrand, Jean Oh
- Abstract summary: ShapeShift addresses a text-guided image-to-image translation task that requires rearranging the input set of rigid shapes into non-overlapping configurations. We introduce a content-aware collision resolution mechanism that applies minimal, semantically coherent adjustments when overlaps occur. Our approach yields interpretable compositions where spatial relationships clearly embody the textual prompt.
- Score: 13.2441524021269
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While diffusion-based models excel at generating photorealistic images from text, a more nuanced challenge emerges when constrained to using only a fixed set of rigid shapes, akin to solving tangram puzzles or arranging real-world objects to match semantic descriptions. We formalize this problem as shape-based image generation, a new text-guided image-to-image translation task that requires rearranging the input set of rigid shapes into non-overlapping configurations and visually communicating the target concept. Unlike pixel-manipulation approaches, our method, ShapeShift, explicitly parameterizes each shape within a differentiable vector graphics pipeline, iteratively optimizing placement and orientation through score distillation sampling from pretrained diffusion models. To preserve arrangement clarity, we introduce a content-aware collision resolution mechanism that applies minimal semantically coherent adjustments when overlaps occur, ensuring smooth convergence toward physically valid configurations. By bridging diffusion-based semantic guidance with explicit geometric constraints, our approach yields interpretable compositions where spatial relationships clearly embody the textual prompt. Extensive experiments demonstrate compelling results across diverse scenarios, with quantitative and qualitative advantages over alternative techniques.
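As a rough illustration of the optimization described above, the sketch below parameterizes shape placements and orientations and jointly minimizes a semantic objective plus a soft overlap penalty. It is a minimal sketch, not the paper's implementation: shapes are reduced to bounding circles, and `semantic_loss` is a placeholder standing in for score distillation sampling through a differentiable vector-graphics renderer, both of which are omitted here.

```python
# Minimal sketch of shape-arrangement optimization in the spirit of ShapeShift.
# Assumptions (not from the paper): shapes are reduced to bounding circles, and
# `semantic_loss` is a placeholder for score distillation sampling (SDS) through
# a differentiable vector-graphics renderer, both omitted here.
import torch

def overlap_penalty(centers: torch.Tensor, radii: torch.Tensor) -> torch.Tensor:
    """Sum of pairwise penetration depths between bounding circles."""
    diff = centers.unsqueeze(0) - centers.unsqueeze(1)    # (N, N, 2)
    dist = diff.norm(dim=-1)                              # (N, N) center distances
    min_sep = radii.unsqueeze(0) + radii.unsqueeze(1)     # required separation
    pen = torch.relu(min_sep - dist)                      # penetration depth
    return torch.triu(pen, diagonal=1).sum()              # count each pair once

def semantic_loss(centers: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    # Placeholder objective: pull shapes toward the canvas center. In the actual
    # method this would be the SDS signal from a pretrained diffusion model.
    target = torch.tensor([0.5, 0.5])
    return ((centers - target) ** 2).sum() + 0.01 * (angles ** 2).sum()

n_shapes = 5
centers = torch.rand(n_shapes, 2, requires_grad=True)  # positions in [0, 1]^2
angles = torch.zeros(n_shapes, requires_grad=True)     # orientations (radians)
radii = torch.full((n_shapes,), 0.1)                   # fixed bounding radii

opt = torch.optim.Adam([centers, angles], lr=1e-2)
for step in range(200):
    opt.zero_grad()
    loss = semantic_loss(centers, angles) + 10.0 * overlap_penalty(centers, radii)
    loss.backward()
    opt.step()
```

Note that the paper describes a collision resolution mechanism that applies minimal, semantically coherent adjustments when overlaps occur; the soft penalty above is a simpler stand-in for that behavior.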
Related papers
- Point-Driven Interactive Text and Image Layer Editing Using Diffusion Models [4.415961468927045]
DanceText is a training-free framework for multilingual text editing in images.
It supports complex geometric transformations and achieves seamless foreground-background integration.
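For intuition, here is a minimal sketch of the two mechanics this summary mentions: a geometric transformation of a text layer followed by alpha compositing onto the background. It assumes nothing about DanceText's actual pipeline, and all names are illustrative.

```python
# Illustrative sketch of warping a text layer and alpha-compositing it back onto
# the background. DanceText's actual editing pipeline is not shown; the function
# names and tensor layouts here are assumptions.
import torch
import torch.nn.functional as F

def warp_and_composite(bg, fg_rgba, theta):
    """bg: (1,3,H,W) background; fg_rgba: (1,4,H,W) text layer; theta: (1,2,3) affine."""
    grid = F.affine_grid(theta, fg_rgba.shape, align_corners=False)
    warped = F.grid_sample(fg_rgba, grid, align_corners=False)
    rgb, alpha = warped[:, :3], warped[:, 3:4]
    return alpha * rgb + (1 - alpha) * bg          # standard "over" compositing

bg = torch.rand(1, 3, 64, 64)
fg = torch.rand(1, 4, 64, 64)
rot = torch.tensor([[[0.866, -0.5, 0.0], [0.5, 0.866, 0.0]]])  # ~30 deg rotation
out = warp_and_composite(bg, fg, rot)
```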
arXiv Detail & Related papers (2025-04-18T23:46:32Z) - Training-free Composite Scene Generation for Layout-to-Image Synthesis [29.186425845897947]
This paper introduces a novel training-free approach designed to overcome adversarial semantic intersections during the diffusion conditioning phase.
We propose two innovative constraints: 1) an inter-token constraint that resolves token conflicts to ensure accurate concept synthesis; and 2) a self-attention constraint that improves pixel-to-pixel relationships.
Our evaluations confirm the effectiveness of leveraging layout information for guiding the diffusion process, generating content-rich images with enhanced fidelity and complexity.
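A hedged sketch of what an attention-level layout constraint can look like: each concept token's cross-attention mass is pulled inside its layout box, while overlap between different tokens' maps is penalized. The tensor shapes, box format, and loss form are illustrative assumptions, not the paper's exact constraints.

```python
# Sketch of a layout-guided attention constraint, loosely in the spirit of the
# inter-token constraint described above. Shapes and the loss form are assumed.
import torch

def inter_token_loss(attn: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """attn: (T, H, W) cross-attention maps, one per concept token.
    masks: (T, H, W) binary layout masks, one box per token."""
    attn = attn / (attn.sum(dim=(1, 2), keepdim=True) + 1e-8)  # normalize per token
    inside = (attn * masks).sum(dim=(1, 2))                    # mass inside own box
    # Conflict: attention that two different tokens place on the same pixels.
    cross = torch.einsum('thw,shw->ts', attn, attn)
    conflict = torch.triu(cross, diagonal=1).sum()
    return (1.0 - inside).sum() + conflict

# Toy usage with random maps and two disjoint boxes on a 16x16 grid.
T, H, W = 2, 16, 16
attn = torch.rand(T, H, W, requires_grad=True)
masks = torch.zeros(T, H, W)
masks[0, :, :8] = 1.0   # token 0 -> left half
masks[1, :, 8:] = 1.0   # token 1 -> right half
loss = inter_token_loss(attn, masks)
loss.backward()          # gradient could steer latents during diffusion conditioning
```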
arXiv Detail & Related papers (2024-07-18T15:48:07Z) - Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis [15.76266032768078]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries. We first introduce vision guidance as a foundational spatial cue within the perturbed distribution. We propose a universal framework, Layered Rendering Diffusion (LRDiff), which constructs an image-rendering process with multiple layers.
arXiv Detail & Related papers (2023-11-30T10:36:19Z) - Enhancing Object Coherence in Layout-to-Image Synthesis [13.289854750239956]
We propose a novel diffusion model with effective global semantic fusion (GSF) and self-similarity feature enhancement modules.
For semantic coherence, we argue that the image caption contains rich information for defining the semantic relationships among the objects in the image.
To improve the physical coherence, we develop a Self-similarity Coherence Attention synthesis (SCA) module to explicitly integrate local contextual physical coherence relation into each pixel's generation process.
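The basic quantity behind self-similarity-style attention is a pairwise similarity map over feature locations; a minimal sketch follows. The SCA module's actual design is not reproduced here, and the shapes are assumptions.

```python
# Illustrative sketch of a self-similarity map over feature vectors, the basic
# quantity behind self-similarity-style attention.
import torch
import torch.nn.functional as F

def self_similarity(feats: torch.Tensor) -> torch.Tensor:
    """feats: (C, H, W) feature map -> (H*W, H*W) cosine self-similarity."""
    C, H, W = feats.shape
    flat = feats.reshape(C, H * W).t()            # (H*W, C) per-location vectors
    flat = F.normalize(flat, dim=-1)              # unit-norm per location
    return flat @ flat.t()                        # pairwise cosine similarity

sim = self_similarity(torch.randn(64, 8, 8))      # (64, 64) similarity map
weights = torch.softmax(sim / 0.1, dim=-1)        # could reweight pixel features
```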
arXiv Detail & Related papers (2023-11-17T13:43:43Z) - Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-fusion and latent-diffusion models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
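For orientation only, a generic gradient-guided reverse step is sketched below, with two differently weighted gradient terms standing in for asymmetric guidance toward source structure and target semantics. This is an assumption-laden illustration of the general idea, not the paper's update rule.

```python
# Generic sketch of gradient guidance in a reverse diffusion step. The asymmetric
# weighting (different strengths for source structure vs. target semantics) is an
# illustrative assumption, not the paper's exact formulation.
import torch

def guided_step(x_t, denoise, struct_loss, sem_loss, w_struct=0.5, w_sem=1.5):
    """One reverse step: denoise, then nudge by two differently weighted gradients."""
    x = x_t.detach().requires_grad_(True)
    g_struct = torch.autograd.grad(struct_loss(x), x)[0]   # keep source structure
    g_sem = torch.autograd.grad(sem_loss(x), x)[0]         # match target semantics
    return denoise(x_t) - w_struct * g_struct - w_sem * g_sem

# Toy usage with stand-in losses and a placeholder "denoiser".
x_t = torch.randn(1, 3, 8, 8)
src = torch.randn_like(x_t)
out = guided_step(
    x_t,
    denoise=lambda x: 0.9 * x,                      # placeholder denoiser
    struct_loss=lambda x: ((x - src) ** 2).mean(),  # proximity to source image
    sem_loss=lambda x: (x ** 2).mean(),             # placeholder semantic score
)
```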
arXiv Detail & Related papers (2023-06-07T12:56:56Z) - RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment [112.45442468794658]
We propose a two-stage coarse-to-fine semantic re-alignment method, named RealignDiff.
In the coarse semantic re-alignment phase, a novel caption reward is proposed to evaluate the semantic discrepancy between the generated image caption and the given text prompt.
The fine semantic re-alignment stage employs a local dense caption generation module and a re-weighting attention modulation module to refine the previously generated images from a local semantic view.
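A minimal sketch of a caption-reward signal in the spirit of the coarse stage: caption the generated image, embed both caption and prompt, and score their similarity. The captioner and text encoder are assumed callables, and the cosine-similarity reward is an illustrative choice rather than the paper's exact metric.

```python
# Hedged sketch of a caption reward. The captioner and text encoder are passed
# in as assumed callables; cosine similarity is an illustrative scoring choice.
import torch
import torch.nn.functional as F

def caption_reward(image, prompt, captioner, encode_text):
    """Reward = similarity between the generated image's caption and the prompt."""
    caption = captioner(image)                   # e.g., a BLIP-style captioner
    e_cap = encode_text(caption)                 # (D,) text embedding
    e_prompt = encode_text(prompt)               # (D,) text embedding
    return F.cosine_similarity(e_cap.unsqueeze(0), e_prompt.unsqueeze(0)).squeeze()

# Toy usage with stand-in models.
reward = caption_reward(
    image=torch.randn(3, 64, 64),
    prompt="a red cube on a table",
    captioner=lambda img: "a red cube on a table",   # placeholder captioner
    encode_text=lambda s: torch.ones(16) * len(s),   # placeholder text encoder
)
```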
arXiv Detail & Related papers (2023-05-31T06:59:21Z) - Parallax-Tolerant Unsupervised Deep Image Stitching [57.76737888499145]
We propose UDIS++, a parallax-tolerant unsupervised deep image stitching technique.
First, we propose a robust and flexible warp to model the image registration from global homography to local thin-plate spline motion.
To further eliminate parallax artifacts, we propose to seamlessly composite the stitched image via unsupervised learning of seam-driven composition masks.
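As background on the local motion model mentioned above, the following sketch fits and applies a thin-plate spline (TPS) warp to matched control points. It covers only the standard TPS math; UDIS++'s registration pipeline, including the global homography stage and learned components, is not reproduced.

```python
# Sketch of a thin-plate spline (TPS) warp fit to matched control points, the
# kind of local motion model referenced above.
import torch

def tps_fit(src: torch.Tensor, dst: torch.Tensor) -> torch.Tensor:
    """Fit a TPS mapping src control points (N,2) to dst (N,2); returns params."""
    N = src.shape[0]
    d2 = torch.cdist(src, src).pow(2)
    K = d2 * torch.log(d2 + 1e-9)                     # U(r) = r^2 log r^2 kernel
    P = torch.cat([torch.ones(N, 1), src], dim=1)     # affine part (N, 3)
    A = torch.zeros(N + 3, N + 3)
    A[:N, :N], A[:N, N:], A[N:, :N] = K, P, P.t()
    b = torch.cat([dst, torch.zeros(3, 2)], dim=0)
    params = torch.linalg.solve(A + 1e-6 * torch.eye(N + 3), b)
    return params  # (N+3, 2): N radial weights + 3 affine coefficients per axis

def tps_apply(pts: torch.Tensor, src: torch.Tensor, params: torch.Tensor):
    """Warp query points (M,2) with the fitted spline."""
    d2 = torch.cdist(pts, src).pow(2)
    U = d2 * torch.log(d2 + 1e-9)
    P = torch.cat([torch.ones(pts.shape[0], 1), pts], dim=1)
    return torch.cat([U, P], dim=1) @ params

src = torch.rand(8, 2)
dst = src + 0.05 * torch.randn(8, 2)                  # slightly perturbed matches
warped = tps_apply(src, src, tps_fit(src, dst))       # ~= dst at control points
```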
arXiv Detail & Related papers (2023-02-16T10:40:55Z) - Image Synthesis via Semantic Composition [74.68191130898805]
We present a novel approach to synthesize realistic images based on their semantic layouts.
It hypothesizes that objects with similar appearance share similar representations.
Our method establishes dependencies between regions according to their appearance correlation, yielding both spatially variant and associated representations.
arXiv Detail & Related papers (2021-09-15T02:26:07Z) - Learning to Caricature via Semantic Shape Transform [95.25116681761142]
We propose an algorithm based on a semantic shape transform to produce shape exaggerations.
We show that the proposed framework is able to render visually pleasing shape exaggerations while maintaining their facial structures.
arXiv Detail & Related papers (2020-08-12T03:41:49Z) - Image Morphing with Perceptual Constraints and STN Alignment [70.38273150435928]
We propose a conditional GAN morphing framework operating on a pair of input images.
A special training protocol produces sequences of frames that, combined with a perceptual similarity loss, promote smooth transformation over time.
We provide comparisons to classic as well as latent space morphing techniques, and demonstrate that, given a set of images for self-supervision, our network learns to generate visually pleasing morphing effects.
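A minimal sketch of a temporal perceptual smoothness term, the kind of loss this summary alludes to: neighbouring frames in the morphing sequence are encouraged to be perceptually close. The feature extractor is a stand-in (a pretrained perceptual network would be used in practice), and the exact loss in the paper may differ.

```python
# Sketch of a perceptual smoothness term over a morphing sequence. The feature
# extractor is an assumed stand-in for a pretrained perceptual network.
import torch

def temporal_perceptual_loss(frames: torch.Tensor, feats) -> torch.Tensor:
    """frames: (T, C, H, W) morphing sequence; feats: images -> (T, D) features."""
    f = feats(frames)                                # per-frame feature vectors
    return (f[1:] - f[:-1]).pow(2).sum(dim=-1).mean()  # penalize jumps over time

frames = torch.rand(8, 3, 32, 32, requires_grad=True)
loss = temporal_perceptual_loss(frames, feats=lambda x: x.flatten(1))  # toy features
loss.backward()
```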
arXiv Detail & Related papers (2020-04-29T10:49:10Z)