Shape2Animal: Creative Animal Generation from Natural Silhouettes
- URL: http://arxiv.org/abs/2506.20616v2
- Date: Fri, 27 Jun 2025 01:15:28 GMT
- Title: Shape2Animal: Creative Animal Generation from Natural Silhouettes
- Authors: Quoc-Duy Tran, Anh-Tuan Vo, Dinh-Khoi Vo, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le
- Abstract summary: This paper introduces the Shape2Animal framework to reinterpret natural object silhouettes, such as clouds, stones, or flames, as plausible animal forms. Our automated framework first performs open-vocabulary segmentation to extract the object silhouette and interprets semantically appropriate animal concepts. It then synthesizes an animal image that conforms to the input shape, leveraging a text-to-image diffusion model, and seamlessly blends it into the original scene.
- Score: 14.338537127280402
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Humans possess a unique ability to perceive meaningful patterns in ambiguous stimuli, a cognitive phenomenon known as pareidolia. This paper introduces the Shape2Animal framework, which mimics this imaginative capacity by reinterpreting natural object silhouettes, such as clouds, stones, or flames, as plausible animal forms. Our automated framework first performs open-vocabulary segmentation to extract the object silhouette and interprets semantically appropriate animal concepts using vision-language models. It then synthesizes an animal image that conforms to the input shape, leveraging a text-to-image diffusion model, and seamlessly blends it into the original scene to generate visually coherent and spatially consistent compositions. We evaluated Shape2Animal on a diverse set of real-world inputs, demonstrating its robustness and creative potential. Shape2Animal can offer new opportunities for visual storytelling, educational content, digital art, and interactive media design. Our project page is here: https://shape2image.github.io
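To make the described pipeline concrete, below is a minimal Python sketch of the shape-conditioned synthesis and blending stages. It is not the authors' implementation: the paper does not name its components, so the canny-edge ControlNet and Stable Diffusion checkpoints are illustrative assumptions, and segment_silhouette / suggest_animal are hypothetical stand-ins for the open-vocabulary segmentation and vision-language stages.

```python
# A minimal sketch of a Shape2Animal-style pipeline (assumptions noted above).
import torch
from PIL import Image, ImageFilter
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

def shape_to_animal(scene: Image.Image, silhouette: Image.Image, concept: str) -> Image.Image:
    """Synthesize `concept` inside `silhouette` and blend it into `scene`."""
    # Shape-conditioned synthesis: the silhouette's outline is fed to a
    # canny-style ControlNet so the generated animal follows the shape.
    edges = silhouette.convert("L").filter(ImageFilter.FIND_EDGES).convert("RGB")
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
        torch_dtype=torch.float16).to("cuda")
    animal = pipe(prompt=f"a photorealistic {concept}, detailed texture",
                  image=edges, num_inference_steps=30).images[0]
    # Blending: composite the generated animal back into the original scene,
    # keeping scene pixels everywhere outside the silhouette mask.
    scene = scene.convert("RGB")
    mask = silhouette.convert("L").resize(scene.size)
    return Image.composite(animal.resize(scene.size), scene, mask)

# The upstream stages (hypothetical helpers, not shown): extract the silhouette
# with open-vocabulary segmentation, then ask a vision-language model for a
# fitting animal, e.g.
#   silhouette = segment_silhouette(scene, query="cloud")
#   concept = suggest_animal(silhouette)   # e.g. "rabbit"
#   result = shape_to_animal(scene, silhouette, concept)
```

Compositing through the binary mask is only a rough stand-in for the paper's seamless blending stage; in practice an inpainting pass around the silhouette boundary would hide the seam better.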
Related papers
- Reconstructing Animals and the Wild [51.98009864071166]
We propose a method to reconstruct natural scenes from single images. We base our approach on advances leveraging the strong world priors in Large Language Models. We also introduce a synthetic dataset comprising one million images and thousands of assets.
arXiv Detail & Related papers (2024-11-27T23:24:27Z)
- An Individual Identity-Driven Framework for Animal Re-Identification [15.381573249551181]
IndivAID is a framework specifically designed for Animal ReID.
It generates image-specific and individual-specific textual descriptions that fully capture the diverse visual concepts of each individual across animal images.
Evaluation against state-of-the-art methods across eight benchmark datasets and a real-world Stoat dataset demonstrates IndivAID's effectiveness and applicability.
arXiv Detail & Related papers (2024-10-30T11:34:55Z)
- Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation [9.573188010530217]
ImgAny is a novel end-to-end multi-modal generative model that can mimic human reasoning and generate high-quality images.
Our method is the first attempt to efficiently and flexibly accept any combination of seven input modalities.
arXiv Detail & Related papers (2024-01-31T08:35:40Z)
- DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination [140.1641573781066]
We introduce a novel task, Virtual Creatures Generation: Given a set of unlabeled images of the target concepts, we aim to train a T2I model capable of creating new, hybrid concepts.
We propose a new method called DreamCreature, which identifies and extracts the underlying sub-concepts.
The T2I model thus adapts to generate novel concepts with faithful structures and photorealistic appearance.
arXiv Detail & Related papers (2023-11-27T01:24:31Z)
- Two-stage Synthetic Supervising and Multi-view Consistency Self-supervising based Animal 3D Reconstruction by Single Image [30.997936022365018]
We propose combining two-stage supervised and self-supervised training to address the challenge that animals rarely cooperate during 3D scanning.
Results of our study demonstrate that our approach outperforms state-of-the-art methods in both quantitative and qualitative aspects of bird 3D digitization.
arXiv Detail & Related papers (2023-11-22T07:06:38Z)
- Impressions: Understanding Visual Semiotics and Aesthetic Impact [66.40617566253404]
We present Impressions, a novel dataset through which to investigate the semiotics of images.
We show that existing multimodal image captioning and conditional generation models struggle to simulate plausible human responses to images.
This dataset significantly improves their ability to model impressions and aesthetic evaluations of images through fine-tuning and few-shot adaptation.
arXiv Detail & Related papers (2023-10-27T04:30:18Z)
- The Hidden Language of Diffusion Models [70.03691458189604]
We present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model.
We find surprising visual connections between concepts that transcend their textual semantics.
We additionally discover concepts that rely on mixtures of exemplars, biases, renowned artistic styles, or a simultaneous fusion of multiple meanings.
arXiv Detail & Related papers (2023-06-01T17:57:08Z)
- I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors [38.70166865926743]
We propose a new task of generating visual metaphors from linguistic metaphors.
This is a challenging task for diffusion-based text-to-image models, since it requires the ability to model implicit meaning and compositionality.
We create a high-quality dataset containing 6,476 visual metaphors for 1,540 linguistic metaphors and their associated visual elaborations.
arXiv Detail & Related papers (2023-05-24T05:01:10Z)
- MagicPony: Learning Articulated 3D Animals in the Wild [81.63322697335228]
We present a new method, dubbed MagicPony, that learns a predictor of articulated shape and appearance purely from in-the-wild single-view images of the object category.
At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes.
arXiv Detail & Related papers (2022-11-22T18:59:31Z)
- Pose-Guided Human Animation from a Single Image in the Wild [83.86903892201656]
We present a new pose transfer method for synthesizing a human animation from a single image of a person controlled by a sequence of body poses.
Existing pose transfer methods exhibit significant visual artifacts when applied to a novel scene.
We design a compositional neural network that predicts the silhouette, garment labels, and textures.
We are able to synthesize human animations that can preserve the identity and appearance of the person in a temporally coherent way without any fine-tuning of the network on the testing scene.
arXiv Detail & Related papers (2020-12-07T15:38:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.