Text-guided Controllable Diffusion for Realistic Camouflage Images Generation
- URL: http://arxiv.org/abs/2511.20218v1
- Date: Tue, 25 Nov 2025 11:43:58 GMT
- Title: Text-guided Controllable Diffusion for Realistic Camouflage Images Generation
- Authors: Yuhang Qian, Haiyan Chen, Wentong Li, Ningzhong Liu, Jie Qin
- Abstract summary: Camouflage Images Generation (CIG) is an emerging research area that focuses on synthesizing images in which objects are harmoniously blended and exhibit high visual consistency with their surroundings. We propose CT-CIG, a Controllable Text-guided Camouflage Images Generation method that produces realistic and logically plausible camouflage images.
- Score: 33.31050008276478
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camouflage Images Generation (CIG) is an emerging research area that focuses on synthesizing images in which objects are harmoniously blended and exhibit high visual consistency with their surroundings. Existing methods perform CIG by either fusing objects into specific backgrounds or outpainting the surroundings via foreground object-guided diffusion. However, they often fail to obtain natural results because they overlook the logical relationship between camouflaged objects and background environments. To address this issue, we propose CT-CIG, a Controllable Text-guided Camouflage Images Generation method that produces realistic and logically plausible camouflage images. Leveraging Large Visual Language Models (VLMs), we design a Camouflage-Revealing Dialogue Mechanism (CRDM) to annotate existing camouflage datasets with high-quality text prompts. Subsequently, the constructed image-prompt pairs are utilized to finetune Stable Diffusion, incorporating a lightweight controller to guide the location and shape of camouflaged objects for enhanced camouflage scene fitness. Moreover, we design a Frequency Interaction Refinement Module (FIRM) to capture high-frequency texture features, facilitating the learning of complex camouflage patterns. Extensive experiments, including CLIPScore evaluation and camouflage effectiveness assessment, demonstrate the semantic alignment of our generated text prompts and CT-CIG's ability to produce photorealistic camouflage images.
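The CLIPScore evaluation mentioned in the abstract reduces to a rescaled, non-negative cosine similarity between an image embedding and a text embedding. The sketch below uses toy vectors in place of real CLIP encoder features, and the rescaling weight `w = 2.5` follows the common CLIPScore convention; both are assumptions, not details taken from this paper.

```python
import numpy as np

def clip_score(image_emb: np.ndarray, text_emb: np.ndarray, w: float = 2.5) -> float:
    """Rescaled, clipped cosine similarity between two embeddings.

    The embeddings here are hypothetical stand-ins for real CLIP
    image/text features; w = 2.5 is the conventional CLIPScore weight.
    """
    cos = float(np.dot(image_emb, text_emb) /
                (np.linalg.norm(image_emb) * np.linalg.norm(text_emb)))
    # Negative similarities are clipped to zero before rescaling.
    return w * max(cos, 0.0)

# Toy embeddings (in practice these come from a CLIP image/text encoder).
img = np.array([1.0, 0.0, 1.0])
txt = np.array([1.0, 0.0, 1.0])
print(clip_score(img, txt))  # identical directions -> 2.5
```

In practice the embeddings would be produced by a pretrained CLIP model and the score averaged over an evaluation set.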
Related papers
- GenCAMO: Scene-Graph Contextual Decoupling for Environment-aware and Mask-free Camouflage Image-Dense Annotation Generation [32.630064141052166]
We present GenCAMO, an environment-aware and mask-free generative framework that produces high-fidelity camouflage image-dense annotations.
arXiv Detail & Related papers (2026-01-03T13:13:51Z) - RealCamo: Boosting Real Camouflage Synthesis with Layout Controls and Textual-Visual Guidance [13.352489108641938]
We propose a unified outpainting-based framework for realistic camouflaged image generation. RealCamo explicitly introduces additional layout controls to regulate global image structure. We also introduce a background-foreground distribution divergence metric that measures the effectiveness of camouflage in generated images.
arXiv Detail & Related papers (2025-12-28T15:37:56Z) - CGCOD: Class-Guided Camouflaged Object Detection [19.959268087062217]
We introduce class-guided camouflaged object detection (CGCOD), which extends the traditional COD task by incorporating object-specific class knowledge. We propose a multi-stage framework, CGNet, which incorporates a plug-and-play class prompt generator and a simple yet effective class-guided detector. This establishes a new paradigm for COD, bridging the gap between contextual understanding and class-guided detection.
arXiv Detail & Related papers (2024-12-25T19:38:32Z) - Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection.
CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
arXiv Detail & Related papers (2024-07-29T18:00:10Z) - ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent camouflaged object detection (COD) attempts to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior of zooming in and out when observing vague images and camouflaged objects.
Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z) - Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection [47.653092957888596]
We propose a framework for synthesizing camouflage data to enhance the detection of camouflaged objects in natural scenes. Our approach employs a generative model to produce realistic camouflage images, which can be used to train existing object detection models. Our framework outperforms the current state-of-the-art method on three datasets.
arXiv Detail & Related papers (2023-08-13T06:55:05Z) - CamDiff: Camouflage Image Augmentation via Diffusion Model [83.35960536063857]
CamDiff is a novel approach to synthesize salient objects in camouflaged scenes.
We leverage the latent diffusion model to synthesize salient objects in camouflaged scenes.
Our approach enables flexible editing and efficient large-scale dataset generation at a low cost.
arXiv Detail & Related papers (2023-04-11T19:37:47Z) - Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) for exploring dependencies of object-to-object, object-to-patch and patch-to-patch.
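Compressing an RGB image into patch tokens, as described above, can be sketched as flattening non-overlapping patches into vectors. The patch size and shapes below are illustrative assumptions, not the paper's actual TwFA tokenizer.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an H x W x C image into non-overlapping flattened patch tokens.

    Illustrative only; the real tokenization details are not given in
    the summary this accompanies.
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tokens = (image.reshape(h // patch, patch, w // patch, patch, c)
                   .transpose(0, 2, 1, 3, 4)   # group rows/cols of patches
                   .reshape(-1, patch * patch * c))
    return tokens  # shape: (num_patches, patch * patch * C)

img = np.zeros((32, 32, 3))
print(patchify(img, 8).shape)  # (16, 192)
```

Attention over object-to-patch and patch-to-patch dependencies would then operate on these token vectors.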
arXiv Detail & Related papers (2022-06-02T08:34:25Z) - Towards Deeper Understanding of Camouflaged Object Detection [64.81987999832032]
We argue that the binary segmentation setting fails to fully understand the concept of camouflage.
We present the first triple-task learning framework to simultaneously localize, segment and rank camouflaged objects.
arXiv Detail & Related papers (2022-05-23T14:26:18Z) - Dynamic Object Removal and Spatio-Temporal RGB-D Inpainting via Geometry-Aware Adversarial Learning [9.150245363036165]
Dynamic objects have a significant impact on the robot's perception of the environment.
In this work, we address this problem by synthesizing plausible color, texture and geometry in regions occluded by dynamic objects.
We optimize our architecture with adversarial training to synthesize fine, realistic textures, enabling it to hallucinate color and depth structure in occluded regions online.
arXiv Detail & Related papers (2020-08-12T01:23:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.