RealCamo: Boosting Real Camouflage Synthesis with Layout Controls and Textual-Visual Guidance
- URL: http://arxiv.org/abs/2512.22974v1
- Date: Sun, 28 Dec 2025 15:37:56 GMT
- Title: RealCamo: Boosting Real Camouflage Synthesis with Layout Controls and Textual-Visual Guidance
- Authors: Chunyuan Chen, Yunuo Cai, Shujuan Li, Weiyun Liang, Bin Wang, Jing Xu
- Abstract summary: We propose RealCamo, a unified out-painting based framework for realistic camouflaged image generation. RealCamo explicitly introduces additional layout controls to regulate global image structure. We also introduce a background-foreground distribution divergence metric that measures the effectiveness of camouflage in generated images.
- Score: 13.352489108641938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camouflaged image generation (CIG) has recently emerged as an efficient alternative for acquiring high-quality training data for camouflaged object detection (COD). However, existing CIG methods still suffer from a substantial gap to real camouflaged imagery: generated images either lack sufficient camouflage due to weak visual similarity, or exhibit cluttered backgrounds that are semantically inconsistent with foreground targets. To address these limitations, we propose RealCamo, a unified out-painting based framework for realistic camouflaged image generation. RealCamo explicitly introduces additional layout controls to regulate global image structure, thereby improving semantic coherence between foreground objects and generated backgrounds. Moreover, we construct a multi-modal textual-visual condition by combining a unified fine-grained textual task description with texture-oriented background retrieval, which jointly guides the generation process to enhance visual fidelity and realism. To quantitatively assess camouflage quality, we further introduce a background-foreground distribution divergence metric that measures the effectiveness of camouflage in generated images. Extensive experiments and visualizations demonstrate the effectiveness of our proposed framework.
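The abstract names a background-foreground distribution divergence metric but does not define it here. As an illustration only, one way such a metric could work is to compare per-channel color histograms of the foreground (inside the object mask) and the background via a symmetric KL divergence, with lower values indicating better blending. The function name, histogram choice, and divergence form below are assumptions, not the paper's actual definition:

```python
import numpy as np

def bf_divergence(image, mask, bins=32):
    """Hypothetical background-foreground divergence sketch: symmetric
    KL divergence between per-channel color histograms of the masked
    foreground and the background. Lower = better camouflage."""
    fg = image[mask]       # pixels inside the object mask, shape (Nf, C)
    bg = image[~mask]      # pixels outside the mask, shape (Nb, C)
    eps = 1e-8
    div = 0.0
    for c in range(image.shape[-1]):
        p, _ = np.histogram(fg[:, c], bins=bins, range=(0, 255), density=True)
        q, _ = np.histogram(bg[:, c], bins=bins, range=(0, 255), density=True)
        p = (p + eps) / (p + eps).sum()
        q = (q + eps) / (q + eps).sum()
        # symmetric KL: 0.5 * (KL(p||q) + KL(q||p))
        div += 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))
    return div / image.shape[-1]
```

A well-camouflaged object whose pixel statistics match its surroundings would score near zero under this sketch, while a salient object would score high.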
Related papers
- Text-guided Controllable Diffusion for Realistic Camouflage Images Generation [33.31050008276478]
Camouflage Images Generation (CIG) is an emerging research area that focuses on synthesizing images in which objects are harmoniously blended and exhibit high visual consistency with their surroundings. We propose a Controllable Text-guided Camouflage Images Generation method that produces realistic and logically plausible camouflage images.
arXiv Detail & Related papers (2025-11-25T11:43:58Z) - ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning [76.2503352325492]
ControlThinker is a novel framework that employs a "comprehend-then-generate" paradigm. Latent semantics from control images are mined to enrich text prompts. This enriched semantic understanding then seamlessly aids in image generation without the need for additional complex modifications.
arXiv Detail & Related papers (2025-06-04T05:56:19Z) - Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (inPainting via Latent Optimization) is an optimization approach grounded on novel semantic centralization and background preservation losses.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
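PILOT's actual objective is more involved, but the general idea described above, searching a latent space for a point that balances prompt fidelity in the inpainted region against preservation of the background, can be illustrated with a toy gradient-descent loop. Everything below (the quadratic losses, the function name, the plain gradient updates) is an assumption for illustration, not the paper's method:

```python
import numpy as np

def optimize_latent(z0, prompt_emb, bg_mask, lam=5.0, lr=0.1, steps=200):
    """Toy latent-space optimization: minimize
       L(z) = ||z - prompt_emb||^2  over the inpainted region
            + lam * ||z - z0||^2    over the background region,
    so the result matches the prompt where editing is allowed while
    staying close to the original latent z0 elsewhere."""
    z = z0.copy()
    for _ in range(steps):
        grad = np.where(bg_mask,
                        2.0 * lam * (z - z0),        # background preservation
                        2.0 * (z - prompt_emb))      # semantic (prompt) term
        z -= lr * grad
    return z
```

Under this sketch the optimum is exact: background coordinates stay at `z0` and inpainted coordinates converge to the prompt embedding; a real system would replace both quadratic terms with losses computed through a frozen diffusion decoder.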
arXiv Detail & Related papers (2024-07-10T19:58:04Z) - Enhancing Object Coherence in Layout-to-Image Synthesis [2.3805413240744304]
We propose a novel diffusion model with effective global semantic fusion (GSF) and self-similarity feature enhancement modules to guide the object coherence. For semantic coherence, we argue that the image caption contains rich information for defining the semantic relationship within the objects in the images. To improve the physical coherence, we develop a Self-similarity Coherence Attention synthesis (SCA) module to explicitly integrate local contextual physical coherence relation into each pixel's generation process.
arXiv Detail & Related papers (2023-11-17T13:43:43Z) - ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent camouflaged object detection (COD) methods attempt to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior of zooming in and out when observing vague and camouflaged images.
Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z) - Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection [47.653092957888596]
We propose a framework for synthesizing camouflage data to enhance the detection of camouflaged objects in natural scenes. Our approach employs a generative model to produce realistic camouflage images, which can be used to train existing object detection models. Our framework outperforms the current state-of-the-art method on three datasets.
arXiv Detail & Related papers (2023-08-13T06:55:05Z) - CamDiff: Camouflage Image Augmentation via Diffusion Model [83.35960536063857]
CamDiff is a novel approach to synthesize salient objects in camouflaged scenes.
We leverage the latent diffusion model to synthesize salient objects in camouflaged scenes.
Our approach enables flexible editing and efficient large-scale dataset generation at a low cost.
arXiv Detail & Related papers (2023-04-11T19:37:47Z) - Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) for exploring dependencies of object-to-object, object-to-patch and patch-to-patch.
arXiv Detail & Related papers (2022-06-02T08:34:25Z) - Enhanced Residual Networks for Context-based Image Outpainting [0.0]
Deep models struggle to understand context and extrapolation through retained information.
Current models use generative adversarial networks to generate results which lack localized image feature consistency and appear fake.
We propose two methods to improve this issue: the use of a local and global discriminator, and the addition of residual blocks within the encoding section of the network.
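The second proposed improvement, residual blocks in the encoder, lets the network carry retained low-level context forward via skip connections instead of forcing every layer to re-learn it. A minimal sketch of such a block is below; the function signature, activation choice, and weight shapes are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Minimal residual block: output = activation(x + F(x)), where
    F is a small two-layer transform. The additive skip connection
    preserves the input features that plain stacked layers tend to
    lose, which is the motivation given for using them in outpainting
    encoders."""
    h = relu(x @ w1)          # inner transform F(x), first layer
    return relu(x + h @ w2)   # skip connection adds x back in
```

With the inner weights at zero the block reduces to the identity (up to the activation), which is exactly the property that makes residual blocks easy to train: the network only has to learn a correction on top of the passed-through features.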
arXiv Detail & Related papers (2020-05-14T05:14:26Z) - Deep CG2Real: Synthetic-to-Real Translation via Image Disentanglement [78.58603635621591]
Training an unpaired synthetic-to-real translation network in image space is severely under-constrained.
We propose a semi-supervised approach that operates on the disentangled shading and albedo layers of the image.
Our two-stage pipeline first learns to predict accurate shading in a supervised fashion using physically-based renderings as targets.
arXiv Detail & Related papers (2020-03-27T21:45:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.