GenCAMO: Scene-Graph Contextual Decoupling for Environment-aware and Mask-free Camouflage Image-Dense Annotation Generation
- URL: http://arxiv.org/abs/2601.01181v1
- Date: Sat, 03 Jan 2026 13:13:51 GMT
- Title: GenCAMO: Scene-Graph Contextual Decoupling for Environment-aware and Mask-free Camouflage Image-Dense Annotation Generation
- Authors: Chenglizhao Chen, Shaojiang Yuan, Xiaoxue Lu, Mengke Song, Jia Song, Zhenyu Wu, Wenfeng Song, Shuai Li,
- Abstract summary: GenCAMO is an environment-aware and mask-free generative framework that produces high-fidelity camouflage image-dense annotations.
- Score: 32.630064141052166
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Concealed dense prediction (CDP), especially RGB-D camouflaged object detection and open-vocabulary camouflaged object segmentation, plays a crucial role in advancing the understanding and reasoning of complex camouflage scenes. However, high-quality, large-scale camouflage datasets with dense annotations remain scarce due to the expense of data collection and labeling. To address this challenge, we explore leveraging generative models to synthesize realistic camouflage image-dense data for training CDP models with fine-grained representations, prior knowledge, and auxiliary reasoning. Concretely, our contributions are threefold: (i) we introduce GenCAMO-DB, a large-scale camouflage dataset with multi-modal annotations, including depth maps, scene graphs, attribute descriptions, and text prompts; (ii) we present GenCAMO, an environment-aware and mask-free generative framework that produces high-fidelity camouflage image-dense annotations; (iii) extensive experiments across multiple modalities demonstrate that GenCAMO significantly improves dense prediction performance on complex camouflage scenes by providing high-quality synthetic data. The code and datasets will be released after paper acceptance.
Related papers
- RealCamo: Boosting Real Camouflage Synthesis with Layout Controls and Textual-Visual Guidance [13.352489108641938]
We propose a unified out-painting based framework for realistic camouflaged image generation. RealCamo explicitly introduces additional layout controls to regulate global image structure. We also introduce a background-foreground distribution divergence metric that measures the effectiveness of camouflage in generated images.
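The abstract does not define its background-foreground distribution divergence metric, but the general idea can be sketched: compare the color distribution of the object region against that of its surroundings, where a lower divergence suggests better blending. The symmetric-KL histogram formulation below is an illustrative assumption, not the paper's actual metric.

```python
import numpy as np

def camouflage_divergence(image, mask, bins=32, eps=1e-8):
    """Toy background-foreground divergence: symmetric KL between
    per-channel color histograms of the masked object and its background.
    Lower values suggest the object blends in better with its surroundings."""
    fg = image[mask]   # (N_fg, 3) pixels inside the object mask
    bg = image[~mask]  # (N_bg, 3) pixels outside the mask
    div = 0.0
    for c in range(3):  # accumulate over the three color channels
        p, _ = np.histogram(fg[:, c], bins=bins, range=(0, 256), density=True)
        q, _ = np.histogram(bg[:, c], bins=bins, range=(0, 256), density=True)
        p, q = p + eps, q + eps  # avoid log(0)
        div += 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))
    return div / 3.0

# Sanity check: a color-matched object should score lower than a salient one.
rng = np.random.default_rng(0)
img = rng.integers(100, 140, size=(64, 64, 3)).astype(float)  # uniform scene
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True
blended = camouflage_divergence(img, mask)        # object matches background
img_salient = img.copy()
img_salient[mask] = rng.integers(200, 255, size=(int(mask.sum()), 3))
salient = camouflage_divergence(img_salient, mask)
print(blended < salient)  # True
```

Any such metric only ranks images relative to each other; the absolute value depends on the bin count and the eps smoothing.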
arXiv Detail & Related papers (2025-12-28T15:37:56Z) - Text-guided Controllable Diffusion for Realistic Camouflage Images Generation [33.31050008276478]
Camouflage Images Generation (CIG) is an emerging research area that focuses on synthesizing images in which objects are harmoniously blended and exhibit high visual consistency with their surroundings. We propose a controllable, text-guided camouflage image generation method that produces realistic and logically plausible camouflage images.
arXiv Detail & Related papers (2025-11-25T11:43:58Z) - Unified Dense Prediction of Video Diffusion [91.16237431830417]
We present a unified network for simultaneously generating videos and their corresponding entity segmentation and depth maps from text prompts. We utilize colormaps to represent entity masks and depth maps, tightly integrating dense prediction with RGB video generation.
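Representing dense outputs as colormaps means a depth map is encoded as an ordinary RGB image, so one generative model can emit frames and annotations in the same space, and the annotation is recovered by inverting the colormap. The linear jet-style ramp below is an illustrative stand-in; the paper's actual colormap may differ.

```python
import numpy as np

def depth_to_color(depth):
    """Encode a normalized depth map in [0, 1] as an RGB image via a simple
    jet-like linear ramp, so dense outputs share the RGB image space."""
    d = np.clip(depth, 0.0, 1.0)
    r = np.clip(1.5 - np.abs(4 * d - 3), 0, 1)
    g = np.clip(1.5 - np.abs(4 * d - 2), 0, 1)
    b = np.clip(1.5 - np.abs(4 * d - 1), 0, 1)
    return np.stack([r, g, b], axis=-1)

def color_to_depth(rgb, resolution=256):
    """Invert the encoding by nearest-neighbor lookup against a sampled ramp."""
    ds = np.linspace(0.0, 1.0, resolution)
    table = depth_to_color(ds)  # (resolution, 3) reference colors
    dist = np.linalg.norm(rgb[..., None, :] - table, axis=-1)
    return ds[np.argmin(dist, axis=-1)]

# Round-trip: encode a synthetic depth map, then decode it back.
depth = np.linspace(0.0, 1.0, 100).reshape(10, 10)
recovered = color_to_depth(depth_to_color(depth))
print(np.abs(recovered - depth).max() < 0.01)  # True: small round-trip error
```

The key design requirement is that the colormap be injective on [0, 1]; otherwise two depths would map to the same color and the inversion would be ambiguous.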
arXiv Detail & Related papers (2025-03-12T12:41:02Z) - Towards Natural Image Matting in the Wild via Real-Scenario Prior [69.96414467916863]
We propose a new matting dataset based on the COCO dataset, namely COCO-Matting.
The built COCO-Matting comprises an extensive collection of 38,251 human instance-level alpha mattes in complex natural scenarios.
For network architecture, the proposed feature-aligned transformer learns to extract fine-grained edge and transparency features.
The proposed matte-aligned decoder aims to segment matting-specific objects and convert coarse masks into high-precision mattes.
arXiv Detail & Related papers (2024-10-09T06:43:19Z) - Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy [27.251750465641305]
We present the first comprehensive study to examine the impact of camouflage attributes on the effectiveness of camouflage patterns.
We have compiled the first dataset comprising descriptions of camouflaged objects and their attribute contributions.
We have developed a robust framework that combines textual and visual information for the task of Camouflaged Object Segmentation (COS).
ACUMEN demonstrates superior performance, outperforming nine leading methods across three widely-used datasets.
arXiv Detail & Related papers (2024-08-22T02:51:21Z) - Diffusion Models are Efficient Data Generators for Human Mesh Recovery [55.37787289869703]
We show that synthetic data created by generative models is complementary to CG-rendered data. We propose an effective data generation pipeline based on recent diffusion models, termed HumanWild. Our work could pave the way for scaling up 3D human recovery to in-the-wild scenes.
arXiv Detail & Related papers (2024-03-17T06:31:16Z) - Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection [47.653092957888596]
We propose a framework for synthesizing camouflage data to enhance the detection of camouflaged objects in natural scenes. Our approach employs a generative model to produce realistic camouflage images, which can be used to train existing object detection models. Our framework outperforms the current state-of-the-art method on three datasets.
arXiv Detail & Related papers (2023-08-13T06:55:05Z) - CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z) - CamDiff: Camouflage Image Augmentation via Diffusion Model [83.35960536063857]
CamDiff is a novel approach to synthesize salient objects in camouflaged scenes.
We leverage the latent diffusion model to synthesize salient objects in camouflaged scenes.
Our approach enables flexible editing and efficient large-scale dataset generation at a low cost.
arXiv Detail & Related papers (2023-04-11T19:37:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.