Cycle-Consistent Tuning for Layered Image Decomposition
- URL: http://arxiv.org/abs/2602.20989v2
- Date: Sat, 28 Feb 2026 06:10:31 GMT
- Title: Cycle-Consistent Tuning for Layered Image Decomposition
- Authors: Zheng Gu, Min Lu, Zhida Sun, Dani Lischinski, Daniel Cohen-Or, Hui Huang
- Abstract summary: Disentangling visual layers in real-world images is a persistent challenge in vision and graphics. We present an in-context image decomposition framework that leverages large diffusion foundation models for layered separation. Our approach achieves accurate and coherent decompositions and also generalizes effectively across other decomposition types.
- Score: 26.331480224165364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Disentangling visual layers in real-world images is a persistent challenge in vision and graphics, as such layers often involve non-linear and globally coupled interactions, including shading, reflection, and perspective distortion. In this work, we present an in-context image decomposition framework that leverages large diffusion foundation models for layered separation. We focus on the challenging case of logo-object decomposition, where the goal is to disentangle a logo from the surface on which it appears while faithfully preserving both layers. Our method fine-tunes a pretrained diffusion model via lightweight LoRA adaptation and introduces a cycle-consistent tuning strategy that jointly trains decomposition and composition models, enforcing reconstruction consistency between decomposed and recomposed images. This bidirectional supervision substantially enhances robustness in cases where the layers exhibit complex interactions. Furthermore, we introduce a progressive self-improving process, which iteratively augments the training set with high-quality model-generated examples to refine performance. Extensive experiments demonstrate that our approach achieves accurate and coherent decompositions and also generalizes effectively across other decomposition types, suggesting its potential as a unified framework for layered image decomposition.
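The cycle-consistent tuning described above can be pictured with a short training-loop sketch. The snippet below is only an illustrative assumption of how the joint decomposition/composition losses and the reconstruction (cycle) term might be combined; the tiny stand-in networks, channel layout, function names, and loss weights are placeholders, not the paper's LoRA-adapted diffusion models.

```python
# Minimal sketch of a cycle-consistent tuning objective (assumptions, not the
# paper's code): a decomposition network predicts two layers from a composite
# image, a composition network maps layers back to a composite, and a cycle
# loss ties the recomposed image to the original input.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyUNetStandIn(nn.Module):
    """Lightweight stand-in for a LoRA-adapted diffusion backbone."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)


# Decomposition maps a composite image to two layers (logo, surface);
# composition maps the two layers back to a composite image.
decompose = TinyUNetStandIn(in_ch=3, out_ch=6)   # 3 channels per predicted layer
compose = TinyUNetStandIn(in_ch=6, out_ch=3)
opt = torch.optim.AdamW(
    list(decompose.parameters()) + list(compose.parameters()), lr=1e-4
)


def cycle_consistent_step(composite, logo_gt, surface_gt, lambda_cycle=1.0):
    """One joint step: supervised decomposition/composition losses plus a
    reconstruction (cycle) loss between the input composite and the
    recomposition of its predicted layers."""
    layers = decompose(composite)
    logo_pred, surface_pred = layers[:, :3], layers[:, 3:]

    # Supervised terms on paired training data.
    loss_dec = F.l1_loss(logo_pred, logo_gt) + F.l1_loss(surface_pred, surface_gt)
    loss_comp = F.l1_loss(compose(torch.cat([logo_gt, surface_gt], dim=1)), composite)

    # Cycle term: decompose, recompose, then compare with the input.
    recomposed = compose(torch.cat([logo_pred, surface_pred], dim=1))
    loss_cycle = F.l1_loss(recomposed, composite)

    loss = loss_dec + loss_comp + lambda_cycle * loss_cycle
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


if __name__ == "__main__":
    x = torch.rand(2, 3, 64, 64)        # composite images
    logo = torch.rand(2, 3, 64, 64)     # ground-truth logo layer
    surface = torch.rand(2, 3, 64, 64)  # ground-truth surface layer
    print(cycle_consistent_step(x, logo, surface))
```

The design choice mirrored here is that the composition model supervises the decomposition model (and vice versa) through the recomposed image, which is what the abstract calls bidirectional supervision.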
Related papers
- Combined Flicker-banding and Moire Removal for Screen-Captured Images [24.036188551666573]
We present the first systematic study on joint removal of moiré patterns and flicker-banding in screen-captured images. To support this task, we construct a large-scale dataset containing both moiré patterns and flicker-banding. We also introduce an ISP-based flicker simulation pipeline to stabilize model training and expand the degradation distribution.
arXiv Detail & Related papers (2026-02-02T02:53:41Z) - From Inpainting to Layer Decomposition: Repurposing Generative Inpainting Models for Image Layer Decomposition [16.7393689710179]
A layered representation enables independent editing of elements, offering greater flexibility for content creation. We observe a strong connection between layer decomposition and in/outpainting tasks, and propose adapting a diffusion-based inpainting model for layer decomposition using lightweight finetuning. To further preserve detail in the latent space, we introduce a novel multi-modal context fusion module with linear attention complexity.
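The fusion module above is characterized only by its "linear attention complexity". As a point of reference, here is a generic kernelized linear-attention sketch (feature map φ(x) = elu(x) + 1, following Katharopoulos et al., 2020); it illustrates why the cost grows linearly in the number of tokens, but it is not that paper's module.

```python
# Generic linear attention: aggregate keys/values first, so cost is
# O(tokens * dim^2) instead of O(tokens^2 * dim). Shapes below are assumptions.
import torch
import torch.nn.functional as F


def linear_attention(q, k, v, eps=1e-6):
    """q, k, v: (batch, tokens, dim)."""
    q = F.elu(q) + 1.0
    k = F.elu(k) + 1.0
    kv = torch.einsum("bnd,bne->bde", k, v)                      # sum_n phi(k_n) v_n^T
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)  # normalization
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)


if __name__ == "__main__":
    q = torch.randn(1, 4096, 64)  # e.g. latent tokens from one branch
    k = torch.randn(1, 4096, 64)  # e.g. tokens from a context branch
    v = torch.randn(1, 4096, 64)
    print(linear_attention(q, k, v).shape)  # torch.Size([1, 4096, 64])
```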
arXiv Detail & Related papers (2025-11-26T02:50:07Z) - Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers [55.15722080205737]
Edit2Perceive is a unified diffusion framework that adapts editing models for depth, normal, and matting. Our single-step deterministic inference yields faster runtime while training on relatively small datasets.
arXiv Detail & Related papers (2025-11-24T01:13:51Z) - Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters.
We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z) - OneRestore: A Universal Restoration Framework for Composite Degradation [33.556183375565034]
In real-world scenarios, image impairments often manifest as composite degradations, presenting a complex interplay of elements such as low light, haze, rain, and snow.
Our study proposes a versatile imaging model that consolidates four physical corruption paradigms to accurately represent complex, composite degradation scenarios.
OneRestore is a novel transformer-based framework designed for adaptive, controllable scene restoration.
arXiv Detail & Related papers (2024-07-05T16:27:00Z) - FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis [48.9652334528436]
We introduce an innovative, training-free approach FouriScale from the perspective of frequency domain analysis.
We replace the original convolutional layers in pre-trained diffusion models by incorporating a dilation technique along with a low-pass operation.
Our method successfully balances the structural integrity and fidelity of generated images, achieving an astonishing capacity of arbitrary-size, high-resolution, and high-quality generation.
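FouriScale is summarized above as replacing convolutional layers with a dilation technique plus a low-pass operation. The snippet below is only a generic frequency-domain low-pass on a feature map, included to make the "low-pass operation" concrete; the cutoff and tensor shapes are assumptions, not that paper's implementation.

```python
# Illustrative frequency-domain low-pass on a feature map (assumed shapes and
# cutoff): zero out high spatial frequencies and transform back.
import torch


def fft_low_pass(x, keep_fraction=0.5):
    """x: (batch, channels, H, W). Keep a centered `keep_fraction` of each
    frequency axis and discard the rest."""
    freq = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    _, _, h, w = x.shape
    kh, kw = int(h * keep_fraction / 2), int(w * keep_fraction / 2)
    mask = torch.zeros_like(freq)
    mask[..., h // 2 - kh:h // 2 + kh, w // 2 - kw:w // 2 + kw] = 1.0
    filtered = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1)))
    return filtered.real


if __name__ == "__main__":
    feat = torch.randn(1, 4, 64, 64)
    print(fft_low_pass(feat).shape)  # torch.Size([1, 4, 64, 64])
```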
arXiv Detail & Related papers (2024-03-19T17:59:33Z) - Neural Spline Fields for Burst Image Fusion and Layer Separation [40.9442467471977]
We propose a versatile intermediate representation: a two-layer alpha-composited image plus flow model constructed with neural spline fields.
Our method is able to jointly fuse a burst image capture into one high-resolution reconstruction and decompose it into transmission and obstruction layers.
We find that, with no post-processing steps or learned priors, our generalizable model is able to outperform existing dedicated single-image and multi-view obstruction removal approaches.
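The "two-layer alpha-composited image" model referenced above reduces to a simple blend. The sketch below shows only that forward model (layer names and ordering are an assumption); the paper's contribution, fitting the layers, alpha, and flow with neural spline fields, is not reproduced here.

```python
# Two-layer alpha compositing: the observed image is an alpha blend of a
# transmission layer and an obstruction layer (names/order are illustrative).
import torch


def composite(transmission, obstruction, alpha):
    """All tensors (batch, C, H, W); alpha in [0, 1] selects the obstruction
    layer where it is 1 and the transmission layer where it is 0."""
    return alpha * obstruction + (1.0 - alpha) * transmission


if __name__ == "__main__":
    t = torch.rand(1, 3, 32, 32)
    o = torch.rand(1, 3, 32, 32)
    a = torch.rand(1, 1, 32, 32)
    print(composite(t, o, a).shape)  # torch.Size([1, 3, 32, 32])
```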
arXiv Detail & Related papers (2023-12-21T18:54:19Z) - Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis [15.76266032768078]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries. We first introduce vision guidance as a foundational spatial cue within the perturbed distribution. We propose a universal framework, Layered Rendering Diffusion (LRDiff), which constructs an image-rendering process with multiple layers.
arXiv Detail & Related papers (2023-11-30T10:36:19Z) - Bridging Component Learning with Degradation Modelling for Blind Image Super-Resolution [69.11604249813304]
We propose a components decomposition and co-optimization network (CDCN) for blind SR.
CDCN decomposes the input LR image into structure and detail components in feature space.
We present a degradation-driven learning strategy to jointly supervise the HR image detail and structure restoration process.
arXiv Detail & Related papers (2022-12-03T14:53:56Z) - Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z) - Learning to See Through Obstructions with Layered Decomposition [117.77024641706451]
We present a learning-based approach for removing unwanted obstructions from moving images.
Our method leverages motion differences between the background and obstructing elements to recover both layers.
We show that the proposed approach, learned from synthetically generated data, transfers well to real images.
arXiv Detail & Related papers (2020-08-11T17:59:31Z)