CoCoDiff: Correspondence-Consistent Diffusion Model for Fine-grained Style Transfer
- URL: http://arxiv.org/abs/2602.14464v1
- Date: Mon, 16 Feb 2026 04:52:29 GMT
- Title: CoCoDiff: Correspondence-Consistent Diffusion Model for Fine-grained Style Transfer
- Authors: Wenbo Nie, Zixiang Li, Renshuai Tao, Bin Wu, Yunchao Wei, Yao Zhao
- Abstract summary: CoCoDiff is a training-free and low-cost style transfer framework for computer vision. It exploits pretrained latent diffusion models to achieve fine-grained, semantically consistent stylization. CoCoDiff delivers state-of-the-art visual quality and strong quantitative results, outperforming methods that rely on extra training or annotations.
- Score: 85.217605146499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transferring visual style between images while preserving semantic correspondence between similar objects remains a central challenge in computer vision. While existing methods have made great strides, most operate at the global level and overlook region-wise and even pixel-wise semantic correspondence. To address this, we propose CoCoDiff, a novel training-free and low-cost style transfer framework that leverages pretrained latent diffusion models to achieve fine-grained, semantically consistent stylization. We identify that correspondence cues within generative diffusion models are under-explored and that content consistency across semantically matched regions is often neglected. CoCoDiff introduces a pixel-wise semantic correspondence module that mines intermediate diffusion features to construct a dense alignment map between content and style images. A cycle-consistency module then enforces structural and perceptual alignment across iterations, yielding object- and region-level stylization that preserves geometry and detail. Despite requiring no additional training or supervision, CoCoDiff delivers state-of-the-art visual quality and strong quantitative results, outperforming methods that rely on extra training or annotations.
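To make the abstract's core mechanism concrete, below is a minimal, hedged sketch of building a dense content-to-style alignment map from diffusion features and warping style features through it. This is an illustration, not the paper's code: the features here are random stand-ins for intermediate U-Net activations of a pretrained latent diffusion model, and all function names are hypothetical.

```python
# Minimal sketch of pixel-wise correspondence from diffusion features.
# The feature extractor is replaced by random stand-in tensors; in
# practice these would come from intermediate U-Net activations.
import torch
import torch.nn.functional as F

def dense_alignment_map(feat_content: torch.Tensor,
                        feat_style: torch.Tensor) -> torch.Tensor:
    """For each content pixel, find the best-matching style pixel.

    feat_*: (C, H, W) feature maps from the same timestep/layer.
    Returns: (H*W,) long tensor of flat style-pixel indices.
    """
    C, H, W = feat_content.shape
    fc = F.normalize(feat_content.reshape(C, H * W), dim=0)  # unit-norm per pixel
    fs = F.normalize(feat_style.reshape(C, H * W), dim=0)
    sim = fc.t() @ fs                 # (H*W, H*W) cosine similarities
    return sim.argmax(dim=1)          # nearest style pixel per content pixel

def warp_style(style_values: torch.Tensor, align: torch.Tensor,
               hw: tuple[int, int]) -> torch.Tensor:
    """Pull style values onto the content grid via the alignment map."""
    C = style_values.shape[0]
    flat = style_values.reshape(C, -1)[:, align]   # gather matched columns
    return flat.reshape(C, *hw)

# Toy usage with random stand-in features.
H = W = 32
feat_c, feat_s = torch.randn(320, H, W), torch.randn(320, H, W)
align = dense_alignment_map(feat_c, feat_s)
stylized = warp_style(feat_s, align, (H, W))
print(stylized.shape)  # torch.Size([320, 32, 32])
```

The paper's cycle-consistency module would additionally check that mapping content to style and back lands near the starting pixel; that step is omitted here.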
Related papers
- Neural Scene Designer: Self-Styled Semantic Image Manipulation [67.43125248646653]
We introduce the Neural Scene Designer (NSD), a novel framework that enables photo-realistic manipulation of user-specified scene regions. NSD ensures both semantic alignment with user intent and stylistic consistency with the surrounding environment. To capture fine-grained style representations, we propose the Progressive Self-style Representational Learning (PSRL) module.
arXiv Detail & Related papers (2025-09-01T11:59:03Z)
- CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning [30.111296778234124]
CorrMoE is a correspondence pruning framework that enhances robustness under cross-domain and cross-scene variations. For scene diversity, we design a Bi-Fusion Mixture of Experts module that adaptively integrates multi-perspective features. Experiments on benchmark datasets demonstrate that CorrMoE achieves superior accuracy and generalization compared to state-of-the-art methods.
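As a rough illustration of the expert-fusion idea, here is a generic soft-gated mixture-of-experts layer in PyTorch. It is a sketch of the general technique only; CorrMoE's Bi-Fusion MoE and its de-stylization learning are not reproduced, and all names are illustrative.

```python
# Generic soft-gated mixture of experts: a learned gate produces
# per-sample weights that fuse the outputs of several expert networks.
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)  # per-sample routing weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) correspondence features; weights sum to 1 per sample.
        w = self.gate(x).softmax(dim=-1)                      # (N, E)
        out = torch.stack([e(x) for e in self.experts], -1)   # (N, dim, E)
        return (out * w.unsqueeze(1)).sum(-1)                 # weighted fusion

moe = SoftMoE(dim=128)
print(moe(torch.randn(10, 128)).shape)  # torch.Size([10, 128])
```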
arXiv Detail & Related papers (2025-07-16T01:44:01Z)
- ShapeShift: Towards Text-to-Shape Arrangement Synthesis with Content-Aware Geometric Constraints [13.2441524021269]
ShapeShift is a text-guided image-to-image translation task that requires rearranging the input set of rigid shapes into non-overlapping configurations. We introduce a content-aware collision resolution mechanism that applies minimal, semantically coherent adjustments when overlaps occur. Our approach yields interpretable compositions where spatial relationships clearly embody the textual prompt.
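The collision-resolution idea can be illustrated with a toy solver that separates overlapping shapes (approximated as circles) by the minimal shift along their line of centers. This is a hedged sketch of generic overlap resolution; ShapeShift's content-aware, semantically coherent adjustments are not modeled here.

```python
# Toy minimal-adjustment collision resolution for circular shapes.
import math

def resolve_overlaps(shapes, iters=100):
    """shapes: list of dicts {x, y, r}; mutates positions in place."""
    for _ in range(iters):
        moved = False
        for i in range(len(shapes)):
            for j in range(i + 1, len(shapes)):
                a, b = shapes[i], shapes[j]
                dx, dy = b["x"] - a["x"], b["y"] - a["y"]
                dist = math.hypot(dx, dy) or 1e-6  # avoid zero division
                overlap = a["r"] + b["r"] - dist
                if overlap > 0:                    # push apart minimally,
                    ux, uy = dx / dist, dy / dist  # half each, along the
                    shift = overlap / 2            # line of centers
                    a["x"] -= ux * shift; a["y"] -= uy * shift
                    b["x"] += ux * shift; b["y"] += uy * shift
                    moved = True
        if not moved:
            break

shapes = [{"x": 0.0, "y": 0.0, "r": 1.0}, {"x": 0.5, "y": 0.0, "r": 1.0}]
resolve_overlaps(shapes)
print(shapes)  # centers pushed exactly 2.0 apart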
arXiv Detail & Related papers (2025-03-18T20:48:58Z)
- Marginal Contrastive Correspondence for Guided Image Generation [58.0605433671196]
Exemplar-based image translation establishes dense correspondences between a conditional input and an exemplar from two different domains.
Existing work builds cross-domain correspondences implicitly by minimizing feature-wise distances across the two domains.
We design a Marginal Contrastive Learning Network (MCL-Net) that uses contrastive learning to learn domain-invariant features for realistic exemplar-based image translation.
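For intuition, the following is a standard InfoNCE contrastive loss over matched cross-domain feature pairs, the generic objective family that contrastive correspondence methods like MCL-Net build on; the paper's marginal formulation and pairing scheme are not reproduced.

```python
# Standard InfoNCE loss: matched cross-domain pairs are pulled together,
# all other pairs in the batch act as negatives.
import torch
import torch.nn.functional as F

def info_nce(anchors: torch.Tensor, positives: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """anchors[i] and positives[i] are matched cross-domain features."""
    a = F.normalize(anchors, dim=1)
    p = F.normalize(positives, dim=1)
    logits = a @ p.t() / temperature     # (N, N) similarity matrix
    targets = torch.arange(a.size(0))    # diagonal entries are true matches
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(16, 256), torch.randn(16, 256))
print(float(loss))
```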
arXiv Detail & Related papers (2022-04-01T13:55:44Z)
- Bending Graphs: Hierarchical Shape Matching using Gated Optimal Transport [80.64516377977183]
Shape matching has been a long-studied problem for the computer graphics and vision community.
We investigate a hierarchical learning design that incorporates local patch-level information and global shape-level structures.
We propose a novel optimal transport solver by recurrently updating features on non-confident nodes to learn globally consistent correspondences between the shapes.
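A minimal example of the underlying tool is entropic optimal transport solved with Sinkhorn iterations, shown below; the paper's gated, recurrently updated solver is considerably more elaborate, and this sketch assumes uniform mass on both shapes.

```python
# Entropic optimal transport via Sinkhorn iterations: alternate row and
# column scalings of a Gibbs kernel until the coupling's marginals match.
import torch

def sinkhorn(cost: torch.Tensor, eps: float = 0.1,
             iters: int = 50) -> torch.Tensor:
    """cost: (N, M) pairwise matching cost. Returns a soft coupling."""
    K = torch.exp(-cost / eps)                            # Gibbs kernel
    r = torch.full((cost.size(0),), 1.0 / cost.size(0))   # row marginals
    c = torch.full((cost.size(1),), 1.0 / cost.size(1))   # column marginals
    v = torch.ones(cost.size(1))
    for _ in range(iters):
        u = r / (K @ v)           # alternate scaling updates
        v = c / (K.t() @ u)
    return u.unsqueeze(1) * K * v.unsqueeze(0)  # diag(u) K diag(v)

P = sinkhorn(torch.rand(5, 7))
print(P.sum(dim=0))  # ≈ 1/7 each: columns carry uniform mass
```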
arXiv Detail & Related papers (2022-02-03T11:41:46Z)
- Consistent Style Transfer [23.193302706359464]
Recently, attentional arbitrary style transfer methods have been proposed to achieve fine-grained results.
We propose progressive attentional manifold alignment (PAMA) to alleviate the semantic-region inconsistency of these methods.
We show that PAMA achieves state-of-the-art performance while avoiding the inconsistency of semantic regions.
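The attentional style-transfer idea can be sketched as a single cross-attention step in which each content feature aggregates style features from semantically similar regions; PAMA's progressive, multi-stage manifold alignment is not reproduced here.

```python
# One cross-attention step between content and style features: each
# content vector becomes a softmax-weighted blend of style vectors.
import torch
import torch.nn.functional as F

def style_attention(content: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
    """content: (Nc, C), style: (Ns, C) flattened feature maps."""
    scale = content.size(1) ** -0.5
    attn = F.softmax(content @ style.t() * scale, dim=-1)  # (Nc, Ns)
    return attn @ style   # blend of each position's best-matching style

out = style_attention(torch.randn(1024, 64), torch.randn(1024, 64))
print(out.shape)  # torch.Size([1024, 64])
```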
arXiv Detail & Related papers (2022-01-06T20:19:35Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Bi-level Feature Alignment for Versatile Image Translation and Manipulation [88.5915443957795]
Generative adversarial networks (GANs) have achieved great success in image translation and manipulation.
High-fidelity image generation with faithful style control remains a grand challenge in computer vision.
This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance.
arXiv Detail & Related papers (2021-07-07T05:26:29Z)