Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
- URL: http://arxiv.org/abs/2601.01224v1
- Date: Sat, 03 Jan 2026 16:10:18 GMT
- Title: Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
- Authors: Bac Nguyen, Yuhta Takida, Naoki Murata, Chieh-Hsin Lai, Toshimitsu Uesaka, Stefano Ermon, Yuki Mitsufuji
- Score: 83.56510119503265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Slot Attention (SA) with pretrained diffusion models has recently shown promise for object-centric learning (OCL), but suffers from slot entanglement and weak alignment between object slots and image content. We propose Contrastive Object-centric Diffusion Alignment (CODA), a simple extension that (i) employs register slots to absorb residual attention and reduce interference between object slots, and (ii) applies a contrastive alignment loss to explicitly encourage slot-image correspondence. The resulting training objective serves as a tractable surrogate for maximizing mutual information (MI) between slots and inputs, strengthening slot representation quality. On both synthetic (MOVi-C/E) and real-world datasets (VOC, COCO), CODA improves object discovery (e.g., +6.1% FG-ARI on COCO), property prediction, and compositional image generation over strong baselines. Register slots add negligible overhead, keeping CODA efficient and scalable. These results indicate potential applications of CODA as an effective framework for robust OCL in complex, real-world scenes.
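The abstract specifies CODA's two ingredients but not their implementation, so the following is a minimal PyTorch sketch of how register slots and a contrastive alignment loss could fit together. The class name, the mean-pooling of slots and features into single summary vectors, and all hyperparameters are our assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlotAttentionWithRegisters(nn.Module):
    """Standard Slot Attention with extra register slots appended, so residual
    attention has somewhere to go besides the object slots (ingredient (i))."""
    def __init__(self, dim, num_slots=7, num_registers=2, iters=3):
        super().__init__()
        self.num_slots, self.iters = num_slots, iters
        self.init_slots = nn.Parameter(torch.randn(1, num_slots + num_registers, dim) * 0.02)
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_in, self.norm_slots = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.scale = dim ** -0.5

    def forward(self, feats):                            # feats: (B, N, D) image tokens
        B, _, D = feats.shape
        feats = self.norm_in(feats)
        k, v = self.to_k(feats), self.to_v(feats)
        slots = self.init_slots.expand(B, -1, -1)
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            logits = q @ k.transpose(1, 2) * self.scale  # (B, S+R, N)
            attn = logits.softmax(dim=1)                 # slots compete for each token
            attn = attn / attn.sum(dim=-1, keepdim=True) # normalize weights per slot
            updates = attn @ v                           # (B, S+R, D)
            slots = self.gru(updates.reshape(-1, D), slots.reshape(-1, D)).reshape(B, -1, D)
        # Registers soak up residual/background attention; callers drop them downstream.
        return slots[:, :self.num_slots], slots[:, self.num_slots:]

def contrastive_alignment_loss(obj_slots, feats, temperature=0.1):
    """InfoNCE between a pooled slot summary and a pooled image summary;
    other images in the batch act as negatives (one reading of ingredient (ii))."""
    z_s = F.normalize(obj_slots.mean(dim=1), dim=-1)     # (B, D) slot summary
    z_x = F.normalize(feats.mean(dim=1), dim=-1)         # (B, D) image summary
    logits = z_s @ z_x.t() / temperature                 # (B, B); diagonal = positives
    targets = torch.arange(obj_slots.size(0), device=obj_slots.device)
    return F.cross_entropy(logits, targets)
```

In training, this alignment term would presumably be added, with some weight, to the diffusion reconstruction objective, and the register slots would simply be discarded when decoding or extracting object masks, consistent with the abstract's claim that they add negligible overhead.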
Related papers
- QASA: Quality-Guided K-Adaptive Slot Attention for Unsupervised Object-Centric Learning [80.82392186401354]
Slot Attention is an approach that binds different objects in a scene to a set of "slots". Previous K-adaptive methods do not explicitly constrain slot-binding quality. We propose Quality-Guided K-Adaptive Slot Attention (QASA).
arXiv Detail & Related papers (2026-01-19T10:42:07Z)
- CORE-ReID V2: Advancing the Domain Adaptation for Object Re-Identification with Optimized Training and Ensemble Fusion [0.0]
This study presents CORE-ReID V2, an enhanced framework building upon CORE-ReID. The new framework addresses Unsupervised Domain Adaptation (UDA) challenges in Person ReID and Vehicle ReID, with further applicability to Object ReID. Experimental results on widely used UDA Person ReID and Vehicle ReID datasets demonstrate that the proposed framework outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-08-06T02:57:09Z)
- Slot Attention with Re-Initialization and Self-Distillation [33.38373596185185]
We propose Slot Attention with re-Initialization and self-Distillation (DIAS) for object discovery and recognition. DIAS achieves state-of-the-art on OCL tasks like object discovery and recognition, while also improving advanced visual prediction and reasoning.
arXiv Detail & Related papers (2025-07-31T17:41:18Z)
- Semi-supervised Semantic Segmentation with Multi-Constraint Consistency Learning [81.02648336552421]
We propose a Multi-Constraint Consistency Learning approach to facilitate the staged enhancement of the encoder and decoder. Self-adaptive feature masking and noise injection are designed in an instance-specific manner to perturb the features for robust learning of the decoder. Experimental results on the Pascal VOC2012 and Cityscapes datasets demonstrate that our proposed MCCL achieves new state-of-the-art performance.
arXiv Detail & Related papers (2025-03-23T03:21:33Z)
- Enhancing Graph Contrastive Learning with Reliable and Informative Augmentation for Recommendation [84.45144851024257]
We propose a novel framework that aims to enhance graph contrastive learning by constructing contrastive views with stronger collaborative information via discrete codes. The core idea is to map users and items into discrete codes rich in collaborative information for reliable and informative contrastive view generation.
arXiv Detail & Related papers (2024-09-09T14:04:17Z)
- UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
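The summary names UGMAE's two components without implementation detail. Below is a hedged Python sketch of what an "adaptive feature mask generator" might look like: masking the nodes a learned scorer deems least significant instead of masking uniformly at random. The class, scoring rule, and masking policy are illustrative guesses, not the paper's design.

```python
import torch
import torch.nn as nn

class AdaptiveFeatureMasker(nn.Module):
    """Masks node features with node-specific significance scores rather than
    a uniform random rate (one possible reading of "adaptive feature mask")."""
    def __init__(self, feat_dim, mask_rate=0.5):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 1)   # learned per-node significance
        self.mask_rate = mask_rate
        self.mask_token = nn.Parameter(torch.zeros(feat_dim))

    def forward(self, x):                       # x: (num_nodes, feat_dim)
        num_mask = int(self.mask_rate * x.size(0))
        scores = self.scorer(x).squeeze(-1)     # (num_nodes,)
        # Mask the lowest-scoring nodes (a hypothetical policy).
        idx = scores.topk(num_mask, largest=False).indices
        x_masked = x.clone()
        x_masked[idx] = self.mask_token         # replace with shared mask token
        return x_masked, idx                    # idx: nodes to reconstruct
```

A reconstruction loss over the returned indices would then play the role of the feature-reconstruction term, alongside the paper's ranking-based structure objective.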
arXiv Detail & Related papers (2024-02-12T19:39:26Z)
- HierarchicalContrast: A Coarse-to-Fine Contrastive Learning Framework for Cross-Domain Zero-Shot Slot Filling [4.1940152307593515]
Cross-domain zero-shot slot filling leverages source-domain knowledge to learn a model that can generalize to unseen target domains.
Existing state-of-the-art zero-shot slot filling methods have limited generalization ability in the target domain.
We present a novel Hierarchical Contrastive Learning Framework (HiCL) for zero-shot slot filling.
arXiv Detail & Related papers (2023-10-13T14:23:33Z)
- Contrastive Learning with Consistent Representations [8.364383223740097]
This paper proposes Contrastive Learning with Consistent Representations (CoCor).
At the heart of CoCor is a novel consistency metric termed DA consistency.
Experimental results demonstrate that CoCor notably enhances the generalizability and transferability of learned representations.
arXiv Detail & Related papers (2023-02-03T04:34:00Z)
- Momentum Contrastive Autoencoder: Using Contrastive Learning for Latent Space Distribution Matching in WAE [51.09507030387935]
Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution.
We propose to use the contrastive learning framework that has been shown to be effective for self-supervised representation learning, as a means to resolve this problem.
We show that using the contrastive learning framework to optimize the WAE loss achieves faster convergence and more stable optimization compared with existing popular algorithms for WAE.
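For reference, the constrained objective this summary paraphrases can be written in the standard WAE form (our notation, following the original WAE formulation rather than this paper):

```latex
% Minimize reconstruction cost c under generator G, while pushing the
% aggregate posterior Q_Z toward the prior P_Z via a penalty with weight lambda:
\min_{Q(Z \mid X)} \;
  \mathbb{E}_{P_X}\, \mathbb{E}_{Q(Z \mid X)} \big[\, c\big(X, G(Z)\big) \,\big]
  \;+\; \lambda \, \mathcal{D}_Z\big(Q_Z,\, P_Z\big)
```

The summarized contribution is to realize the divergence term D_Z with a contrastive objective instead of the usual MMD or adversarial estimators, which the authors report converges faster and more stably.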
arXiv Detail & Related papers (2021-10-19T22:55:47Z)
- FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding [14.896822373116729]
We present Few-Shot object detection via Contrastive proposals Encoding (FSCE).
FSCE is a simple yet effective approach to learning contrastive-aware object encodings that facilitate the classification of detected objects.
Our design outperforms current state-of-the-art works in any shot and all data splits, with up to +8.8% on the standard PASCAL VOC benchmark and +2.7% on the challenging COCO benchmark.
arXiv Detail & Related papers (2021-03-10T09:15:05Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state-of-the-art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.