Multiple Object Stitching for Unsupervised Representation Learning
- URL: http://arxiv.org/abs/2506.07364v1
- Date: Mon, 09 Jun 2025 02:28:21 GMT
- Title: Multiple Object Stitching for Unsupervised Representation Learning
- Authors: Chengchao Shen, Dawei Liu, Jianxin Wang
- Abstract summary: We propose a method, Multiple Object Stitching, to refine the unsupervised representation for multi-object images. Our method provides additional object correspondences between multi-object images without human annotations. Experimental results on ImageNet, CIFAR and COCO datasets demonstrate that our proposed method achieves the leading unsupervised representation performance.
- Score: 11.087735229999817
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Contrastive learning on single-object-centric images has achieved remarkable progress in unsupervised representation learning, but suffers inferior performance on the widespread images that contain multiple objects. In this paper, we propose a simple but effective method, Multiple Object Stitching (MOS), to refine unsupervised representations for multi-object images. Specifically, we construct multi-object images by stitching single-object-centric ones, so that the objects in the synthesized multi-object images are predetermined. Hence, compared to existing contrastive methods, our method provides additional object correspondences between multi-object images without human annotations. In this manner, our method pays more attention to the representation of each object in a multi-object image, thus providing more detailed representations for complicated downstream tasks such as object detection and semantic segmentation. Experimental results on the ImageNet, CIFAR and COCO datasets demonstrate that our proposed method achieves leading unsupervised representation performance on both single-object-centric and multi-object images. The source code is available at https://github.com/visresearch/MultipleObjectStitching.
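The stitching idea is simple enough to sketch. Below is a minimal, illustrative PyTorch sketch of how single-object-centric images can be tiled into multi-object images whose object correspondences are known by construction; the names (`stitch_2x2`, `info_nce`) and the 2x2 grid layout are assumptions made for illustration, not the authors' released implementation (see the linked repository for that).

```python
# Illustrative sketch of MOS-style multi-object image construction.
# Names and the 2x2 layout are hypothetical, not from the official repo:
# https://github.com/visresearch/MultipleObjectStitching
import torch
import torch.nn.functional as F


def stitch_2x2(images: torch.Tensor) -> torch.Tensor:
    """Stitch a batch of single-object images (B, C, H, W), B divisible by 4,
    into multi-object images (B/4, C, 2H, 2W). Which object occupies which
    tile is known by construction, so no human annotation is needed."""
    b, c, h, w = images.shape
    assert b % 4 == 0, "batch must be divisible by 4 to form 2x2 grids"
    tiles = images.view(b // 4, 4, c, h, w)
    top = torch.cat([tiles[:, 0], tiles[:, 1]], dim=3)     # (B/4, C, H, 2W)
    bottom = torch.cat([tiles[:, 2], tiles[:, 3]], dim=3)  # (B/4, C, H, 2W)
    return torch.cat([top, bottom], dim=2)                 # (B/4, C, 2H, 2W)


def info_nce(q: torch.Tensor, k: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """Standard InfoNCE loss over matched rows of q and k, both (N, D)."""
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    logits = q @ k.t() / tau
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)


# Toy usage: single-object crops and their stitched multi-object counterpart.
singles = torch.randn(8, 3, 64, 64)   # 8 single-object-centric crops
multi = stitch_2x2(singles)           # 2 stitched multi-object images
# An encoder would map both views to per-object embeddings; the known tile
# layout supplies the positive pairs for the contrastive loss.
```

Because the source image of each tile is known, positive pairs for the contrastive objective come for free; this is the annotation-free object correspondence the abstract refers to.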
Related papers
- unMORE: Unsupervised Multi-Object Segmentation via Center-Boundary Reasoning [6.259786457043613]
Unsupervised multi-object segmentation is a challenging problem on single images. In this paper, we introduce unMORE, a novel two-stage pipeline designed to identify many complex objects in real-world images. Our method excels in crowded images where all baselines collapse.
arXiv Detail & Related papers (2025-06-02T15:22:51Z) - Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization [5.2337753974570616]
We address the challenge of Small Object Image Retrieval (SoIR), where the goal is to retrieve images containing a specific small object in a cluttered scene. The key challenge is constructing a single image descriptor, for scalable and efficient search, that effectively represents all objects in the image. We introduce Multi-object Attention Optimization (MaO), a novel retrieval framework which incorporates a dedicated multi-object pre-training phase.
arXiv Detail & Related papers (2025-03-10T08:27:02Z) - ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation [33.91045409317844]
This paper introduces a tuning-free method for both object insertion and subject-driven generation. The task involves composing an object, given multiple views, into a scene specified by either an image or text. We compare our method with state-of-the-art methods for object insertion and subject-driven generation, using a single or multiple references.
arXiv Detail & Related papers (2024-12-11T18:59:53Z) - Retrieval Robust to Object Motion Blur [54.34823913494456]
We propose a method for object retrieval in images that are affected by motion blur.
We present the first large-scale datasets for blurred object retrieval.
Our method outperforms state-of-the-art retrieval methods on the new blur-retrieval datasets.
arXiv Detail & Related papers (2024-04-27T23:22:39Z) - Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding [77.26626173589746]
We present the Multi-view Approach to Grounding in Context (MAGiC).
It selects an object referent based on language that distinguishes between two similar objects.
It improves over the state-of-the-art model on the SNARE object reference task with a relative error reduction of 12.9%.
arXiv Detail & Related papers (2023-11-12T00:21:58Z) - Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z) - Image Segmentation-based Unsupervised Multiple Objects Discovery [1.7674345486888503]
Unsupervised object discovery aims to localize objects in images.
We propose a fully unsupervised, bottom-up approach for multiple object discovery.
We provide state-of-the-art results for both unsupervised class-agnostic object detection and unsupervised image segmentation.
arXiv Detail & Related papers (2022-12-20T09:48:24Z) - Multi-modal Transformers Excel at Class-agnostic Object Detection [105.10403103027306]
We argue that existing methods lack a top-down supervision signal governed by human-understandable semantics.
We develop an efficient and flexible MViT architecture using multi-scale feature processing and deformable self-attention.
We show the significance of MViT proposals in a diverse range of applications.
arXiv Detail & Related papers (2021-11-22T18:59:29Z) - A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images.
We present a simple yet surprisingly effective framework to do so.
Our approach improves the object detection (and instance segmentation) accuracy on rare objects by a relative 50% (and 33%).
arXiv Detail & Related papers (2021-02-17T17:27:21Z) - Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, which is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
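As background on that last entry, SceneFID builds on the standard Fréchet Inception Distance; the formula below is the widely used FID definition, and the per-object-crop application attributed to SceneFID is a reading of the abstract rather than that paper's exact definition.

```latex
% Standard Fréchet Inception Distance between Inception-feature statistics of
% real (\mu_r, \Sigma_r) and generated (\mu_g, \Sigma_g) images; SceneFID is
% assumed to apply the same quantity to per-object crops rather than full images.
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\bigl(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\bigr)
```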
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.