ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation
- URL: http://arxiv.org/abs/2412.08645v1
- Date: Wed, 11 Dec 2024 18:59:53 GMT
- Title: ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation
- Authors: Daniel Winter, Asaf Shul, Matan Cohen, Dana Berman, Yael Pritch, Alex Rav-Acha, Yedid Hoshen
- Abstract summary: This paper introduces a tuning-free method for both object insertion and subject-driven generation.
The task involves composing an object, given multiple views, into a scene specified by either an image or text.
We compare our method with state-of-the-art methods for object insertion and subject-driven generation, using either a single reference or multiple references.
- Score: 33.91045409317844
- License:
- Abstract: This paper introduces a tuning-free method for both object insertion and subject-driven generation. The task involves composing an object, given multiple views, into a scene specified by either an image or text. Existing methods struggle to fully meet the task's challenging objectives: (i) seamlessly composing the object into the scene with photorealistic pose and lighting, and (ii) preserving the object's identity. We hypothesize that achieving these goals requires large-scale supervision, but manually collecting sufficient data is simply too expensive. The key observation in this paper is that many mass-produced objects recur across multiple images of large unlabeled datasets, in different scenes, poses, and lighting conditions. We use this observation to create massive supervision by retrieving sets of diverse views of the same object. This powerful paired dataset enables us to train a straightforward text-to-image diffusion architecture to map the object and scene descriptions to the composited image. We compare our method, ObjectMate, with state-of-the-art methods for object insertion and subject-driven generation, using either a single reference or multiple references. Empirically, ObjectMate achieves superior identity preservation and more photorealistic composition. Unlike many other multi-reference methods, ObjectMate does not require slow test-time tuning.
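The recurrence prior suggests a simple retrieval recipe for building paired supervision. Below is a minimal sketch of the idea, not the paper's actual pipeline: it assumes L2-normalized object-crop embeddings from some self-supervised backbone (e.g., DINOv2), and the similarity threshold, grouping heuristic, and minimum set size are illustrative assumptions.

```python
# A minimal sketch of the recurrence-prior idea: given embeddings of object
# crops from a large unlabeled corpus, group near-duplicate instances into
# multi-view sets. All thresholds here are illustrative assumptions.
import numpy as np

def build_recurrence_sets(embeddings: np.ndarray, sim_thresh: float = 0.85,
                          min_views: int = 3) -> list[list[int]]:
    """Group object-crop embeddings into sets of views of the same object.

    embeddings: (N, D) array of L2-normalized object-crop features.
    """
    sims = embeddings @ embeddings.T  # cosine similarity (rows are unit norm)
    visited = np.zeros(len(embeddings), dtype=bool)
    sets = []
    for i in range(len(embeddings)):
        if visited[i]:
            continue
        # Unvisited crops similar enough to be the same object instance.
        members = np.flatnonzero((sims[i] >= sim_thresh) & ~visited)
        visited[members] = True
        if len(members) >= min_views:  # keep only multi-view sets
            sets.append(members.tolist())
    return sets

# Each retrieved set supplies (reference views, target image) training pairs
# for a text-to-image diffusion model, since the same object recurs in
# different scenes, poses, and lighting conditions.
```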
Related papers
- ObjectRelator: Enabling Cross-View Object Relation Understanding in Ego-Centric and Exo-Centric Videos [105.40690994956667]
The Ego-Exo Object Correspondence task aims to map objects across ego-centric and exo-centric views.
We introduce ObjectRelator, a novel method designed to tackle this task.
arXiv Detail & Related papers (2024-11-28T12:01:03Z)
- SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects [20.978091381109294]
We propose a method to generate articulated objects from a single image.
Our method generates an articulated object that is visually consistent with the input image.
Our experiments show that our method outperforms the state-of-the-art in articulated object creation.
arXiv Detail & Related papers (2024-10-21T20:41:32Z)
- Retrieval Robust to Object Motion Blur [54.34823913494456]
We propose a method for object retrieval in images that are affected by motion blur.
We present the first large-scale datasets for blurred object retrieval.
Our method outperforms state-of-the-art retrieval methods on the new blur-retrieval datasets.
arXiv Detail & Related papers (2024-04-27T23:22:39Z)
- Image Segmentation-based Unsupervised Multiple Objects Discovery [1.7674345486888503]
Unsupervised object discovery aims to localize objects in images.
We propose a fully unsupervised, bottom-up approach for multiple-object discovery.
We provide state-of-the-art results for both unsupervised class-agnostic object detection and unsupervised image segmentation.
arXiv Detail & Related papers (2022-12-20T09:48:24Z)
- FewSOL: A Dataset for Few-Shot Object Learning in Robotic Environments [21.393674766169543]
We introduce the Few-Shot Object Learning dataset for object recognition with a few images per object.
We captured 336 real-world objects with 9 RGB-D images per object from different views.
The evaluation results show that there is still a large margin for improvement in few-shot object classification in robotic environments.
arXiv Detail & Related papers (2022-07-06T05:57:24Z)
- LayoutBERT: Masked Language Layout Model for Object Insertion [3.4806267677524896]
We propose layoutBERT for the object insertion task.
It uses a novel self-supervised masked language model objective and bidirectional multi-head self-attention.
We provide both qualitative and quantitative evaluations on datasets from diverse domains; a sketch of the masked-layout objective follows this entry.
arXiv Detail & Related papers (2022-04-30T21:35:38Z)
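For intuition, here is a minimal sketch of a masked layout-modeling objective in the spirit of layoutBERT, not the paper's implementation: layout elements are flattened into discrete tokens (e.g., category, x, y, w, h), a random subset is replaced by a [MASK] token, and a bidirectional transformer encoder predicts the originals. Vocabulary size, model width, and masking rate below are illustrative assumptions.

```python
# Sketch of a BERT-style masked objective over discretized layout tokens.
import torch
import torch.nn as nn

VOCAB, MASK_ID, D = 512, 0, 256  # assumed layout vocabulary; id 0 = [MASK]

class MaskedLayoutModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D)
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)  # bidirectional self-attention
        self.head = nn.Linear(D, VOCAB)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, T) integer layout tokens -> (B, T, VOCAB) logits
        return self.head(self.encoder(self.embed(tokens)))

def mlm_step(model: MaskedLayoutModel, tokens: torch.Tensor,
             mask_rate: float = 0.15) -> torch.Tensor:
    # Mask a random subset of layout tokens and score only those positions.
    mask = torch.rand_like(tokens, dtype=torch.float) < mask_rate
    corrupted = tokens.masked_fill(mask, MASK_ID)
    logits = model(corrupted)
    return nn.functional.cross_entropy(logits[mask], tokens[mask])
```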
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images.
We present a simple yet surprisingly effective framework to do so.
Our approach improves the object detection accuracy of rare objects by 50% relative (and instance segmentation accuracy by 33%).
arXiv Detail & Related papers (2021-02-17T17:27:21Z)
- Learning Object Detection from Captions via Textual Scene Attributes [70.90708863394902]
We argue that captions contain much richer information about the image, including attributes of objects and their relations.
We present a method that uses the attributes in this "textual scene graph" to train object detectors.
We empirically demonstrate that the resulting model achieves state-of-the-art results on several challenging object detection datasets.
arXiv Detail & Related papers (2020-09-30T10:59:20Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images; a sketch of the idea follows this entry.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
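Here is a minimal sketch of an object-centric FID in the spirit of SceneFID, not the paper's exact protocol: compute the Fréchet distance between Inception features of per-object crops (e.g., taken from layout bounding boxes) rather than whole images. The feature extractor and crop source are assumptions; only the distance itself is shown.

```python
# Frechet distance between two feature sets, applied to per-object crops.
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """FID between two (N, D) feature sets, e.g., Inception-v3 pool features
    of object crops cut out at each layout bounding box."""
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)   # matrix square root of the product
    if np.iscomplexobj(covmean):
        covmean = covmean.real              # drop tiny imaginary numerical noise
    return float(((mu_r - mu_f) ** 2).sum()
                 + np.trace(cov_r + cov_f - 2.0 * covmean))
```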
This list is automatically generated from the titles and abstracts of the papers on this site.