MureObjectStitch: Multi-reference Image Composition
- URL: http://arxiv.org/abs/2411.07462v3
- Date: Fri, 04 Apr 2025 02:49:47 GMT
- Title: MureObjectStitch: Multi-reference Image Composition
- Authors: Jiaxuan Chen, Bo Zhang, Qingdong He, Jinlong Peng, Li Niu,
- Abstract summary: Generative image composition aims to regenerate the given foreground object in the background image to produce a realistic composite image.<n>The existing methods are struggling to preserve the foreground details and adjust the foreground pose/viewpoint at the same time.<n>We propose an effective finetuning strategy for generative image composition model, in which we finetune a pretrained model using one or more images containing the same foreground object.
- Score: 23.110826295932554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative image composition aims to regenerate the given foreground object in the background image to produce a realistic composite image. The existing methods are struggling to preserve the foreground details and adjust the foreground pose/viewpoint at the same time. In this work, we propose an effective finetuning strategy for generative image composition model, in which we finetune a pretrained model using one or more images containing the same foreground object. Moreover, we propose a multi-reference strategy, which allows the model to take in multiple reference images of the foreground object. The experiments on MureCOM dataset verify the effectiveness of our method. The code and model have been released at https://github.com/bcmi/MureObjectStitch-Image-Composition.
Related papers
- Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching [19.730504197461144]
We present a novel generalizable object pose estimation method to determine the object pose using only one RGB image.
Our method offers generalization to unseen objects without extensive training, operates with a single reference image of the object, and eliminates the need for 3D object models or multiple views of the object.
arXiv Detail & Related papers (2024-11-24T14:31:50Z) - Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation [10.416673784744281]
We propose a weighted-merge method to merge multiple reference image features into corresponding objects.
Our method achieves superior performance to the state-of-the-arts on the Concept101 dataset and DreamBooth dataset of multi-object personalized image generation.
arXiv Detail & Related papers (2024-09-26T15:04:13Z) - DreamCom: Finetuning Text-guided Inpainting Model for Image Composition [24.411003826961686]
We propose DreamCom by treating image composition as text-guided image inpainting customized for certain object.
Specifically, we finetune pretrained text-guided image inpainting model based on a few reference images containing the same object.
In practice, the inserted object may be adversely affected by the background, so we propose masked attention mechanisms to avoid negative background interference.
arXiv Detail & Related papers (2023-09-27T09:23:50Z) - ControlCom: Controllable Image Composition using Diffusion Model [45.48263800282992]
We propose a controllable image composition method that unifies four tasks in one diffusion model.
We also propose a local enhancement module to enhance the foreground details in the diffusion model.
The proposed method is evaluated on both public benchmark and real-world data.
arXiv Detail & Related papers (2023-08-19T14:56:44Z) - Unsupervised Compositional Concepts Discovery with Text-to-Image
Generative Models [80.75258849913574]
In this paper, we consider the inverse problem -- given a collection of different images, can we discover the generative concepts that represent each image?
We present an unsupervised approach to discover generative concepts from a collection of images, disentangling different art styles in paintings, objects, and lighting from kitchen scenes, and discovering image classes given ImageNet images.
arXiv Detail & Related papers (2023-06-08T17:02:15Z) - Taming Encoder for Zero Fine-tuning Image Customization with
Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z) - LayoutBERT: Masked Language Layout Model for Object Insertion [3.4806267677524896]
We propose layoutBERT for the object insertion task.
It uses a novel self-supervised masked language model objective and bidirectional multi-head self-attention.
We provide both qualitative and quantitative evaluations on datasets from diverse domains.
arXiv Detail & Related papers (2022-04-30T21:35:38Z) - Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance
Consistency [59.427074701985795]
Single-view reconstruction typically rely on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry.
We avoid all of these supervisions and hypotheses by leveraging explicitly the consistency between images of different object instances.
Our main contributions are two approaches to leverage cross-instance consistency: (i) progressive conditioning, a training strategy to gradually specialize the model from category to instances in a curriculum learning fashion; (ii) swap reconstruction, a loss enforcing consistency between instances having similar shape or texture.
arXiv Detail & Related papers (2022-04-21T17:47:35Z) - Templates for 3D Object Pose Estimation Revisited: Generalization to New
Objects and Robustness to Occlusions [79.34847067293649]
We present a method that can recognize new objects and estimate their 3D pose in RGB images even under partial occlusions.
It relies on a small set of training objects to learn local object representations.
We are the first to show generalization without retraining on the LINEMOD and Occlusion-LINEMOD datasets.
arXiv Detail & Related papers (2022-03-31T17:50:35Z) - Rectifying the Shortcut Learning of Background: Shared Object
Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-Shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel Few-Shot Learning framework, to automatically figure out foreground objects at both pretraining and evaluation stage.
arXiv Detail & Related papers (2021-07-16T07:46:41Z) - Unsupervised Layered Image Decomposition into Object Prototypes [39.20333694585477]
We present an unsupervised learning framework for decomposing images into layers of automatically discovered object models.
We first validate our approach by providing results on par with the state of the art on standard multi-object synthetic benchmarks.
We then demonstrate the applicability of our model to real images in tasks that include clustering (SVHN, GTSRB), cosegmentation (Weizmann Horse) and object discovery from unfiltered social network images.
arXiv Detail & Related papers (2021-04-29T18:02:01Z) - Adversarial Image Composition with Auxiliary Illumination [53.89445873577062]
We propose an Adversarial Image Composition Net (AIC-Net) that achieves realistic image composition.
A novel branched generation mechanism is proposed, which disentangles the generation of shadows and the transfer of foreground styles.
Experiments on pedestrian and car composition tasks show that the proposed AIC-Net achieves superior composition performance.
arXiv Detail & Related papers (2020-09-17T12:58:16Z) - BachGAN: High-Resolution Image Synthesis from Salient Object Layout [78.51640906030244]
We propose a new task towards more practical application for image generation - high-quality image synthesis from salient object layout.
Two main challenges spring from this new task: (i) how to generate fine-grained details and realistic textures without segmentation map input; and (ii) how to create a background and weave it seamlessly into standalone objects.
By generating the hallucinated background representation dynamically, our model can synthesize high-resolution images with both photo-realistic foreground and integral background.
arXiv Detail & Related papers (2020-03-26T00:54:44Z) - Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fr'echet Inception Distance metric, that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.