BOOTPLACE: Bootstrapped Object Placement with Detection Transformers
- URL: http://arxiv.org/abs/2503.21991v1
- Date: Thu, 27 Mar 2025 21:21:20 GMT
- Title: BOOTPLACE: Bootstrapped Object Placement with Detection Transformers
- Authors: Hang Zhou, Xinxin Zuo, Rui Ma, Li Cheng,
- Abstract summary: We introduce BOOTPLACE, a novel paradigm that formulates object placement as a placement-by-detection problem.<n> Experimental results on established benchmarks demonstrate BOOTPLACE's superior performance in object repositioning.
- Score: 23.300369070771836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we tackle the copy-paste image-to-image composition problem with a focus on object placement learning. Prior methods have leveraged generative models to reduce the reliance for dense supervision. However, this often limits their capacity to model complex data distributions. Alternatively, transformer networks with a sparse contrastive loss have been explored, but their over-relaxed regularization often leads to imprecise object placement. We introduce BOOTPLACE, a novel paradigm that formulates object placement as a placement-by-detection problem. Our approach begins by identifying suitable regions of interest for object placement. This is achieved by training a specialized detection transformer on object-subtracted backgrounds, enhanced with multi-object supervisions. It then semantically associates each target compositing object with detected regions based on their complementary characteristics. Through a boostrapped training approach applied to randomly object-subtracted images, our model enforces meaningful placements through extensive paired data augmentation. Experimental results on established benchmarks demonstrate BOOTPLACE's superior performance in object repositioning, markedly surpassing state-of-the-art baselines on Cityscapes and OPA datasets with notable improvements in IOU scores. Additional ablation studies further showcase the compositionality and generalizability of our approach, supported by user study evaluations.
Related papers
- SAR Object Detection with Self-Supervised Pretraining and Curriculum-Aware Sampling [41.24071764578782]
Object detection in satellite-borne Synthetic Aperture Radar imagery holds immense potential in tasks such as urban monitoring and disaster response.
The detection of small objects in satellite-borne SAR images poses a particularly intricate problem, because of the technology's relatively low spatial resolution and inherent noise.
In this paper, we introduce TRANSAR, a novel self-supervised end-to-end vision transformer-based SAR object detection model.
arXiv Detail & Related papers (2025-04-17T19:44:05Z) - Learning Object Focused Attention [5.340670496809963]
We propose an adaptation to the training of Vision Transformers (ViTs) that allows for an explicit modeling of objects during the attention computation.
This is achieved by adding a new branch to selected attention layers that computes an auxiliary loss which we call the object-focused attention (OFA) loss.
Our experimental results demonstrate that ViTs with OFA achieve better classification results than their base models, exhibit a stronger generalization ability, and learn representations based on object shapes rather than spurious correlations via general textures.
arXiv Detail & Related papers (2025-04-10T23:23:26Z) - Object-level Copy-Move Forgery Image Detection based on Inconsistency Mining [25.174869954072648]
This paper proposes an Object-level Copy-Move Forgery Image Detection based on Inconsistency Mining (IMNet)
To obtain complete object-level targets, we customize prototypes for both the source and tampered regions and dynamically update them.
We operate experiments on three public datasets which validate the effectiveness and the robustness of the proposed IMNet.
arXiv Detail & Related papers (2024-03-31T09:01:17Z) - Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for
Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - Contrastive View Design Strategies to Enhance Robustness to Domain
Shifts in Downstream Object Detection [37.06088084592779]
We conduct an empirical study of contrastive learning and out-of-domain object detection.
We propose strategies to augment views and enhance robustness in appearance-shifted and context-shifted scenarios.
Our results and insights show how to ensure robustness through the choice of views in contrastive learning.
arXiv Detail & Related papers (2022-12-09T00:34:50Z) - Spatial Reasoning for Few-Shot Object Detection [21.3564383157159]
We propose a spatial reasoning framework that detects novel objects with only a few training examples in a context.
We employ a graph convolutional network as the RoIs and their relatedness are defined as nodes and edges, respectively.
We demonstrate that the proposed method significantly outperforms the state-of-the-art methods and verify its efficacy through extensive ablation studies.
arXiv Detail & Related papers (2022-11-02T12:38:08Z) - Fusing Local Similarities for Retrieval-based 3D Orientation Estimation
of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z) - Object-aware Contrastive Learning for Debiased Scene Representation [74.30741492814327]
We develop a novel object-aware contrastive learning framework that localizes objects in a self-supervised manner.
We also introduce two data augmentations based on ContraCAM, object-aware random crop and background mixup, which reduce contextual and background biases during contrastive self-supervised learning.
arXiv Detail & Related papers (2021-07-30T19:24:07Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z) - Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.