Enrich the content of the image Using Context-Aware Copy Paste
- URL: http://arxiv.org/abs/2407.08151v1
- Date: Thu, 11 Jul 2024 03:07:28 GMT
- Title: Enrich the content of the image Using Context-Aware Copy Paste
- Authors: Qiushi Guo
- Abstract summary: We propose a context-aware approach that integrates Bidirectional Latent Information Propagation (BLIP) for content extraction from source images.
By matching extracted content information with category information, our method ensures cohesive integration of target objects using the Segment Anything Model (SAM) and You Only Look Once (YOLO).
Experimental evaluations across diverse datasets demonstrate the effectiveness of our method in enhancing data diversity and generating high-quality pseudo-images.
- Score: 1.450405446885067
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data augmentation remains a widely utilized technique in deep learning, particularly in tasks such as image classification, semantic segmentation, and object detection. Among these, Copy-Paste is a simple yet effective method that has gained great attention recently. However, existing Copy-Paste methods often overlook contextual relevance between source and target images, resulting in inconsistencies in generated outputs. To address this challenge, we propose a context-aware approach that integrates Bidirectional Latent Information Propagation (BLIP) for content extraction from source images. By matching extracted content information with category information, our method ensures cohesive integration of target objects using the Segment Anything Model (SAM) and You Only Look Once (YOLO). This approach eliminates the need for manual annotation, offering an automated and user-friendly solution. Experimental evaluations across diverse datasets demonstrate the effectiveness of our method in enhancing data diversity and generating high-quality pseudo-images across various computer vision tasks.
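The pipeline the abstract describes, captioning a source image, matching the caption against target categories, then compositing a segmented object into the target, can be sketched as follows. This is a minimal illustration, not the authors' implementation: `match_category` uses a simple keyword rule as a stand-in for the paper's content-to-category matching, and `copy_paste` assumes the binary mask has already been produced by a segmentation model such as SAM on a YOLO-detected region.

```python
import numpy as np


def match_category(caption, categories):
    """Pick the first target category whose name appears in the
    caption of the source image (illustrative matching rule)."""
    words = caption.lower().split()
    for cat in categories:
        if cat.lower() in words:
            return cat
    return None


def copy_paste(src, mask, dst, top, left):
    """Paste the masked object from `src` onto a copy of `dst`
    at (top, left). `mask` is a binary HxW array, as a SAM-style
    segmentation would yield; `src` and `mask` share H and W."""
    h, w = mask.shape
    out = dst.copy()
    keep = mask.astype(bool)
    region = out[top:top + h, left:left + w]  # view into `out`
    region[keep] = src[keep]                  # overwrite masked pixels only
    return out


# Toy usage: a 2x2 white "object" pasted into a 4x4 black target.
src = np.full((2, 2, 3), 255, dtype=np.uint8)
mask = np.array([[1, 0], [0, 1]], dtype=np.uint8)
dst = np.zeros((4, 4, 3), dtype=np.uint8)
augmented = copy_paste(src, mask, dst, top=1, left=1)
```

Only pixels under the mask are transferred, so the target's background stays intact; a real pipeline would additionally choose `top`/`left` based on the matched category's plausible locations in the target scene.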
Related papers
- Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation [90.71613903956451]
Text-to-image retrieval is a fundamental task in multimedia processing.
We propose an autoregressive voken generation method, named AVG.
We show that AVG achieves superior results in both effectiveness and efficiency.
arXiv Detail & Related papers (2024-07-24T13:39:51Z) - Enhancing Historical Image Retrieval with Compositional Cues [3.2276097734075426]
We introduce a crucial factor from computational aesthetics, namely image composition, into this topic.
By explicitly integrating composition-related information extracted by CNN into the designed retrieval model, our method considers both the image's composition rules and semantic information.
arXiv Detail & Related papers (2024-03-21T10:51:19Z) - ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation [36.43428388918294]
Web-scale training on paired text-image data is becoming increasingly central to multimodal learning.
Standard data filtering approaches fail to remove mismatched text-image pairs.
We propose a new metric, image caption concreteness, that evaluates caption text without an image reference to measure its concreteness.
arXiv Detail & Related papers (2024-03-02T20:36:10Z) - Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation [59.78520153338878]
Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions.
We propose a method built upon a state-of-the-art diffusion model, empowered by open-vocabulary to learn multi-scale textual-visual features for camouflaged object representations.
arXiv Detail & Related papers (2023-12-29T07:59:07Z) - Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z) - De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z) - Region-level Active Learning for Cluttered Scenes [60.93811392293329]
We introduce a new strategy that subsumes previous Image-level and Object-level approaches into a generalized, Region-level approach.
We show that this approach significantly decreases labeling effort and improves rare object search on realistic data with inherent class-imbalance and cluttered scenes.
arXiv Detail & Related papers (2021-08-20T14:02:38Z) - A deep learning approach to clustering visual arts [7.363576598794859]
We propose DELIUS: a DEep learning approach to cLustering vIsUal artS.
The method uses a pre-trained convolutional network to extract features and then feeds these features into a deep embedded clustering model.
The task of mapping the raw input data to a latent space is optimized jointly with the task of finding a set of cluster centroids in this latent space.
arXiv Detail & Related papers (2021-06-11T08:35:26Z) - Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects [10.783993190686132]
We propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring contextual information.
We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets.
arXiv Detail & Related papers (2020-06-22T11:33:55Z) - Webpage Segmentation for Extracting Images and Their Surrounding Contextual Information [0.0]
We propose a webpage segmentation algorithm targeting the extraction of web images and their contextual information based on their characteristics as they appear on webpages.
We conducted a user study to obtain a human-labeled dataset to validate the effectiveness of our method and experiments demonstrated that our method can achieve better results than an existing segmentation algorithm.
arXiv Detail & Related papers (2020-05-18T19:00:03Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for the image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.