SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding
- URL: http://arxiv.org/abs/2208.07407v1
- Date: Mon, 15 Aug 2022 19:00:56 GMT
- Title: SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding
- Authors: Morgan Heisler and Amin Banitalebi-Dehkordi and Yong Zhang
- Abstract summary: We propose an effective technique for image augmentation by injecting contextually meaningful knowledge into the scenes.
Our method of semantically meaningful image augmentation for object detection via language grounding, SemAug, starts by calculating semantically appropriate new objects.
- Score: 5.715548995729382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation is an essential technique in improving the generalization
of deep neural networks. The majority of existing image-domain augmentations
either rely on geometric and structural transformations, or apply different
kinds of photometric distortions. In this paper, we propose an effective
technique for image augmentation by injecting contextually meaningful knowledge
into the scenes. Our method of semantically meaningful image augmentation for
object detection via language grounding, SemAug, starts by calculating
semantically appropriate new objects that can be placed into relevant locations
in the image (the what and where problems). Then it embeds these objects into
their relevant target locations, thereby promoting diversity of object instance
distribution. Our method allows for introducing new object instances and
categories that may not even exist in the training set. Furthermore, it does
not require the additional overhead of training a context network, so it can be
easily added to existing architectures. Our comprehensive set of evaluations
showed that the proposed method is very effective in improving
generalization, while the overhead is negligible. In particular, for a wide
range of model architectures, our method achieved ~2-4% and ~1-2% mAP
improvements for the task of object detection on the Pascal VOC and COCO
datasets, respectively.
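The abstract describes three stages: choose a semantically appropriate new object (the "what"), choose a location for it (the "where"), and embed it into the image. The sketch below is a minimal illustration of that pipeline under assumptions that do not come from the paper. Toy word vectors stand in for pretrained language embeddings. The "what" step picks the candidate label with the highest mean cosine similarity to the labels already in the image. The "where" step places a box beside the most related existing object. The "embed" step is a plain nearest-neighbour masked copy-paste. All names (`EMBEDDINGS`, `pick_new_object`, `place_box`, `paste`, `semantic_augment`) and their rules are hypothetical. This is not SemAug's actual implementation.

```python
import math

# Hypothetical toy word vectors standing in for pretrained language
# embeddings (e.g. GloVe); values are illustrative only.
EMBEDDINGS = {
    "person": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.1],
    "bicycle": [0.6, 0.1, 0.7],
    "car": [0.2, 0.0, 0.9],
    "sofa": [0.1, 0.9, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def pick_new_object(present, vocab, candidates):
    """'What': the candidate label, absent from the image, with the highest
    mean cosine similarity to the labels already present (assumed rule)."""
    best, best_score = None, -2.0
    for cand in candidates:
        if cand in present or cand not in vocab:
            continue
        score = sum(cosine(vocab[cand], vocab[p]) for p in present) / len(present)
        if score > best_score:
            best, best_score = cand, score
    return best


def place_box(anchor, img_w, img_h, scale_pct=80):
    """'Where': a box scale_pct% of the anchor's size, bottom-aligned with it,
    to its right if it fits, else to its left, else flush right (assumed rule)."""
    x0, y0, x1, y1 = anchor
    w = max(1, (x1 - x0) * scale_pct // 100)
    h = max(1, (y1 - y0) * scale_pct // 100)
    ny1 = min(y1, img_h)
    ny0 = max(0, ny1 - h)
    if x1 + w <= img_w:
        nx0 = x1
    elif x0 - w >= 0:
        nx0 = x0 - w
    else:
        nx0 = max(0, img_w - w)
    return (nx0, ny0, min(img_w, nx0 + w), ny1)


def paste(image, patch, mask, box):
    """'Embed': nearest-neighbour resize of patch to the box, copying only
    pixels where mask is truthy; returns a new image (input is untouched)."""
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    ph, pw = len(patch), len(patch[0])
    out = [row[:] for row in image]
    for r in range(h):
        for c in range(w):
            sr, sc = r * ph // h, c * pw // w
            if mask[sr][sc]:
                out[y0 + r][x0 + c] = patch[sr][sc]
    return out


def semantic_augment(image, annotations, vocab, object_bank):
    """Add one semantically related object. annotations are (label, box)
    pairs whose labels must appear in vocab; object_bank maps a label to a
    (patch, mask) pair of equal-sized 2D lists."""
    present = [label for label, _ in annotations]
    if not present:
        return image, annotations
    new_label = pick_new_object(present, vocab, object_bank.keys())
    if new_label is None:
        return image, annotations
    # Anchor on the existing object whose label is closest to the new one.
    _, anchor_box = max(annotations, key=lambda a: cosine(vocab[a[0]], vocab[new_label]))
    box = place_box(anchor_box, len(image[0]), len(image))
    patch, mask = object_bank[new_label]
    return paste(image, patch, mask, box), annotations + [(new_label, box)]


image = [[0] * 100 for _ in range(100)]  # 100x100 single-channel image
annotations = [("person", (10, 20, 30, 60))]
bank = {"dog": ([[7, 7], [7, 7]], [[1, 1], [0, 1]]), "car": ([[5]], [[1]])}
new_image, new_annotations = semantic_augment(image, annotations, EMBEDDINGS, bank)
print(new_annotations[-1])  # ('dog', (30, 28, 46, 60))
```

Because the new box is also appended to the annotations, the detector sees the inserted object as a labelled instance. This matches the abstract's point that the method can introduce object instances (and, with an external object bank, categories) absent from the original image. A real system would also need to handle occlusion, scale from scene depth, and blending. None of that is modelled here.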
Related papers
- ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding [42.10086029931937]
Visual grounding aims to localize the object referred to in an image based on a natural language query.
Existing methods suffer a significant performance drop when there are multiple distractions in an image.
We propose a novel approach, the Relation and Semantic-sensitive Visual Grounding (ResVG) model, to address this issue.
arXiv Detail & Related papers (2024-08-29T07:32:01Z)
- Variable Radiance Field for Real-Life Category-Specific Reconstruction from Single Image [27.290232027686237]
We present a novel framework that can reconstruct category-specific objects from a single image without known camera parameters.
We parameterize the geometry and appearance of the object using a multi-scale global feature extractor.
We also propose a contrastive learning-based pretraining strategy to improve the feature extractor.
arXiv Detail & Related papers (2023-06-08T12:12:02Z)
- De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
- AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical of which is foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as the mainstream method.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation [53.49821324597837]
Weakly supervised semantic segmentation is a challenging problem that has been deeply studied in recent years.
We present a Context Decoupling Augmentation (CDA) method to change the inherent context in which the objects appear.
Extensive experiments on the PASCAL VOC 2012 dataset with several alternative network architectures show that CDA boosts various popular WSSS methods to a new state of the art by a large margin.
arXiv Detail & Related papers (2021-03-02T15:05:09Z)
- Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
- ObjectAug: Object-level Data Augmentation for Semantic Image Segmentation [22.91204798022379]
Semantic image segmentation aims to obtain object labels with precise boundaries.
Current strategies operate at the image level, and objects and the background are coupled.
We propose ObjectAug to perform object-level augmentation for semantic image segmentation.
arXiv Detail & Related papers (2021-01-30T12:46:20Z)
- Combining Semantic Guidance and Deep Reinforcement Learning For Generating Human Level Paintings [22.889059874754242]
Generation of stroke-based non-photorealistic imagery is an important problem in the computer vision community.
Previous methods have been limited to datasets with little variation in position, scale and saliency of the foreground object.
We propose a Semantic Guidance pipeline with 1) a bi-level painting procedure for learning the distinction between foreground and background brush strokes at training time.
arXiv Detail & Related papers (2020-11-25T09:00:04Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets a new state of the art in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.