Impact of Label Types on Training SWIN Models with Overhead Imagery
- URL: http://arxiv.org/abs/2310.07572v1
- Date: Wed, 11 Oct 2023 15:14:54 GMT
- Title: Impact of Label Types on Training SWIN Models with Overhead Imagery
- Authors: Ryan Ford, Kenneth Hutchison, Nicholas Felts, Benjamin Cheng, Jesse
Lew, Kyle Jackson
- Abstract summary: This work examined the impact of training shifted window transformers using bounding boxes and segmentation labels.
We found that the models trained on only target pixels do not show performance improvement for classification tasks.
For object detection, we found that models trained with either label type showed equivalent performance across testing.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the impact of data set design on model training and performance
can help alleviate the costs associated with generating remote sensing and
overhead labeled data. This work examined the impact of training shifted window
transformers using bounding boxes and segmentation labels, where the latter are
more expensive to produce. We examined classification tasks by comparing models
trained with both target and backgrounds against models trained with only
target pixels, extracted by segmentation labels. For object detection models,
we compared performance using either label type when training. We found that
the models trained on only target pixels do not show performance improvement
for classification tasks, appearing to conflate background pixels in the
evaluation set with target pixels. For object detection, we found that models
trained with either label type showed equivalent performance across testing. We
found that bounding boxes appeared to be sufficient for tasks that did not
require more complex labels, such as object segmentation. Continuing work to
determine consistency of this result across data types and model architectures
could potentially result in substantial savings in generating remote sensing
data sets for deep learning.
Related papers
- Stanceformer: Target-Aware Transformer for Stance Detection [59.69858080492586]
Stance Detection involves discerning the stance expressed in a text towards a specific subject or target.
Prior works have relied on existing transformer models that lack the capability to prioritize targets effectively.
We introduce Stanceformer, a target-aware transformer model that incorporates enhanced attention towards the targets during both training and inference.
arXiv Detail & Related papers (2024-10-09T17:24:28Z) - Visual Context-Aware Person Fall Detection [52.49277799455569]
We present a segmentation pipeline to semi-automatically separate individuals and objects in images.
Background objects such as beds, chairs, or wheelchairs can challenge fall detection systems, leading to false positive alarms.
We demonstrate that object-specific contextual transformations during training effectively mitigate this challenge.
arXiv Detail & Related papers (2024-04-11T19:06:36Z) - Counterfactual Reasoning for Multi-Label Image Classification via Patching-Based Training [84.95281245784348]
Overemphasizing co-occurrence relationships can cause the overfitting issue of the model.
We provide a causal inference framework to show that the correlative features caused by the target object and its co-occurring objects can be regarded as a mediator.
arXiv Detail & Related papers (2024-04-09T13:13:24Z) - Task Specific Pretraining with Noisy Labels for Remote Sensing Image Segmentation [18.598405597933752]
Self-supervision provides remote sensing a tool to reduce the amount of exact, human-crafted geospatial annotations.
In this work, we propose to exploit noisy semantic segmentation maps for model pretraining.
The results from two datasets indicate the effectiveness of task-specific supervised pretraining with noisy labels.
arXiv Detail & Related papers (2024-02-25T18:01:42Z) - Labeling Indoor Scenes with Fusion of Out-of-the-Box Perception Models [4.157013247909771]
We propose to leverage the recent advancements in state-of-the-art models for bottom-up segmentation (SAM), object detection (Detic), and semantic segmentation (MaskFormer)
We aim to develop a cost-effective labeling approach to obtain pseudo-labels for semantic segmentation and object instance detection in indoor environments.
We demonstrate the effectiveness of the proposed approach on the Active Vision dataset and the ADE20K dataset.
arXiv Detail & Related papers (2023-11-17T21:58:26Z) - Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z) - Texture-Based Input Feature Selection for Action Recognition [3.9596068699962323]
We propose a novel method to determine the task-irrelevant content in inputs which increases the domain discrepancy.
We show that our proposed model is superior to existing models for action recognition on the HMDB-51 dataset and the Penn Action dataset.
arXiv Detail & Related papers (2023-02-28T23:56:31Z) - Free Lunch for Co-Saliency Detection: Context Adjustment [14.688461235328306]
We propose a "cost-free" group-cut-paste (GCP) procedure to leverage images from off-the-shelf saliency detection datasets and synthesize new samples.
We collect a novel dataset called Context Adjustment Training. The two variants of our dataset, i.e., CAT and CAT+, consist of 16,750 and 33,500 images, respectively.
arXiv Detail & Related papers (2021-08-04T14:51:37Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z) - Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.