CP2M: Clustered-Patch-Mixed Mosaic Augmentation for Aerial Image Segmentation
- URL: http://arxiv.org/abs/2501.15389v1
- Date: Sun, 26 Jan 2025 04:03:08 GMT
- Title: CP2M: Clustered-Patch-Mixed Mosaic Augmentation for Aerial Image Segmentation
- Authors: Yijie Li, Hewei Wang, Jinfeng Xu, Zixiao Ma, Puzhen Wu, Shaofan Wang, Soumyabrata Dev
- Abstract summary: This paper proposes a novel augmentation strategy, Clustered-Patch-Mixed Mosaic (CP2M).
CP2M integrates a Mosaic augmentation phase with a clustered patch mix phase.
Experiments on the ISPRS Potsdam dataset demonstrate that CP2M substantially mitigates overfitting.
- Score: 9.625982455419306
- License:
- Abstract: Remote sensing image segmentation is pivotal for earth observation, underpinning applications such as environmental monitoring and urban planning. Due to the limited annotation data available in remote sensing images, numerous studies have focused on data augmentation as a means to alleviate overfitting in deep learning networks. However, some existing data augmentation strategies rely on simple transformations that may not sufficiently enhance data diversity or model generalization capabilities. This paper proposes a novel augmentation strategy, Clustered-Patch-Mixed Mosaic (CP2M), designed to address these limitations. CP2M integrates a Mosaic augmentation phase with a clustered patch mix phase. The former stage constructs a new sample from four random samples, while the latter phase uses the connected component labeling algorithm to ensure the augmented data maintains spatial coherence and avoids introducing irrelevant semantics when pasting random patches. Our experiments on the ISPRS Potsdam dataset demonstrate that CP2M substantially mitigates overfitting, setting new benchmarks for segmentation accuracy and model robustness in remote sensing tasks.
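The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the procedure in detail, so the function names (`mosaic`, `connected_components`, `clustered_patch_mix`), the co-located quadrant layout, the 4-connectivity, and the choice to paste only the largest connected component of a random patch are all assumptions made for illustration.

```python
from collections import deque

import numpy as np


def mosaic(images, labels, rng):
    """Build one sample from four: each quadrant around a random split
    point is copied from a different source image and label map.
    (Assumed layout; real Mosaic variants often also rescale and crop.)"""
    h, w = labels[0].shape
    cy = int(rng.integers(h // 4, 3 * h // 4 + 1))
    cx = int(rng.integers(w // 4, 3 * w // 4 + 1))
    out_img = np.empty_like(images[0])
    out_lab = np.empty_like(labels[0])
    regions = [
        (slice(0, cy), slice(0, cx)), (slice(0, cy), slice(cx, w)),
        (slice(cy, h), slice(0, cx)), (slice(cy, h), slice(cx, w)),
    ]
    for (ys, xs), img, lab in zip(regions, images, labels):
        out_img[ys, xs] = img[ys, xs]
        out_lab[ys, xs] = lab[ys, xs]
    return out_img, out_lab


def connected_components(label_map):
    """Label 4-connected regions of equal class with a breadth-first
    search. Returns (component id map, number of components)."""
    h, w = label_map.shape
    comp = np.full((h, w), -1, dtype=np.int64)
    next_id = 0
    for y in range(h):
        for x in range(w):
            if comp[y, x] != -1:
                continue
            cls = label_map[y, x]
            comp[y, x] = next_id
            queue = deque([(y, x)])
            while queue:
                py, px = queue.popleft()
                for ny, nx in ((py - 1, px), (py + 1, px),
                               (py, px - 1), (py, px + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and comp[ny, nx] == -1
                            and label_map[ny, nx] == cls):
                        comp[ny, nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return comp, next_id


def clustered_patch_mix(img, lab, src_img, src_lab, patch, rng):
    """Paste part of a random square patch from the source sample.
    Assumption: only the largest connected component inside the patch
    is pasted, so the pasted pixels form one spatially coherent region."""
    h, w = lab.shape
    y0 = int(rng.integers(0, h - patch + 1))
    x0 = int(rng.integers(0, w - patch + 1))
    ys, xs = slice(y0, y0 + patch), slice(x0, x0 + patch)
    comp, n = connected_components(src_lab[ys, xs])
    sizes = np.bincount(comp.ravel(), minlength=n)
    mask = comp == int(sizes.argmax())
    out_img, out_lab = img.copy(), lab.copy()
    out_img[ys, xs][mask] = src_img[ys, xs][mask]  # writes through the view
    out_lab[ys, xs][mask] = src_lab[ys, xs][mask]
    return out_img, out_lab


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    imgs = [np.full((8, 8, 3), i, dtype=np.uint8) for i in range(4)]
    labs = [np.full((8, 8), i, dtype=np.int64) for i in range(4)]
    m_img, m_lab = mosaic(imgs, labs, rng)
    a_img, a_lab = clustered_patch_mix(m_img, m_lab, imgs[3], labs[3], 4, rng)
    print(a_lab)
```

Pasting a whole connected component rather than a raw rectangle is one way to read the abstract's goal of keeping spatial coherence; the paper itself should be consulted for the actual selection rule.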
Related papers
- PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model [76.95536611263356]
PolSAR data presents unique challenges due to its rich and complex characteristics.
Existing data representations, such as complex-valued data, polarimetric features, and amplitude images, are widely used.
Most feature extraction networks for PolSAR are small, limiting their ability to capture features effectively.
We propose the Polarimetric Scattering Mechanism-Informed SAM (PolSAM), an enhanced Segment Anything Model (SAM) that integrates domain-specific scattering characteristics and a novel prompt generation strategy.
arXiv Detail & Related papers (2024-12-17T09:59:53Z)
- Direct Cardiac Segmentation from Undersampled K-space Using Transformers [10.079819435628579]
We introduce a novel approach to deriving segmentations from sparse k-space samples using a transformer (DiSK).
Our model consistently outperforms the baselines in Dice and Hausdorff distances across foreground classes for all presented sampling rates.
arXiv Detail & Related papers (2024-05-31T20:54:12Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, aims to recover the amplitude spectrum of the hazy images for haze removal.
The second stage, named phase-guided structure refined, is devoted to learning the transformation and refinement of the phase spectrum.
arXiv Detail & Related papers (2023-08-14T08:23:58Z)
- MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation [110.09800389100599]
We propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation.
Our approach involves generating fine-grained patch-text pairs data by mixing image patches while preserving the correspondence between patches and text.
With MixReorg as a mask learner, conventional text-supervised semantic segmentation models can achieve highly generalizable pixel-semantic alignment ability.
arXiv Detail & Related papers (2023-08-09T09:35:16Z)
- Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment [53.401889855278704]
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples.
We propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local to local (L2L) similarity metric.
Experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-10-04T07:54:40Z)
- Unseen Object Instance Segmentation with Fully Test-time RGB-D Embeddings Adaptation [14.258456366985444]
Recently, a popular solution is leveraging RGB-D features of large-scale synthetic data and applying the model to unseen real-world scenarios.
We re-emphasize the adaptation process across Sim2Real domains in this paper.
We propose a framework to conduct the Fully Test-time RGB-D Embeddings Adaptation (FTEA) based on parameters of the BatchNorm layer.
arXiv Detail & Related papers (2022-04-21T02:35:20Z)
- Double Similarity Distillation for Semantic Image Segmentation [18.397968199629215]
We propose a knowledge distillation framework called double similarity distillation (DSD) to improve the classification accuracy of all existing compact networks.
Specifically, we propose a pixel-wise similarity distillation (PSD) module that utilizes residual attention maps to capture more detailed spatial dependencies.
Considering the differences in characteristics between the semantic segmentation task and other computer vision tasks, we propose a category-wise similarity distillation (CSD) module.
arXiv Detail & Related papers (2021-07-19T02:45:13Z)
- Robust Unsupervised Small Area Change Detection from SAR Imagery Using Deep Learning [23.203687716051697]
A robust unsupervised approach is proposed for small area change detection from synthetic aperture radar (SAR) images.
A multi-scale superpixel reconstruction method is developed to generate a difference image (DI).
A two-stage centre-constrained fuzzy c-means clustering algorithm is proposed to divide the pixels of the DI into changed, unchanged and intermediate classes.
arXiv Detail & Related papers (2020-11-22T12:50:08Z)
- Improving Scalability of Contrast Pattern Mining for Network Traffic Using Closed Patterns [27.321487770162495]
Contrast pattern mining (CPM) aims to discover patterns whose support increases significantly from a background dataset compared to a target dataset.
In this paper, we focus on extracting the most specific set of CPs to discover significant changes between two datasets.
Our proposed unsupervised algorithm is up to 100 times faster than an existing approach for CPM on network traffic data.
arXiv Detail & Related papers (2020-11-16T08:52:47Z)
- Deep Semantic Matching with Foreground Detection and Cycle-Consistency [103.22976097225457]
We address weakly supervised semantic matching based on a deep network.
We explicitly estimate the foreground regions to suppress the effect of background clutter.
We develop cycle-consistent losses to enforce the predicted transformations across multiple images to be geometrically plausible and consistent.
arXiv Detail & Related papers (2020-03-31T22:38:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.