DeepMerge: Deep-Learning-Based Region-Merging for Image Segmentation
- URL: http://arxiv.org/abs/2305.19787v2
- Date: Fri, 5 Jan 2024 10:29:59 GMT
- Title: DeepMerge: Deep-Learning-Based Region-Merging for Image Segmentation
- Authors: Xianwei Lv and Claudio Persello and Wangbin Li and Xiao Huang and
Dongping Ming and Alfred Stein
- Abstract summary: We propose a deep-learning-based region merging method dubbed DeepMerge to handle the segmentation of complete objects in large VHR images.
This is the first method to use deep learning to learn the similarity between, and merge, similar adjacent super-pixels in a region adjacency graph (RAG).
DeepMerge achieves the highest F value (0.9550) and the lowest total error TE (0.0895), correctly segmenting objects of different sizes and outperforming all competing segmentation methods.
- Score: 7.063322114865965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image segmentation aims to partition an image according to the objects in the
scene and is a fundamental step in analysing very high spatial-resolution (VHR)
remote sensing imagery. Current methods struggle to effectively consider land
objects with diverse shapes and sizes. Additionally, the determination of
segmentation scale parameters frequently adheres to a static and empirical
doctrine, posing limitations on the segmentation of large-scale remote sensing
images and yielding algorithms with limited interpretability. To address the
above challenges, we propose a deep-learning-based region merging method dubbed
DeepMerge to handle the segmentation of complete objects in large VHR images by
integrating deep learning and region adjacency graph (RAG). This is the first
method to use deep learning to learn the similarity and merge similar adjacent
super-pixels in a RAG. We propose a modified binary tree sampling method to
generate shift-scale data serving as input to transformer-based deep learning
networks; a shift-scale attention with 3-dimensional relative position
embedding to learn features across scales; and an embedding to fuse the
learned features with hand-crafted features. DeepMerge achieves high segmentation
accuracy in a supervised manner from large-scale remotely sensed images and
provides an interpretable optimal scale parameter, which is validated using a
remote sensing image of 0.55 m resolution covering an area of 5,660 km^2. The
experimental results show that DeepMerge achieves the highest F value (0.9550)
and the lowest total error TE (0.0895), correctly segmenting objects of
different sizes and outperforming all competing segmentation methods.
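The abstract describes iteratively merging similar adjacent super-pixels over a region adjacency graph (RAG). The sketch below illustrates only that generic merging loop, not DeepMerge itself: a hand-crafted inverse-distance similarity stands in for the paper's learned transformer score, the RAG is a plain adjacency dict, and all names (`merge_regions`, `similarity`) are illustrative.

```python
def merge_regions(features, adjacency, threshold):
    """Greedily merge the most similar adjacent regions of a RAG.

    features:  {region_id: scalar feature (stand-in for learned features)}
    adjacency: {region_id: set of neighbouring region_ids} (symmetric)
    threshold: stop when the best pair's similarity drops below this
    Returns {region_id: final representative region_id} after merging.
    """
    features = dict(features)
    adjacency = {r: set(n) for r, n in adjacency.items()}
    parent = {r: r for r in features}

    def similarity(a, b):
        # Placeholder similarity in (0, 1]: inverse feature distance.
        # DeepMerge instead predicts this score with a transformer.
        return 1.0 / (1.0 + abs(features[a] - features[b]))

    while True:
        pairs = [(similarity(a, b), a, b)
                 for a in adjacency for b in adjacency[a] if a < b]
        if not pairs:
            break
        score, a, b = max(pairs)
        if score < threshold:
            break
        # Merge b into a: average features, union neighbourhoods.
        features[a] = (features[a] + features[b]) / 2.0
        for n in adjacency.pop(b):
            adjacency[n].discard(b)
            if n != a:
                adjacency[n].add(a)
                adjacency[a].add(n)
        adjacency[a].discard(a)
        del features[b]
        parent[b] = a
    # Path-compress so every region maps to its final representative.
    for r in parent:
        while parent[parent[r]] != parent[r]:
            parent[r] = parent[parent[r]]
    return parent
```

For example, with three chained regions whose features are 0.1, 0.15, and 0.9 and a threshold of 0.9, the first two merge (similarity about 0.95) while the third stays separate, mirroring how similarity-driven merging preserves dissimilar objects.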
Related papers
- Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion [9.098711843118629]
This paper introduces the state space model (SSM) and proposes a novel hybrid semantic segmentation network based on vision Mamba (CVMH-UNet).
This method designs a cross-scanning visual state space block (CVSSBlock) that uses cross 2D scanning (CS2D) to fully capture global information from multiple directions.
By incorporating convolutional neural network branches to overcome the constraints of Vision Mamba (VMamba) in acquiring local information, this approach facilitates a comprehensive analysis of both global and local features.
arXiv Detail & Related papers (2024-10-08T02:17:38Z) - Depth-guided Texture Diffusion for Image Semantic Segmentation [47.46257473475867]
We introduce a Depth-guided Texture Diffusion approach that effectively tackles the outlined challenge.
Our method extracts low-level features from edges and textures to create a texture image.
By integrating this enriched depth map with the original RGB image into a joint feature embedding, our method effectively bridges the disparity between the depth map and the image.
arXiv Detail & Related papers (2024-08-17T04:55:03Z) - Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged, targeting high-precision object segmentation in high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z) - Hi-ResNet: Edge Detail Enhancement for High-Resolution Remote Sensing Segmentation [10.919956120261539]
High-resolution remote sensing (HRS) semantic segmentation extracts key objects from high-resolution coverage areas.
Objects of the same category within HRS images show significant differences in scale and shape across diverse geographical environments.
We propose a High-resolution remote sensing network (Hi-ResNet) with efficient network structure designs.
arXiv Detail & Related papers (2023-05-22T03:58:25Z) - De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z) - Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial
Representation Learning [55.762840052788945]
We present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales.
We find that tasking the network with reconstructing both low/high frequency images leads to robust multiscale representations for remote sensing imagery.
arXiv Detail & Related papers (2022-12-30T03:15:34Z) - Semantic Labeling of High Resolution Images Using EfficientUNets and
Transformers [5.177947445379688]
We propose a new segmentation model that combines convolutional neural networks with deep transformers.
Our results demonstrate that the proposed methodology improves segmentation accuracy compared to state-of-the-art techniques.
arXiv Detail & Related papers (2022-06-20T12:03:54Z) - AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical one among which lies in foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as mainstream methods.
arXiv Detail & Related papers (2022-02-18T10:14:45Z) - High-resolution Depth Maps Imaging via Attention-based Hierarchical
Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided depth super-resolution (DSR).
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z) - A Multi-Level Approach to Waste Object Segmentation [10.20384144853726]
We address the problem of localizing waste objects from a color image and an optional depth image.
Our method integrates the intensity and depth information at multiple levels of spatial granularity.
We create a new RGBD waste object segmentation dataset, MJU-Waste, which is made public to facilitate future research in this area.
arXiv Detail & Related papers (2020-07-08T16:49:25Z) - DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning [122.51237307910878]
We develop methods for few-shot image classification from a new perspective of optimal matching between image regions.
We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations.
To generate the importance weights of elements in the formulation, we design a cross-reference mechanism.
arXiv Detail & Related papers (2020-03-15T08:13:16Z)
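The DeepEMD entry above uses the Earth Mover's Distance as a structural distance between dense image representations. As an illustration of the underlying metric only, and not of the paper's cross-reference weighting or region matching, the snippet below computes EMD between two 1-D histograms, where it reduces to the L1 distance between cumulative distributions; the function name `emd_1d` is illustrative.

```python
def emd_1d(p, q):
    """Earth Mover's Distance between two 1-D histograms on the same bins.

    For 1-D distributions with unit bin spacing, EMD equals the L1
    distance between cumulative distribution functions. This is a toy
    stand-in for the region-level optimal matching DeepEMD solves.
    """
    assert len(p) == len(q)
    # Normalize to unit mass so the two histograms are comparable.
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    total, carry = 0.0, 0.0
    for a, b in zip(p, q):
        carry += a - b  # mass that must flow on to the next bin
        total += abs(carry)
    return total
```

Moving all mass one bin over, as in `emd_1d([1, 0], [0, 1])`, costs exactly 1.0, while identical histograms cost 0.0; in the general 2-D, many-region case this becomes a linear program rather than a single pass.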
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.