Related papers: Interactive segmentation in aerial images: a new benchmark and an open access web-based tool

URL: http://arxiv.org/abs/2308.13174v2
Date: Thu, 7 Mar 2024 06:10:02 GMT
Title: Interactive segmentation in aerial images: a new benchmark and an open access web-based tool
Authors: Zhe Wang, Shoukun Sun, Xiang Que, Xiaogang Ma
Abstract summary: In recent years, interactive semantic segmentation proposed in computer vision has achieved an ideal state of human-computer interaction segmentation. This study aims to bridge the gap between interactive segmentation and remote sensing analysis by conducting a benchmark study on various interactive segmentation models.
Score: 2.729446374377189
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Deep learning has gradually become powerful in segmenting and classifying aerial images. However, in remote sensing applications, the lack of training datasets and the difficulty of accuracy assessment have always been challenges for deep-learning-based classification. In recent years, interactive semantic segmentation proposed in computer vision has achieved an ideal state of human-computer interaction segmentation. It can provide expert experience and utilize deep learning for efficient segmentation. However, few papers have discussed its application in remote sensing imagery. This study aims to bridge the gap between interactive segmentation and remote sensing analysis by conducting a benchmark study on various interactive segmentation models. We assessed the performance of five state-of-the-art interactive segmentation methods (Reviving Iterative Training with Mask Guidance for Interactive Segmentation (RITM), FocalClick, SimpleClick, Iterative Click Loss (ICL), and Segment Anything (SAM)) on two high-resolution aerial imagery datasets. The Cascade-Forward Refinement approach, an innovative inference strategy for interactive segmentation, was also introduced to enhance the segmentation results. We evaluated these methods on various land cover types, object sizes, and band combinations in the datasets. The SimpleClick model consistently outperformed the other methods in our experiments. Conversely, SAM performed less effectively than the other models. Building upon these findings, we developed an online tool called RSISeg for interactive segmentation of remote sensing data. RSISeg incorporates a well-performing interactive model that is finetuned with remote sensing data. Compared to existing interactive segmentation tools, RSISeg offers robust interactivity, modifiability, and adaptability to remote sensing data.

Related papers

RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation [37.44155289954746]
We conduct a large crowdsourcing study of click patterns in an interactive segmentation scenario and collect 475K real-user clicks. Using our model and dataset, we propose the RClicks benchmark for a comprehensive comparison of existing interactive segmentation methods on realistic clicks. According to our benchmark, in real-world usage interactive segmentation models may perform worse than has been reported in the baseline benchmark, and most of the methods are not robust.
arXiv Detail & Related papers (2024-10-15T15:55:00Z)
Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues. Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
WSESeg: Introducing a Dataset for the Segmentation of Winter Sports Equipment with a Baseline for Interactive Segmentation [13.38174941551702]
We introduce a new dataset containing instance segmentation masks for ten different categories of winter sports equipment. We carry out interactive segmentation experiments on said dataset to explore possibilities for efficient further labeling.
arXiv Detail & Related papers (2024-07-12T14:20:12Z)
Learning from Exemplars for Interactive Image Segmentation [15.37506525730218]
We introduce novel interactive segmentation frameworks for both a single object and multiple objects in the same category. Our model reduces users' labor by around 15%, requiring two fewer clicks to achieve target IoUs of 85% and 90%.
arXiv Detail & Related papers (2024-06-17T12:38:01Z)
Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT). We jointly adopt sparse points and boxes tracking, filtering out unstable points and capturing object-wise information. Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z)
TETRIS: Towards Exploring the Robustness of Interactive Segmentation [39.1981941213761]
We propose a methodology for finding extreme user inputs by a direct optimization in a white-box adversarial attack on the interactive segmentation model. We report the results of an extensive evaluation of dozens of models.
arXiv Detail & Related papers (2024-02-09T01:36:21Z)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery. We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation. Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP. We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
RAIS: Robust and Accurate Interactive Segmentation via Continual Learning [16.382862088005087]
We propose RAIS, a robust and accurate architecture for interactive segmentation with continuous learning. For efficient learning on the test set, we propose a novel optimization strategy to update global and local parameters. Our method also shows its robustness on remote sensing and medical imaging datasets.
arXiv Detail & Related papers (2022-10-20T03:05:44Z)
Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any efforts on dense annotations. Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling on three benchmark datasets.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage. We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
Deep Relational Metric Learning [84.95793654872399]
This paper presents a deep relational metric learning framework for image clustering and retrieval. We learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions. Experiments on the widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.
arXiv Detail & Related papers (2021-08-23T09:31:18Z)
Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps [55.94785248905853]
We propose a novel guided interactive segmentation (GIS) algorithm for video objects to improve the segmentation accuracy and reduce the interaction time. We develop the intersection-aware propagation module to propagate segmentation results to neighboring frames. Experimental results demonstrate that the proposed algorithm provides more accurate segmentation results at a faster speed than conventional algorithms.
arXiv Detail & Related papers (2021-04-21T07:08:57Z)
Reviving Iterative Training with Mask Guidance for Interactive Segmentation [8.271859911016719]
Recent works on click-based interactive segmentation have demonstrated state-of-the-art results by using various inference-time optimization schemes. We propose a simple feedforward model for click-based interactive segmentation that employs the segmentation masks from previous steps. We find that the models trained on a combination of COCO and LVIS with diverse and high-quality annotations show performance superior to all existing models.
arXiv Detail & Related papers (2021-02-12T15:44:31Z)
A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs. We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet. Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
FAIRS -- Soft Focus Generator and Attention for Robust Object Segmentation from Extreme Points [70.65563691392987]
We present a new approach to generate object segmentation from user inputs in the form of extreme points and corrective clicks. We demonstrate our method's ability to generate high-quality training data as well as its scalability in incorporating extreme points, guiding clicks, and corrective clicks in a principled manner.
arXiv Detail & Related papers (2020-04-04T22:25:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.