Applying Unsupervised Semantic Segmentation to High-Resolution UAV Imagery for Enhanced Road Scene Parsing
- URL: http://arxiv.org/abs/2402.02985v2
- Date: Sat, 27 Apr 2024 02:38:40 GMT
- Title: Applying Unsupervised Semantic Segmentation to High-Resolution UAV Imagery for Enhanced Road Scene Parsing
- Authors: Zihan Ma, Yongshang Li, Ronggui Ma, Chen Liang,
- Abstract summary: A novel unsupervised road parsing framework is presented.
The proposed method achieves a mean Intersection over Union (mIoU) of 89.96% on the development dataset without any manual annotation.
- Score: 12.558144256470827
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: There are two challenges presented in parsing road scenes from UAV images: the complexity of processing high-resolution images and the dependency on extensive manual annotations required by traditional supervised deep learning methods to train robust and accurate models. In this paper, a novel unsupervised road parsing framework that leverages advancements in vision language models with fundamental computer vision techniques is introduced to address these critical challenges. Our approach initiates with a vision language model that efficiently processes ultra-high resolution images to rapidly identify road regions of interest. Subsequent application of the vision foundation model, SAM, generates masks for these regions without requiring category information. A self-supervised learning network then processes these masked regions to extract feature representations, which are clustered using an unsupervised algorithm that assigns unique IDs to each feature cluster. The masked regions are combined with the corresponding IDs to generate initial pseudo-labels, which initiate an iterative self-training process for regular semantic segmentation. Remarkably, the proposed method achieves a mean Intersection over Union (mIoU) of 89.96% on the development dataset without any manual annotation, demonstrating extraordinary flexibility by surpassing the limitations of human-defined categories, and autonomously acquiring knowledge of new categories from the dataset itself.
Related papers
- Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation [42.020470627552136]
Open-vocabulary panoptic segmentation is an emerging task aiming to accurately segment the image into semantically meaningful masks.
mask classification is the main performance bottleneck for open-vocab panoptic segmentation.
We propose Semantic Refocused Tuning, a novel framework that greatly enhances open-vocab panoptic segmentation.
arXiv Detail & Related papers (2024-09-24T17:50:28Z) - Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation [10.958014189747356]
We propose a new framework that automatically generates high-quality segmentation masks with their referring expressions as pseudo supervisions for referring image segmentation (RIS)
Our method significantly outperforms both weakly and zero-shot SoTA methods on the RIS benchmark datasets.
It also surpasses fully supervised methods in unseen domains, proving its capability to tackle the open-world challenge within RIS.
arXiv Detail & Related papers (2024-07-10T07:14:48Z) - Generalizable Entity Grounding via Assistance of Large Language Model [77.07759442298666]
We propose a novel approach to densely ground visual entities from a long caption.
We leverage a large multimodal model to extract semantic nouns, a class-a segmentation model to generate entity-level segmentation, and a multi-modal feature fusion module to associate each semantic noun with its corresponding segmentation mask.
arXiv Detail & Related papers (2024-02-04T16:06:05Z) - Weakly-Supervised Semantic Segmentation with Image-Level Labels: from
Traditional Models to Foundation Models [33.690846523358836]
Weakly-supervised semantic segmentation (WSSS) is an effective solution to avoid pixel-level labels.
We focus on the WSSS with image-level labels, which is the most challenging form of WSSS.
We investigate the applicability of visual foundation models, such as the Segment Anything Model (SAM), in the context of WSSS.
arXiv Detail & Related papers (2023-10-19T07:16:54Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages the existing pretrained vision-language model (VL) to train semantic segmentation models.
ZeroSeg overcomes this by distilling the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
arXiv Detail & Related papers (2023-06-01T08:47:06Z) - IFSeg: Image-free Semantic Segmentation via Vision-Language Model [67.62922228676273]
We introduce a novel image-free segmentation task where the goal is to perform semantic segmentation given only a set of the target semantic categories.
We construct this artificial training data by creating a 2D map of random semantic categories and another map of their corresponding word tokens.
Our model not only establishes an effective baseline for this novel task but also demonstrates strong performances compared to existing methods.
arXiv Detail & Related papers (2023-03-25T08:19:31Z) - Unsupervised Image Segmentation by Mutual Information Maximization and
Adversarial Regularization [7.165364364478119]
We propose a novel fully unsupervised semantic segmentation method, the so-called Information Maximization and Adrial Regularization (InMARS)
Inspired by human perception which parses a scene into perceptual groups, our proposed approach first partitions an input image into meaningful regions (also known as superpixels)
Next, it utilizes Mutual-Information-Maximization followed by an adversarial training strategy to cluster these regions into semantically meaningful classes.
Our experiments demonstrate that our method achieves the state-of-the-art performance on two commonly used unsupervised semantic segmentation datasets.
arXiv Detail & Related papers (2021-07-01T18:36:27Z) - Unsupervised Person Re-identification via Simultaneous Clustering and
Consistency Learning [22.008371113710137]
We design a pretext task for unsupervised re-ID by learning visual consistency from still images and temporal consistency during training process.
We optimize the model by grouping the two encoded views into same cluster, thus enhancing the visual consistency between views.
arXiv Detail & Related papers (2021-04-01T02:10:42Z) - Self-supervised Human Detection and Segmentation via Multi-view
Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.