Related papers: Enhancing kelp forest detection in remote sensing images using crowdsourced labels with Mixed Vision Transformers and ConvNeXt segmentation models

Enhancing kelp forest detection in remote sensing images using crowdsourced labels with Mixed Vision Transformers and ConvNeXt segmentation models

URL: http://arxiv.org/abs/2501.14001v1
Date: Thu, 23 Jan 2025 12:12:31 GMT
Title: Enhancing kelp forest detection in remote sensing images using crowdsourced labels with Mixed Vision Transformers and ConvNeXt segmentation models
Authors: Ioannis Nasios,
Abstract summary: This study explores the integration of crowdsourced labels with advanced artificial intelligence models to develop a fast and accurate kelp canopy detection pipeline.<n>The methodology achieved a high detection rate, accurately identifying about three out of four pixels containing kelp canopy.<n>This work also underscores the potential of combining machine learning models with crowdsourced data for effective and scalable environmental monitoring.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Kelp forests, as foundation species, are vital to marine ecosystems, providing essential food and habitat for numerous organisms. This study explores the integration of crowdsourced labels with advanced artificial intelligence models to develop a fast and accurate kelp canopy detection pipeline using Landsat images. Building on the success of a machine learning competition, where this approach ranked third and performed consistently well on both local validation and public and private leaderboards, the research highlights the effectiveness of combining Mixed Vision Transformers (MIT) with ConvNeXt models. Training these models on various image sizes significantly enhanced the accuracy of the ensemble results. U-Net emerged as the best segmentation architecture, with UpperNet also contributing to the final ensemble. Key Landsat bands, such as ShortWave InfraRed (SWIR1) and Near-InfraRed (NIR), were crucial while altitude data was used in postprocessing to eliminate false positives on land. The methodology achieved a high detection rate, accurately identifying about three out of four pixels containing kelp canopy while keeping false positives low. Despite the medium resolution of Landsat satellites, their extensive historical coverage makes them effective for studying kelp forests. This work also underscores the potential of combining machine learning models with crowdsourced data for effective and scalable environmental monitoring. All running code for training all models and inference can be found at https://github.com/IoannisNasios/Kelp_Forests.

Related papers

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection. CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
arXiv Detail & Related papers (2024-07-29T18:00:10Z)
Evaluation of Deep Learning Semantic Segmentation for Land Cover Mapping on Multispectral, Hyperspectral and High Spatial Aerial Imagery [0.0]
In the rise of climate change, land cover mapping has become such an urgent need in environmental monitoring. This research implemented a semantic segmentation method such as Unet, Linknet, FPN, and PSPnet for categorizing vegetation, water, and others. The LinkNet model obtained high accuracy in IoU at 0.92 in all datasets, which is comparable with other mentioned techniques.
arXiv Detail & Related papers (2024-06-20T11:40:12Z)
Optimization Efficient Open-World Visual Region Recognition [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model. Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
Deep-Learning Framework for Optimal Selection of Soil Sampling Sites [0.0]
This work leverages the recent advancements of deep learning in image processing to find optimal locations that present the important characteristics of a field. Our framework is constructed with an encoder-decoder architecture with the self-attention mechanism as the backbone. The model has achieved impressive results on the testing dataset, with a mean accuracy of 99.52%, a mean Intersection over Union (IoU) of 57.35%, and a mean Dice Coefficient of 71.47%.
arXiv Detail & Related papers (2023-09-02T16:19:21Z)
SepHRNet: Generating High-Resolution Crop Maps from Remote Sensing imagery using HRNet with Separable Convolution [3.717258819781834]
We propose a novel Deep learning approach that integrates HRNet with Separable Convolutional layers to capture spatial patterns and Self-attention to capture temporal patterns of the data. The proposed algorithm achieves a high classification accuracy of 97.5% and IoU of 55.2% in generating crop maps.
arXiv Detail & Related papers (2023-07-11T18:07:25Z)
Classification of Single Tree Decay Stages from Combined Airborne LiDAR Data and CIR Imagery [1.4589991363650008]
This study, for the first time, automatically categorizing individual trees (Norway spruce) into five decay stages. Three different Machine Learning methods - 3D point cloud-based deep learning (KPConv), Convolutional Neural Network (CNN), and Random Forest (RF) All models achieved promising results, reaching overall accuracy (OA) of up to 88.8%, 88.4% and 85.9% for KPConv, CNN and RF, respectively.
arXiv Detail & Related papers (2023-01-04T22:20:16Z)
Triggering Failures: Out-Of-Distribution detection by learning from local adversarial attacks in Semantic Segmentation [76.2621758731288]
We tackle the detection of out-of-distribution (OOD) objects in semantic segmentation. Our main contribution is a new OOD detection architecture called ObsNet associated with a dedicated training scheme based on Local Adversarial Attacks (LAA) We show it obtains top performances both in speed and accuracy when compared to ten recent methods of the literature on three different datasets.
arXiv Detail & Related papers (2021-08-03T17:09:56Z)
Efficient Self-supervised Vision Transformers for Representation Learning [86.57557009109411]
We show that multi-stage architectures with sparse self-attentions can significantly reduce modeling complexity. We propose a new pre-training task of region matching which allows the model to capture fine-grained region dependencies. Our results show that combining the two techniques, EsViT achieves 81.3% top-1 on the ImageNet linear probe evaluation.
arXiv Detail & Related papers (2021-06-17T19:57:33Z)
Instance segmentation of fallen trees in aerial color infrared imagery using active multi-contour evolution with fully convolutional network-based intensity priors [0.5276232626689566]
We introduce a framework for segmenting instances of a common object class by multiple active contour evolution over semantic segmentation maps of images. We instantiate the proposed framework in the context of segmenting individual fallen stems from high-resolution aerial multispectral imagery.
arXiv Detail & Related papers (2021-05-05T11:54:05Z)
GANav: Group-wise Attention Network for Classifying Navigable Regions in Unstructured Outdoor Environments [54.21959527308051]
We present a new learning-based method for identifying safe and navigable regions in off-road terrains and unstructured environments from RGB images. Our approach consists of classifying groups of terrain classes based on their navigability levels using coarse-grained semantic segmentation. We show through extensive evaluations on the RUGD and RELLIS-3D datasets that our learning algorithm improves the accuracy of visual perception in off-road terrains for navigation.
arXiv Detail & Related papers (2021-03-07T02:16:24Z)
Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction. We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data. Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
Towards High Performance Human Keypoint Detection [87.1034745775229]
We find that context information plays an important role in reasoning human body configuration and invisible keypoints. Inspired by this, we propose a cascaded context mixer ( CCM) which efficiently integrates spatial and channel context information. To maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy. We present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy.
arXiv Detail & Related papers (2020-02-03T02:24:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.