Related papers: Iterative Optimization Annotation Pipeline and ALSS-YOLO-Seg for Efficient Banana Plantation Segmentation in UAV Imagery

URL: http://arxiv.org/abs/2410.07955v1
Date: Wed, 9 Oct 2024 13:19:26 GMT
Title: Iterative Optimization Annotation Pipeline and ALSS-YOLO-Seg for Efficient Banana Plantation Segmentation in UAV Imagery
Authors: Ang He, Ximei Wu, Xing Xu, Jing Chen, Xiaobin Guo, Sheng Xu
Abstract summary: We develop ALSS-YOLO-Seg, an efficient lightweight segmentation model optimized for UAV imagery. The model's backbone includes an Adaptive Lightweight Channel Splitting and Shuffling (ALSS) module to improve information exchange between channels. A Multi-Scale Channel Attention (MSCA) module combines multi-scale feature extraction with channel attention to tackle challenges of varying target sizes and complex ground backgrounds.
Score: 11.048503703669667
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Precise segmentation of Unmanned Aerial Vehicle (UAV)-captured images plays a vital role in tasks such as crop yield estimation and plant health assessment in banana plantations. By identifying and classifying planted areas, crop area can be calculated, which is indispensable for accurate yield predictions. However, segmenting banana plantation scenes requires a substantial amount of annotated data, and manual labeling of these images is both time-consuming and labor-intensive, limiting the development of large-scale datasets. Furthermore, challenges such as varying target sizes, complex ground backgrounds, limited computational resources, and correct identification of crop categories make segmentation even more difficult. To address these issues, we propose a comprehensive solution. First, we design an iterative optimization annotation pipeline that leverages SAM2's zero-shot capabilities to generate high-quality segmentation annotations, significantly reducing the cost and time of data annotation. Second, we develop ALSS-YOLO-Seg, an efficient lightweight segmentation model optimized for UAV imagery. The model's backbone includes an Adaptive Lightweight Channel Splitting and Shuffling (ALSS) module to improve information exchange between channels and optimize feature extraction, aiding accurate crop identification. Additionally, a Multi-Scale Channel Attention (MSCA) module combines multi-scale feature extraction with channel attention to tackle challenges of varying target sizes and complex ground backgrounds.
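The abstract does not say how the ALSS module splits or shuffles channels, so the following is only a rough illustration of the general idea, not the authors' method. It sketches the generic channel split and ShuffleNet-style channel shuffle on a plain list of channel indices. The function names `channel_split` and `channel_shuffle`, the `ratio` parameter, and the group count are assumptions made for this example.

```python
def channel_split(channels, ratio=0.5):
    """Split a list of channels into two branches at the given ratio.

    Hypothetical helper: the split ratio used by ALSS is not given
    in the abstract, so 0.5 is only a placeholder default.
    """
    split_at = int(len(channels) * ratio)
    return channels[:split_at], channels[split_at:]


def channel_shuffle(channels, groups):
    """Interleave channels across groups (generic ShuffleNet-style shuffle).

    Conceptually: view the channels as a (groups, per_group) grid,
    transpose it, and flatten, so each output neighbourhood mixes
    channels that came from different groups.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


if __name__ == "__main__":
    left, right = channel_split(list(range(8)))
    # After concatenating the two branches, shuffle so they exchange information.
    print(channel_shuffle(left + right, groups=2))
```

In a real network the same index pattern would be applied along the channel axis of a feature tensor; the list version here only makes the reordering easy to inspect.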

Related papers

DepthCropSeg++: Scaling a Crop Segmentation Foundation Model With Depth-Labeled Data [8.868203469534269]
DepthCropSeg++ is a foundation model for crop segmentation, capable of segmenting different crop species in open in-field environments. We build upon the state-of-the-art semantic segmentation architecture ViT-Adapter, enhance it with dynamic upAdapter architecture, and train the model with a two-stage self-training pipeline. Results demonstrate that DepthCropSeg++ achieves 93.11% mIoU on a comprehensive testing set, outperforming both supervised baselines and general vision foundation models.
arXiv Detail & Related papers (2026-01-18T11:51:09Z)
Promptable Foundation Models for SAR Remote Sensing: Adapting the Segment Anything Model for Snow Avalanche Segmentation [18.28485164485434]
Training an effective detection model requires gathering a large dataset with high-quality annotations from domain experts. We build on the Segment Anything Model (SAM), a segmentation foundation model trained on natural images, and tailor it to Sentinel-1 SAR data. We tackle these challenges through a combination of adapters to mitigate the domain gap, multiple encoders to handle multi-channel SAR inputs, prompt-engineering strategies to improve avalanche localization accuracy, and a training algorithm that limits the training time of the encoder.
arXiv Detail & Related papers (2026-01-03T15:41:12Z)
Unlocking Zero-Shot Plant Segmentation with Pl@ntNet Intelligence [3.7603674895765766]
We present a zero-shot segmentation approach for agricultural imagery. Our method exploits Plantnet's specialized plant representations to identify plant regions. We show consistent performance gains when using Plantnet-fine-tuned DinoV2 over the base DinoV2 model.
arXiv Detail & Related papers (2025-10-14T14:38:32Z)
PlantSAM: An Object Detection-Driven Segmentation Pipeline for Herbarium Specimens [0.5339846068056558]
We introduce PlantSAM, an automated segmentation pipeline that integrates YOLOv10 for plant region detection and the Segment Anything Model (SAM2) for segmentation. YOLOv10 generates bounding box prompts to guide SAM2, enhancing segmentation accuracy. PlantSAM achieved state-of-the-art segmentation performance, with an IoU of 0.94 and a Dice coefficient of 0.97.
arXiv Detail & Related papers (2025-07-22T12:02:39Z)
FAMSeg: Fetal Femur and Cranial Ultrasound Segmentation Using Feature-Aware Attention and Mamba Enhancement [3.307520405211055]
This paper proposes a fetal femur and cranial ultrasound image segmentation model based on feature perception and Mamba enhancement. The FAMSeg network achieved the fastest loss reduction and the best segmentation performance across images of varying sizes and orientations.
arXiv Detail & Related papers (2025-06-09T05:06:47Z)
Data Augmentation and Resolution Enhancement using GANs and Diffusion Models for Tree Segmentation [49.13393683126712]
Urban forests play a key role in enhancing environmental quality and supporting biodiversity in cities. Accurately detecting trees is challenging due to complex landscapes and the variability in image resolution caused by different satellite sensors or UAV flight altitudes. We propose a novel pipeline that integrates domain adaptation with GANs and Diffusion models to enhance the quality of low-resolution aerial images.
arXiv Detail & Related papers (2025-05-21T03:57:10Z)
High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models. By leveraging the robust generalization capabilities and the rich, versatile image representation prior of the SD models, we significantly reduce the inference time while preserving high-fidelity, detailed generation. Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z)
SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation [69.42764583465508]
We explore the potential of generative image diffusion to address the scarcity of annotated data in earth observation tasks. To the best of our knowledge, we are the first to generate both images and corresponding masks for satellite segmentation.
arXiv Detail & Related papers (2024-03-25T10:30:22Z)
Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis. We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery. We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
Improving Data Efficiency for Plant Cover Prediction with Label Interpolation and Monte-Carlo Cropping [7.993547048820065]
The plant community composition is an essential indicator of environmental changes and is usually analyzed in ecological field studies. We introduce an approach to interpolate the sparse labels in the collected vegetation plot time series down to the intermediate dense and unlabeled images. We also introduce a new method we call Monte-Carlo Cropping to deal with high-resolution images efficiently.
arXiv Detail & Related papers (2023-07-17T15:17:39Z)
SepHRNet: Generating High-Resolution Crop Maps from Remote Sensing imagery using HRNet with Separable Convolution [3.717258819781834]
We propose a novel deep learning approach that integrates HRNet with separable convolutional layers to capture spatial patterns and self-attention to capture temporal patterns of the data. The proposed algorithm achieves a high classification accuracy of 97.5% and IoU of 55.2% in generating crop maps.
arXiv Detail & Related papers (2023-07-11T18:07:25Z)
Agave crop segmentation and maturity classification with deep learning data-centric strategies using very high-resolution satellite imagery [101.18253437732933]
We present an Agave tequilana Weber azul crop segmentation and maturity classification using very high resolution satellite imagery. We solve real-world deep learning problems in the very specific context of agave crop segmentation. With the resulting accurate models, agave production forecasting can be made available for large regions.
arXiv Detail & Related papers (2023-03-21T03:15:29Z)
AdaZoom: Adaptive Zoom Network for Multi-Scale Object Detection in Large Scenes [57.969186815591186]
Detection in large-scale scenes is a challenging problem due to small objects and extreme scale variation. We propose a novel Adaptive Zoom (AdaZoom) network as a selective magnifier with flexible shape and focal length to adaptively zoom the focus regions for object detection.
arXiv Detail & Related papers (2021-06-19T03:30:22Z)
Split-Merge Pooling [36.2980225204665]
Split-Merge pooling is introduced to preserve spatial information without subsampling. We evaluate our approach for dense semantic segmentation of large image sizes taken from the Cityscapes and GTA-5 datasets.
arXiv Detail & Related papers (2020-06-13T23:20:30Z)
Super Resolution for Root Imaging [2.0924876102146714]
Super-resolution (SR) algorithms are desired for overcoming resolution limitations of sensors, reducing storage space requirements, and boosting the performance of later analysis. We propose an SR framework for enhancing images of plant roots by using convolutional neural networks (CNNs). We demonstrate on a collection of publicly available datasets that the SR models outperform basic bicubic interpolation even when trained with non-root datasets.
arXiv Detail & Related papers (2020-03-30T15:11:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.