Lightweight Multispectral Crop-Weed Segmentation for Precision Agriculture
- URL: http://arxiv.org/abs/2505.07444v1
- Date: Mon, 12 May 2025 11:08:42 GMT
- Title: Lightweight Multispectral Crop-Weed Segmentation for Precision Agriculture
- Authors: Zeynep Galymzhankyzy, Eric Martinson
- Abstract summary: CNN-based methods struggle to generalize and rely on RGB imagery, limiting performance under complex field conditions. We propose a lightweight transformer-CNN hybrid that processes RGB, Near-Infrared (NIR), and Red-Edge (RE) bands using specialized encoders and dynamic modality integration. The model achieves a segmentation accuracy (mean IoU) of 78.88%, outperforming RGB-only models by 15.8 percentage points.
- Score: 1.2277343096128712
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient crop-weed segmentation is critical for site-specific weed control in precision agriculture. Conventional CNN-based methods struggle to generalize and rely on RGB imagery, limiting performance under complex field conditions. To address these challenges, we propose a lightweight transformer-CNN hybrid that processes RGB, Near-Infrared (NIR), and Red-Edge (RE) bands using specialized encoders and dynamic modality integration. Evaluated on the WeedsGalore dataset, the model achieves a segmentation accuracy (mean IoU) of 78.88%, outperforming RGB-only models by 15.8 percentage points. With only 8.7 million parameters, the model offers high accuracy, computational efficiency, and potential for real-time deployment on Unmanned Aerial Vehicles (UAVs) and edge devices, advancing precision weed management.
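The listing does not include code, so below is a minimal PyTorch sketch of the design the abstract describes: one specialized encoder per band group (RGB, NIR, RE), a learned input-dependent gate for dynamic modality integration, and a small transformer stage before the segmentation head. All module names, layer sizes, and the gating scheme are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Small CNN stem for one band group (RGB: 3 ch, NIR: 1 ch, RE: 1 ch)."""
    def __init__(self, in_ch, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class DynamicModalityFusion(nn.Module):
    """Input-dependent soft weighting over the per-modality feature maps."""
    def __init__(self, dim, n_mod=3):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(dim * n_mod, n_mod), nn.Softmax(dim=1),
        )

    def forward(self, feats):                   # feats: list of (B, dim, H, W)
        w = self.gate(torch.cat(feats, dim=1))  # (B, n_mod) fusion weights
        return sum(w[:, i].view(-1, 1, 1, 1) * f for i, f in enumerate(feats))

class CropWeedSegmenter(nn.Module):
    def __init__(self, dim=64, n_classes=3):
        super().__init__()
        self.encoders = nn.ModuleList([ModalityEncoder(c, dim) for c in (3, 1, 1)])
        self.fuse = DynamicModalityFusion(dim)
        # Stand-in for the transformer stage: one encoder layer over fused tokens.
        self.transformer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                                      batch_first=True)
        self.head = nn.Conv2d(dim, n_classes, 1)

    def forward(self, rgb, nir, re):
        f = self.fuse([e(x) for e, x in zip(self.encoders, (rgb, nir, re))])
        B, C, H, W = f.shape
        t = self.transformer(f.flatten(2).transpose(1, 2))  # (B, H*W, C)
        f = t.transpose(1, 2).reshape(B, C, H, W)
        logits = self.head(f)
        return nn.functional.interpolate(logits, scale_factor=4)  # input resolution
```

For a WeedsGalore-style 5-band input, the model would be called as `model(rgb, nir, re)` with tensors of shape (B, 3, H, W), (B, 1, H, W), and (B, 1, H, W).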
Related papers
- RGB-Event Fusion with Self-Attention for Collision Prediction [9.268995547414777]
This paper proposes a neural network framework for predicting the time and position of a collision between an unmanned aerial vehicle and a dynamic object. The proposed architecture consists of two separate encoder branches, one per modality, followed by self-attention fusion to improve prediction accuracy. The fusion-based model improves prediction accuracy over single-modality approaches by 1% on average and by 10% for distances beyond 0.5 m, but at the cost of +71% memory and +105% FLOPs.
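As a rough illustration of the two-branch design, here is a hedged sketch: each modality gets its own encoder, the resulting tokens are concatenated, and a single self-attention layer fuses them before a regression head. The event representation (a 5-bin voxel tensor) and the 3-value output (time-to-collision plus 2D image position) are our assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class FusionCollisionPredictor(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.rgb_enc = nn.Sequential(nn.Conv2d(3, dim, 7, 4, 3), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(4))   # 16 RGB tokens
        self.evt_enc = nn.Sequential(nn.Conv2d(5, dim, 7, 4, 3), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(4))   # 16 event tokens
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, 3)  # assumed: time-to-collision + 2D position

    def forward(self, rgb, events):
        def tokens(x):                 # (B, C, H, W) -> (B, H*W, C)
            return x.flatten(2).transpose(1, 2)
        t = torch.cat([tokens(self.rgb_enc(rgb)),
                       tokens(self.evt_enc(events))], dim=1)
        fused, _ = self.attn(t, t, t)  # joint self-attention over both modalities
        return self.head(fused.mean(dim=1))
```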
arXiv Detail & Related papers (2025-05-07T09:03:26Z)
- SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems [1.891522135443594]
Domain-adaptive thermal object detection plays a key role in facilitating visible (RGB)-to-thermal (IR) adaptation. The inherent limitations of IR images, such as the lack of color and texture cues, pose challenges for RGB-trained models. We propose Semantic-Aware Gray color Augmentation (SAGA), a novel strategy for mitigating color bias and bridging the domain gap.
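The abstract names the strategy but not its mechanics; one plausible reading is a semantic-aware grayscale transform that suppresses color cues inside labeled regions so RGB-trained detectors transfer better to IR. A minimal NumPy sketch under that assumption (function name ours):

```python
import numpy as np

def semantic_gray_augment(image, mask, p=0.5, rng=None):
    """image: HxWx3 uint8; mask: HxW bool marking semantic foreground."""
    rng = rng or np.random.default_rng()
    if rng.random() > p:
        return image  # apply stochastically, like most augmentations
    # Standard luminance conversion of the whole frame.
    gray = (0.299 * image[..., 0] + 0.587 * image[..., 1]
            + 0.114 * image[..., 2]).astype(image.dtype)
    out = image.copy()
    out[mask] = gray[mask, None]  # gray out only the masked (semantic) pixels
    return out
```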
arXiv Detail & Related papers (2025-04-22T09:22:11Z)
- CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image [86.75098349480014]
This paper tackles category-level pose estimation of articulated objects in robotic manipulation tasks. We propose a single-stage network, CAP-Net, for estimating the 6D poses and sizes of Categorical Articulated Parts. We introduce the RGBD-Art dataset, the largest RGB-D articulated dataset to date, featuring RGB images and depth noise simulated from real sensors.
arXiv Detail & Related papers (2025-04-15T14:30:26Z)
- WeedsGalore: A Multispectral and Multitemporal UAV-based Dataset for Crop and Weed Segmentation in Agricultural Maize Fields [0.7421845364041001]
Weeds are one of the major reasons for crop yield loss, but current weeding practices fail to manage weeds in an efficient and targeted manner. We present a novel dataset for semantic and instance segmentation of crops and weeds in agricultural maize fields.
arXiv Detail & Related papers (2025-02-18T18:13:19Z)
- Bringing RGB and IR Together: Hierarchical Multi-Modal Enhancement for Robust Transmission Line Detection [67.02804741856512]
We propose a novel Hierarchical Multi-Modal Enhancement Network (HMMEN) that integrates RGB and IR data for robust and accurate transmission line (TL) detection. Our method introduces two key components: (1) a Mutual Multi-Modal Enhanced Block (MMEB), which fuses and enhances hierarchical RGB and IR feature maps in a coarse-to-fine manner, and (2) a Feature Alignment Block (FAB), which corrects misalignments between decoder outputs and IR feature maps by leveraging deformable convolutions.
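For the alignment idea, a hedged sketch of a deformable-convolution block in the spirit of the FAB: sampling offsets are predicted from the concatenated decoder and IR features, then used to resample the IR features into alignment. The offset-prediction design is our assumption, not the paper's exact block.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AlignBlock(nn.Module):
    def __init__(self, ch, k=3):
        super().__init__()
        # Predict per-pixel 2D sampling offsets (2 * k * k channels).
        self.offset = nn.Conv2d(2 * ch, 2 * k * k, 3, padding=1)
        self.deform = DeformConv2d(ch, ch, k, padding=k // 2)

    def forward(self, decoder_feat, ir_feat):
        off = self.offset(torch.cat([decoder_feat, ir_feat], dim=1))
        return self.deform(ir_feat, off)  # IR features resampled into alignment
```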
arXiv Detail & Related papers (2025-01-25T06:21:06Z)
- Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images [67.66644395272075]
We present the first analysis of state-of-the-art semantic segmentation models when faced with geometric out-of-distribution (OOD) data.
We propose an augmentation technique called "Organ Transplantation" to enhance generalizability.
Our augmentation technique improves state-of-the-art model performance by up to 67% for RGB data and 90% for HSI data, reaching the level of in-distribution performance on real OOD test data.
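"Organ Transplantation" is described here only by name; a minimal copy-paste-style sketch under that reading pastes the pixels of one labeled organ from a donor image into a recipient image and its mask (function name ours, images assumed the same size):

```python
import numpy as np

def organ_transplant(recip_img, recip_lbl, donor_img, donor_lbl, organ_id):
    """recip_img/donor_img: HxWxC; recip_lbl/donor_lbl: HxW integer masks."""
    m = donor_lbl == organ_id          # pixels of the organ in the donor image
    out_img, out_lbl = recip_img.copy(), recip_lbl.copy()
    out_img[m] = donor_img[m]          # transplant the organ's appearance
    out_lbl[m] = organ_id              # and update the segmentation labels
    return out_img, out_lbl
```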
arXiv Detail & Related papers (2024-08-27T19:13:15Z)
- LHU-Net: A Light Hybrid U-Net for Cost-Efficient, High-Performance Volumetric Medical Image Segmentation [4.168081528698768]
We introduce LHU-Net, a streamlined Hybrid U-Net for medical image segmentation.
Tested on five benchmark datasets, LHU-Net demonstrated superior efficiency and accuracy.
arXiv Detail & Related papers (2024-04-07T22:58:18Z)
- Detection of Spider Mites on Labrador Beans through Machine Learning Approaches Using Custom Datasets [0.0]
This study proposes a visual machine learning approach for plant disease detection, harnessing RGB and NIR data collected in real-world conditions through a JAI FS-1600D-10GE camera to build an RGBN dataset.
A two-stage early plant disease detection model combining YOLOv8 and a sequential CNN was trained on a dataset with partial labels, showing a 3.6% increase in mAP compared to a single-stage end-to-end segmentation model.
Using RGBN rather than RGB for classification yielded an average validation-accuracy increase of 6.25% with the ResNet15 and sequential CNN models.
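A common way to realize an RGBN input, sketched below with hedged details: widen a standard CNN's first convolution from 3 to 4 input channels so the NIR band rides along with RGB. We use torchvision's resnet18 as a stand-in for the paper's ResNet15, and the NIR-channel initialization is our assumption.

```python
import torch
import torchvision

model = torchvision.models.resnet18(weights=None)
old = model.conv1
# Replace the 3-channel stem with a 4-channel one (RGB + NIR).
model.conv1 = torch.nn.Conv2d(4, old.out_channels,
                              kernel_size=old.kernel_size,
                              stride=old.stride, padding=old.padding,
                              bias=False)
with torch.no_grad():
    model.conv1.weight[:, :3] = old.weight                        # keep RGB filters
    model.conv1.weight[:, 3:] = old.weight.mean(1, keepdim=True)  # NIR from RGB mean

x = torch.randn(2, 4, 224, 224)  # a batch of RGBN images
print(model(x).shape)            # torch.Size([2, 1000])
```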
arXiv Detail & Related papers (2024-02-12T18:57:06Z)
- Segment Any Events via Weighted Adaptation of Pivotal Tokens [85.39087004253163]
This paper focuses on the nuanced challenge of tailoring the Segment Anything Models (SAMs) for integration with event data.
We introduce a multi-scale feature distillation methodology to optimize the alignment of token embeddings originating from event data with their RGB image counterparts.
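A minimal sketch of such a distillation objective, with the per-scale token lists and the cosine formulation as our assumptions: event-side tokens are pulled toward their (detached) RGB counterparts at every scale.

```python
import torch
import torch.nn.functional as F

def multiscale_token_distill(event_tokens, rgb_tokens):
    """Both args: lists of (B, N_s, C) token tensors, one entry per scale."""
    loss = 0.0
    for e, r in zip(event_tokens, rgb_tokens):
        # RGB tokens act as the fixed teacher, so gradients stop at r.
        loss = loss + (1 - F.cosine_similarity(e, r.detach(), dim=-1)).mean()
    return loss / len(event_tokens)
```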
arXiv Detail & Related papers (2023-12-24T12:47:08Z)
- Spatial-information Guided Adaptive Context-aware Network for Efficient RGB-D Semantic Segmentation [9.198120596225968]
We propose an efficient, lightweight encoder-decoder network that reduces the number of parameters while guaranteeing the robustness of the algorithm.
Experimental results on NYUv2, SUN RGB-D, and Cityscapes datasets show that our method achieves a better trade-off among segmentation accuracy, inference time, and parameters than the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-11T09:02:03Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on a novel cross-modality interaction and refinement scheme.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
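A hedged sketch of cross-modality recalibration in this spirit: depth features produce a channel-wise gate applied to RGB features, and vice versa, before a simple aggregation. This is not the paper's exact Separation-and-Aggregation Gate, just an illustration of the gating idea (the gate weights are shared across directions for brevity).

```python
import torch
import torch.nn as nn

class CrossModalRecalib(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # Squeeze to a per-channel statistic, then map to a (0, 1) gate.
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, rgb_feat, depth_feat):
        rgb_recal = rgb_feat * self.gate(depth_feat)    # depth recalibrates RGB
        depth_recal = depth_feat * self.gate(rgb_feat)  # symmetric direction
        return rgb_recal + depth_recal                  # simple aggregation
```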
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.