Dual Atrous Separable Convolution for Improving Agricultural Semantic Segmentation
- URL: http://arxiv.org/abs/2506.22570v1
- Date: Fri, 27 Jun 2025 18:37:43 GMT
- Title: Dual Atrous Separable Convolution for Improving Agricultural Semantic Segmentation
- Authors: Chee Mei Ling, Thangarajah Akilan, Aparna Ravinda Phalke
- Abstract summary: This study proposes an efficient image segmentation method for precision agriculture. A novel Dual Atrous Separable Convolution (DAS Conv) module is integrated within the DeepLabV3-based segmentation framework. It achieves more than a 66% improvement in efficiency when considering the trade-off between model complexity and performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Agricultural image semantic segmentation is a pivotal component of modern agriculture, facilitating accurate visual data analysis to improve crop management, optimize resource utilization, and boost overall productivity. This study proposes an efficient image segmentation method for precision agriculture, focusing on accurately delineating farmland anomalies to support informed decision-making and proactive interventions. A novel Dual Atrous Separable Convolution (DAS Conv) module is integrated within the DeepLabV3-based segmentation framework. The DAS Conv module is meticulously designed to achieve an optimal balance between dilation rates and padding size, thereby enhancing model performance without compromising efficiency. The study also incorporates a strategic skip connection from an optimal stage in the encoder to the decoder to bolster the model's capacity to capture fine-grained spatial features. Despite its lower computational complexity, the proposed model outperforms its baseline and achieves performance comparable to highly complex transformer-based state-of-the-art (SOTA) models on the Agriculture Vision benchmark dataset. It achieves more than 66% improvement in efficiency when considering the trade-off between model complexity and performance, compared to the SOTA model. This study highlights an efficient and effective solution for improving semantic segmentation in remote sensing applications, offering a computationally lightweight model capable of high-quality performance in agricultural imagery.
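The dilation/padding balance the abstract refers to can be illustrated with a minimal sketch. The snippet below is a plain-Python 1-D illustration under stated assumptions, not the paper's DAS Conv module (which is a dual-branch, depthwise-separable 2-D design): for an odd kernel size k and dilation rate r, zero padding of r*(k-1)/2 on each side keeps the output the same length as the input, so the dilation rate can be enlarged to widen the receptive field without shrinking the feature map.

```python
def atrous_conv1d(x, kernel, rate):
    """1-D atrous (dilated) convolution with 'same' zero padding.

    For an odd-length kernel of size k, padding = rate * (k - 1) // 2
    on each side keeps the output the same length as the input --
    the dilation/padding balance described in the abstract.
    """
    k = len(kernel)
    pad = rate * (k - 1) // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad  # zero-pad both ends
    # Each output sample taps the padded input at strides of `rate`,
    # so the receptive field spans rate * (k - 1) + 1 input samples.
    return [sum(kernel[j] * xp[i + j * rate] for j in range(k))
            for i in range(len(x))]
```

In a separable design, a per-channel (depthwise) atrous filter like this one would be followed by a 1x1 pointwise convolution that mixes channels, which is what keeps the parameter count low relative to a standard convolution.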
Related papers
- Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution [88.20464308588889]
We propose a Structural Similarity-Inspired Unfolding (SSIU) method for efficient image SR. This method is designed through unfolding an SR optimization function constrained by structural similarity. Our model outperforms current state-of-the-art models, boasting lower parameter counts and reduced memory consumption.
arXiv Detail & Related papers (2025-06-13T14:29:40Z)
- LeMoRe: Learn More Details for Lightweight Semantic Segmentation [48.81126061219231]
We introduce an efficient paradigm by synergizing explicit and implicit modeling to balance computational efficiency with representational fidelity. Our method combines well-defined Cartesian directions with explicitly modeled views and implicitly inferred intermediate representations, efficiently capturing global dependencies.
arXiv Detail & Related papers (2025-05-29T04:55:10Z)
- AdaptoVision: A Multi-Resolution Image Recognition Model for Robust and Scalable Classification [0.0]
AdaptoVision is a novel convolutional neural network (CNN) architecture designed to efficiently balance computational complexity and classification accuracy. By leveraging enhanced residual units, depth-wise separable convolutions, and hierarchical skip connections, AdaptoVision significantly reduces parameter count and computational requirements. It achieves state-of-the-art results on the BreakHis dataset and comparable accuracy levels elsewhere, notably 95.3% on CIFAR-10 and 85.77% on CIFAR-100, without relying on any pretrained weights.
arXiv Detail & Related papers (2025-04-17T05:23:07Z)
- ContextFormer: Redefining Efficiency in Semantic Segmentation [48.81126061219231]
Convolutional methods, although capturing local dependencies well, struggle with long-range relationships. Vision Transformers (ViTs) excel in global context capture but are hindered by high computational demands. We propose ContextFormer, a hybrid framework leveraging the strengths of CNNs and ViTs in the bottleneck to balance efficiency, accuracy, and robustness for real-time semantic segmentation.
arXiv Detail & Related papers (2025-01-31T16:11:04Z)
- Hyperspectral Images Efficient Spatial and Spectral non-Linear Model with Bidirectional Feature Learning [7.06787067270941]
We propose a novel framework that significantly reduces data volume while enhancing classification accuracy. Our model employs a bidirectional reversed convolutional neural network (CNN) to efficiently extract spectral features, complemented by a specialized block for spatial feature analysis.
arXiv Detail & Related papers (2024-11-29T23:32:26Z)
- Unleashing the Potential of Mamba: Boosting a LiDAR 3D Sparse Detector by Using Cross-Model Knowledge Distillation [22.653014803666668]
We propose a faster LiDAR 3D object detection framework, called FASD, which implements heterogeneous model distillation by adaptively aligning cross-model voxel features.
We aim to distill the transformer's capacity for high-performance sequence modeling into Mamba models with low FLOPs, achieving a significant improvement in accuracy through knowledge transfer.
We evaluated the framework on benchmark datasets, including nuScenes, achieving a 4x reduction in resource consumption and a 1-2% performance improvement over current SoTA methods.
arXiv Detail & Related papers (2024-09-17T09:30:43Z)
- KonvLiNA: Integrating Kolmogorov-Arnold Network with Linear Nyström Attention for feature fusion in Crop Field Detection [0.0]
This study introduces KonvLiNA, a novel framework that integrates Convolutional Kolmogorov-Arnold Networks (cKAN) with Nyström attention mechanisms for effective crop field detection.
arXiv Detail & Related papers (2024-08-23T15:33:07Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness [56.87603097348203]
VeCAF uses labels and natural language annotations to perform parametric data selection for PVM finetuning.
VeCAF incorporates the finetuning objective to select significant data points that effectively guide the PVM towards faster convergence.
On ImageNet, VeCAF uses up to 3.3x less training batches to reach the target performance compared to full finetuning.
arXiv Detail & Related papers (2024-01-15T17:28:37Z)
- ARHNet: Adaptive Region Harmonization for Lesion-aware Augmentation to Improve Segmentation Performance [61.04246102067351]
We propose a foreground harmonization framework (ARHNet) to tackle intensity disparities and make synthetic images look more realistic.
We demonstrate the efficacy of our method in improving the segmentation performance using real and synthetic images.
arXiv Detail & Related papers (2023-07-02T10:39:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.