Modular Transformer Architecture for Precision Agriculture Imaging
- URL: http://arxiv.org/abs/2508.03751v2
- Date: Thu, 07 Aug 2025 04:30:53 GMT
- Title: Modular Transformer Architecture for Precision Agriculture Imaging
- Authors: Brian Gopalan, Nathalia Nascimento, Vishal Monga
- Abstract summary: This paper addresses the need for efficient and accurate weed segmentation from drone video in precision agriculture. A quality-aware modular deep-learning framework is proposed that addresses common image degradation.
- Score: 13.182388658918498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the critical need for efficient and accurate weed segmentation from drone video in precision agriculture. A quality-aware modular deep-learning framework is proposed that addresses common image degradation by analyzing quality conditions-such as blur and noise-and routing inputs through specialized pre-processing and transformer models optimized for each degradation type. The system first analyzes drone images for noise and blur using Mean Absolute Deviation and the Laplacian. Data is then dynamically routed to one of three vision transformer models: a baseline for clean images, a modified transformer with Fisher Vector encoding for noise reduction, or another with an unrolled Lucy-Richardson decoder to correct blur. This novel routing strategy allows the system to outperform existing CNN-based methods in both segmentation quality and computational efficiency, demonstrating a significant advancement in deep-learning applications for agriculture.
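The quality-analysis stage described in the abstract (Mean Absolute Deviation for noise, the Laplacian for blur, then routing to one of three transformers) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the thresholds, image sizes, and model names are hypothetical placeholders.

```python
import numpy as np

def laplacian_variance(img):
    """Variance of the Laplacian response over a grayscale image;
    low values indicate blur (few sharp edges)."""
    k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):          # explicit 3x3 cross-correlation
        for j in range(3):
            out += k[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out.var()

def mean_absolute_deviation(img):
    """MAD around the mean intensity; high values suggest noise."""
    return np.abs(img - img.mean()).mean()

def route(img, blur_thresh=50.0, noise_thresh=40.0):
    """Decide which specialized model a frame should be sent to.
    Thresholds are illustrative, not taken from the paper."""
    if laplacian_variance(img) < blur_thresh:
        return "deblur_transformer"   # unrolled Lucy-Richardson decoder
    if mean_absolute_deviation(img) > noise_thresh:
        return "denoise_transformer"  # Fisher Vector encoding
    return "baseline_transformer"     # clean images
```

A flat (blurred-looking) frame routes to the deblurring branch, a high-contrast noisy frame to the denoising branch, and a mostly clean frame with sharp detail to the baseline.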
Related papers
- TDiR: Transformer based Diffusion for Image Restoration Tasks [19.992144590243836]
Images captured in challenging environments often experience various forms of degradation, including noise, color cast, blur, and light scattering. These effects significantly reduce image quality, hindering their applicability in downstream tasks such as object detection, mapping, and classification. Our transformer-based diffusion model was developed to address image restoration tasks, aiming to improve the quality of degraded images.
arXiv Detail & Related papers (2025-06-25T10:28:13Z) - A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning [0.12499537119440242]
A lightweight transformer architecture is proposed to reduce the dimensionality of the encoder layers and employ a distilled version of GPT-2 as the decoder. A knowledge distillation strategy is used to transfer knowledge from a more complex teacher model to improve the performance of the lightweight network. Experimental results demonstrate that the proposed approach significantly improves caption quality compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-06-11T06:24:02Z) - Taming Rectified Flow for Inversion and Editing [57.3742655030493]
Rectified-flow-based diffusion transformers like FLUX and OpenSora have demonstrated outstanding performance in the field of image and video generation. Despite their robust generative capabilities, these models often struggle with inaccuracies. We propose RF-r, a training-free sampler that effectively enhances inversion precision by mitigating the errors in the inversion process of rectified flow.
arXiv Detail & Related papers (2024-11-07T14:29:02Z) - DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement [77.0360085530701]
Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments.
Previous methods often idealize the degradation process and neglect the impact of medium noise and object motion on the distribution of image features.
Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
arXiv Detail & Related papers (2023-12-12T06:07:21Z) - DiffiT: Diffusion Vision Transformers for Image Generation [88.08529836125399]
Vision Transformer (ViT) has demonstrated strong modeling capabilities and scalability, especially for recognition tasks.
We study the effectiveness of ViTs in diffusion-based generative learning and propose a new model denoted as Diffusion Vision Transformers (DiffiT).
DiffiT is surprisingly effective in generating high-fidelity images with significantly better parameter efficiency.
arXiv Detail & Related papers (2023-12-04T18:57:01Z) - Corner-to-Center Long-range Context Model for Efficient Learned Image Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the Corner-to-Center transformer-based Context Model (C$^3$M) designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z) - Image Reconstruction using Enhanced Vision Transformer [0.08594140167290097]
We propose a novel image reconstruction framework which can be used for tasks such as image denoising, deblurring or inpainting.
The model proposed in this project is based on Vision Transformer (ViT) that takes 2D images as input and outputs embeddings.
We incorporate four additional optimization techniques in the framework to improve the model reconstruction capability.
arXiv Detail & Related papers (2023-07-11T02:14:18Z) - Unfolding Framework with Prior of Convolution-Transformer Mixture and Uncertainty Estimation for Video Snapshot Compressive Imaging [7.601695814245209]
We consider the problem of video snapshot compressive imaging (SCI), where sequential high-speed frames are modulated by different masks and captured by a single measurement.
By combining optimization algorithms and neural networks, deep unfolding networks (DUNs) score tremendous achievements in solving inverse problems.
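The forward model of video SCI mentioned above (sequential frames modulated by per-frame masks and collapsed into one measurement) can be written in a few lines. A minimal numpy sketch with synthetic data, where the frame count, resolution, and random values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# B high-speed frames of size H x W, each modulated by a binary mask,
# collapse into a single 2-D snapshot measurement.
B, H, W = 8, 32, 32
frames = rng.random((B, H, W))           # latent video frames x_t (synthetic)
masks = rng.integers(0, 2, (B, H, W))    # per-frame binary modulation masks C_t

# Forward model: y = sum_t C_t * x_t  (element-wise modulation, then sum)
measurement = (masks * frames).sum(axis=0)

assert measurement.shape == (H, W)
```

Reconstruction methods such as deep unfolding networks then invert this many-to-one mapping to recover the B frames from the single measurement.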
arXiv Detail & Related papers (2023-06-20T06:25:48Z) - Computational Optics for Mobile Terminals in Mass Production [17.413494778377565]
We construct the perturbed lens system model to illustrate the relationship between the system parameters and the deviated frequency response measured from photographs.
An optimization framework is proposed based on this model to build proxy cameras from the machining samples' SFRs.
Engaging with the proxy cameras, we synthesize data pairs, which encode the optical aberrations and the random manufacturing biases, for training the aberration-based algorithms.
arXiv Detail & Related papers (2023-05-10T04:17:33Z) - Universal and Flexible Optical Aberration Correction Using Deep-Prior Based Deconvolution [51.274657266928315]
We propose a PSF-aware plug-and-play deep network, which takes the aberrant image and PSF map as input and produces the latent high-quality version by incorporating lens-specific deep priors.
Specifically, we pre-train a base model from a set of diverse lenses and then adapt it to a given lens by quickly refining the parameters.
arXiv Detail & Related papers (2021-04-07T12:00:38Z) - SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on an important insight that the rectified results of distorted images of the same scene from different lenses should be the same.
We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters.
Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
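The self-supervised insight above reduces to a consistency objective: rectified outputs of the same scene seen through different lenses should coincide. A minimal sketch of such a loss, assuming the rectified images are already aligned arrays (the function name is hypothetical, not from the paper):

```python
import numpy as np

def rectification_consistency_loss(rectified_a, rectified_b):
    """Self-supervised training signal: two rectified views of the same
    scene (captured through different lenses) should be identical, so
    their pixel-wise mean squared difference serves as the loss."""
    return float(np.mean((rectified_a - rectified_b) ** 2))
```

When the two rectified views agree perfectly the loss is zero; any residual distortion left by the warping module contributes a positive penalty.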
arXiv Detail & Related papers (2020-11-30T08:23:25Z) - Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline [100.5353614588565]
We propose to incorporate the domain knowledge of the LDR image formation pipeline into our model.
We model the HDR-to-LDR image formation pipeline as the (1) dynamic range clipping, (2) non-linear mapping from a camera response function, and (3) quantization.
We demonstrate that the proposed method performs favorably against state-of-the-art single-image HDR reconstruction algorithms.
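The three-stage HDR-to-LDR formation pipeline above is simple to express directly. A minimal sketch, assuming linear HDR radiance normalized so 1.0 is the sensor saturation point, and using a plain gamma curve as a stand-in for the true camera response function:

```python
import numpy as np

def hdr_to_ldr(hdr, gamma=2.2, bits=8):
    """Forward HDR-to-LDR image formation in three stages:
    (1) dynamic range clipping, (2) non-linear camera response
    (a gamma curve substitutes for the real CRF), (3) quantization."""
    clipped = np.clip(hdr, 0.0, 1.0)           # (1) clip out-of-range radiance
    mapped = clipped ** (1.0 / gamma)          # (2) non-linear tone response
    levels = 2 ** bits - 1
    return np.round(mapped * levels) / levels  # (3) quantize to 2^bits levels
```

Single-image HDR reconstruction then amounts to learning to invert each stage in turn: dequantize, undo the response curve, and hallucinate the clipped highlights.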
arXiv Detail & Related papers (2020-04-02T17:59:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.