CornViT: A Multi-Stage Convolutional Vision Transformer Framework for Hierarchical Corn Kernel Analysis
- URL: http://arxiv.org/abs/2601.00897v1
- Date: Wed, 31 Dec 2025 16:16:48 GMT
- Title: CornViT: A Multi-Stage Convolutional Vision Transformer Framework for Hierarchical Corn Kernel Analysis
- Authors: Sai Teja Erukude, Jane Mascarenhas, Lior Shamir,
- Abstract summary: Accurate grading of corn kernels is critical for seed certification, directional seeding, and breeding.<n>This work introduces CornViT, a three-stage Convolutional Vision Transformer framework that emulates the hierarchical reasoning of human seed analysts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate grading of corn kernels is critical for seed certification, directional seeding, and breeding, yet it is still predominantly performed by manual inspection. This work introduces CornViT, a three-stage Convolutional Vision Transformer (CvT) framework that emulates the hierarchical reasoning of human seed analysts for single-kernel evaluation. Three sequential CvT-13 classifiers operate on 384x384 RGB images: Stage 1 distinguishes pure from impure kernels; Stage 2 categorizes pure kernels into flat and round morphologies; and Stage 3 determines the embryo orientation (up vs. down) for pure, flat kernels. Starting from a public corn seed image collection, we manually relabeled and filtered images to construct three stage-specific datasets: 7265 kernels for purity, 3859 pure kernels for morphology, and 1960 pure-flat kernels for embryo orientation, all released as benchmarks. Head-only fine-tuning of ImageNet-22k pretrained CvT-13 backbones yields test accuracies of 93.76% for purity, 94.11% for shape, and 91.12% for embryo-orientation detection. Under identical training conditions, ResNet-50 reaches only 76.56 to 81.02 percent, whereas DenseNet-121 attains 86.56 to 89.38 percent accuracy. These results highlight the advantages of convolution-augmented self-attention for kernel analysis. To facilitate adoption, we deploy CornViT in a Flask-based web application that performs stage-wise inference and exposes interpretable outputs through a browser interface. Together, the CornViT framework, curated datasets, and web application provide a deployable solution for automated corn kernel quality assessment in seed quality workflows. Source code and data are publicly available.
Related papers
- Scaling Up 3D Kernels with Bayesian Frequency Re-parameterization for
Medical Image Segmentation [25.62587471067468]
RepUX-Net is a pure CNN architecture with a simple large kernel block design.
Inspired by the spatial frequency in the human visual system, we extend to vary the kernel convergence into element-wise setting.
arXiv Detail & Related papers (2023-03-10T08:38:34Z) - Which Pixel to Annotate: a Label-Efficient Nuclei Segmentation Framework [70.18084425770091]
Deep neural networks have been widely applied in nuclei instance segmentation of H&E stained pathology images.
It is inefficient and unnecessary to label all pixels for a dataset of nuclei images which usually contain similar and redundant patterns.
We propose a novel full nuclei segmentation framework that chooses only a few image patches to be annotated, augments the training set from the selected samples, and achieves nuclei segmentation in a semi-supervised manner.
arXiv Detail & Related papers (2022-12-20T14:53:26Z) - Global Context Vision Transformers [78.5346173956383]
We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization for computer vision.
We address the lack of the inductive bias in ViTs, and propose to leverage a modified fused inverted residual blocks in our architecture.
Our proposed GC ViT achieves state-of-the-art results across image classification, object detection and semantic segmentation tasks.
arXiv Detail & Related papers (2022-06-20T18:42:44Z) - Low-complexity deep learning frameworks for acoustic scene
classification [64.22762153453175]
We present low-complexity deep learning frameworks for acoustic scene classification (ASC)
The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities.
Our experiments conducted on DCASE 2022 Task 1 Development dataset have fullfiled the requirement of low-complexity and achieved the best classification accuracy of 60.1%.
arXiv Detail & Related papers (2022-06-13T11:41:39Z) - Integrating Circle Kernels into Convolutional Neural Networks [30.950819638148104]
The square kernel is a standard unit for contemporary Convolutional Neural Networks (CNNs)
We propose using circle kernels with isotropic receptive fields for the convolution.
Our training takes approximately equivalent amount of calculation when compared with the corresponding CNN with square kernels.
arXiv Detail & Related papers (2021-07-06T07:59:36Z) - Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z) - A Deep Learning Approach Based on Graphs to Detect Plantation Lines [16.76043873454695]
We propose a deep learning approach based on graphs to detect plantation lines in UAV-based RGB imagery.
The proposed method was compared against state-of-the-art deep learning methods.
It achieved superior performance with a significant margin, returning precision, recall, and F1-score of 98.7%, 91.9%, and 95.1%, respectively.
arXiv Detail & Related papers (2021-02-05T14:56:42Z) - A CNN Approach to Simultaneously Count Plants and Detect Plantation-Rows
from UAV Imagery [56.10033255997329]
We propose a novel deep learning method based on a Convolutional Neural Network (CNN)
It simultaneously detects and geolocates plantation-rows while counting its plants considering highly-dense plantation configurations.
The proposed method achieved state-of-the-art performance for counting and geolocating plants and plant-rows in UAV images from different types of crops.
arXiv Detail & Related papers (2020-12-31T18:51:17Z) - Pollen Grain Microscopic Image Classification Using an Ensemble of
Fine-Tuned Deep Convolutional Neural Networks [2.824133171517646]
We present an ensemble approach for pollen grain microscopic image classification into four categories.
We develop a classification strategy that is based on fusion of four state-of-the-art fine-tuned convolutional neural networks.
We obtain an accuracy of 94.48% and a weighted F1-score of 94.54% on the ICPR 2020 Pollen Grain Classification Challenge training dataset.
arXiv Detail & Related papers (2020-11-15T01:25:46Z) - Convolutional Neural Networks for Image-based Corn Kernel Detection and
Counting [22.070850942573863]
A well developed corn ear can have up to 800 kernels.
manually counting the kernels on an ear of corn is labor-intensive and prone to human error.
We propose a kernel detection and counting method based on a sliding window approach.
arXiv Detail & Related papers (2020-03-26T16:46:23Z) - Learning Deep Kernels for Non-Parametric Two-Sample Tests [50.92621794426821]
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution.
Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power.
arXiv Detail & Related papers (2020-02-21T03:54:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.