Fruit Deformity Classification through Single-Input and Multi-Input Architectures based on CNN Models using Real and Synthetic Images
- URL: http://arxiv.org/abs/2412.12966v1
- Date: Tue, 17 Dec 2024 14:51:13 GMT
- Title: Fruit Deformity Classification through Single-Input and Multi-Input Architectures based on CNN Models using Real and Synthetic Images
- Authors: Tommy D. Beltran, Raul J. Villao, Luis E. Chuquimarca, Boris X. Vintimilla, Sergio A. Velastin
- Abstract summary: The present study focuses on detecting the degree of deformity in fruits such as apples, mangoes, and strawberries during the process of inspecting their external quality.
The datasets are segmented using the Segment Anything Model (SAM), which provides the silhouette of the fruits.
The results revealed that the Multi-Input architecture with the MobileNetV2 model was the most effective in identifying deformities in the fruits.
- Score: 2.1534273328102937
- Abstract: The present study focuses on detecting the degree of deformity in fruits such as apples, mangoes, and strawberries during the process of inspecting their external quality, employing Single-Input and Multi-Input architectures based on convolutional neural network (CNN) models using sets of real and synthetic images. The datasets are segmented using the Segment Anything Model (SAM), which provides the silhouette of the fruits. Regarding the Single-Input architecture, the CNN models are evaluated only with real images, but a methodology is proposed to improve these results using a model pre-trained with synthetic images. In the Multi-Input architecture, branches with RGB images and fruit silhouettes are implemented as inputs for evaluating CNN models such as VGG16, MobileNetV2, and CIDIS. The results revealed that the Multi-Input architecture with the MobileNetV2 model was the most effective in identifying deformities in the fruits, achieving accuracies of 90%, 94%, and 92% for apples, mangoes, and strawberries, respectively. In conclusion, the Multi-Input architecture with the MobileNetV2 model is the most accurate for classifying levels of deformity in fruits.
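As a concrete illustration of the Multi-Input idea described in the abstract, the following is a minimal sketch (not the authors' released code) of a two-branch classifier in TensorFlow/Keras: one branch passes the RGB fruit image through an ImageNet-pretrained MobileNetV2 backbone, a second branch processes the fruit silhouette (assumed to be a binary mask obtained beforehand, e.g. with SAM), and a small dense head predicts the deformity level. The input size, the number of deformity classes, and the silhouette-branch layers are assumptions for illustration only.

```python
# Minimal sketch of a two-branch Multi-Input CNN (illustrative, not the paper's code).
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 3  # assumed number of deformity levels; not specified in this listing

def build_multi_input_model(img_size=224, num_classes=NUM_CLASSES):
    # Branch 1: RGB fruit image (pixels in [0, 255]) through a pretrained MobileNetV2.
    rgb_in = layers.Input(shape=(img_size, img_size, 3), name="rgb_image")
    backbone = tf.keras.applications.MobileNetV2(
        include_top=False, weights="imagenet", input_shape=(img_size, img_size, 3)
    )
    x = tf.keras.applications.mobilenet_v2.preprocess_input(rgb_in)
    x = backbone(x)
    x = layers.GlobalAveragePooling2D()(x)

    # Branch 2: single-channel silhouette mask (assumed to come from SAM segmentation).
    sil_in = layers.Input(shape=(img_size, img_size, 1), name="silhouette")
    s = layers.Conv2D(16, 3, strides=2, activation="relu")(sil_in)
    s = layers.Conv2D(32, 3, strides=2, activation="relu")(s)
    s = layers.GlobalAveragePooling2D()(s)

    # Fuse both branches and classify the deformity level.
    fused = layers.Concatenate()([x, s])
    fused = layers.Dense(128, activation="relu")(fused)
    out = layers.Dense(num_classes, activation="softmax", name="deformity_level")(fused)

    model = Model(inputs=[rgb_in, sil_in], outputs=out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_multi_input_model()
model.summary()
```

A Single-Input variant would keep only the RGB branch; the abstract's proposed improvement would then initialize that branch from a model pre-trained on the synthetic images before fine-tuning on real ones.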
Related papers
- Classifying Healthy and Defective Fruits with a Multi-Input Architecture and CNN Models [0.0]
The primary aim is to enhance the accuracy of CNN models.
Results reveal that the inclusion of silhouette images alongside the Multi-Input architecture yields models with superior performance.
arXiv Detail & Related papers (2024-10-14T21:37:12Z) - Convolutional Neural Network Ensemble Learning for Hyperspectral
Imaging-based Blackberry Fruit Ripeness Detection in Uncontrolled Farm
Environment [4.292727554656705]
This paper proposes a novel multi-input convolutional neural network (CNN) ensemble classifier for detecting subtle traits of ripeness in blackberry fruits.
The proposed model achieved 95.1% accuracy on unseen sets and 90.2% accuracy with in-field conditions.
arXiv Detail & Related papers (2024-01-09T12:00:17Z) - An Improved CNN-based Neural Network Model for Fruit Sugar Level Detection [24.07349410158827]
We design a regression model for fruit sugar level estimation using an Artificial Neural Network (ANN) based on the visible/near-infrared (V/NIR) spectra of fruits.
Using fruit sugar levels as the detection target, we collected data from two fruit types, Gan Nan Navel and Tian Shan Pear, and conducted experiments to compare their results.
arXiv Detail & Related papers (2023-11-18T17:07:25Z) - Facilitated machine learning for image-based fruit quality assessment in
developing countries [68.8204255655161]
Automated image classification is a common task for supervised machine learning in food science.
We propose an alternative method based on pre-trained vision transformers (ViTs)
It can be easily implemented with limited resources on a standard device.
arXiv Detail & Related papers (2022-07-10T19:52:20Z) - ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for
Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic inductive bias (IB) from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain the state-of-the-art classification performance, i.e., 88.5% Top-1 classification accuracy on ImageNet validation set and the best 91.2% Top-1 accuracy on ImageNet real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z) - Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve an auxiliary problem.
We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models.
Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
arXiv Detail & Related papers (2021-10-25T21:01:00Z) - A Battle of Network Structures: An Empirical Study of CNN, Transformer,
and MLP [121.35904748477421]
Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision.
Transformer and multi-layer perceptron (MLP)-based models, such as Vision Transformer and Vision-Mixer, started to lead new trends.
In this paper, we conduct empirical studies on these DNN structures and try to understand their respective pros and cons.
arXiv Detail & Related papers (2021-08-30T06:09:02Z) - Making CNNs Interpretable by Building Dynamic Sequential Decision
Forests with Top-down Hierarchy Learning [62.82046926149371]
We propose a generic model transfer scheme to make Convolutional Neural Networks (CNNs) interpretable.
We achieve this by building a differentiable decision forest on top of CNNs.
We name the transferred model deep Dynamic Sequential Decision Forest (dDSDF).
arXiv Detail & Related papers (2021-06-05T07:41:18Z) - Measuring the Ripeness of Fruit with Hyperspectral Imaging and Deep
Learning [14.853897011640022]
We present a system to measure the ripeness of fruit with a hyperspectral camera and a suitable deep neural network architecture.
This architecture outperformed competitive baseline models in predicting the state of ripeness.
arXiv Detail & Related papers (2021-04-20T07:43:19Z) - ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers.
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
arXiv Detail & Related papers (2021-03-29T15:27:17Z) - Incorporating Image Gradients as Secondary Input Associated with Input
Image to Improve the Performance of the CNN Model [0.0]
In existing CNN architectures, only a single form of the given input is fed to the network.
A new architecture is proposed in which the given input is passed to the network in more than one form simultaneously (see the sketch after this list).
arXiv Detail & Related papers (2020-06-05T14:01:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.