Enlisting 3D Crop Models and GANs for More Data Efficient and
Generalizable Fruit Detection
- URL: http://arxiv.org/abs/2108.13344v1
- Date: Mon, 30 Aug 2021 16:11:59 GMT
- Title: Enlisting 3D Crop Models and GANs for More Data Efficient and
Generalizable Fruit Detection
- Authors: Zhenghao Fei, Alex Olenskyj, Brian N. Bailey, Mason Earles
- Abstract summary: We propose a method that translates agricultural images from a synthetic 3D crop model domain into real-world crop domains.
The method uses a semantically constrained GAN (generative adversarial network) to preserve fruit position and geometry.
Incremental training experiments on vineyard grape detection tasks show that the images generated by our method can significantly speed up the domain adaptation process.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training real-world neural network models to achieve high performance and
generalizability typically requires a substantial amount of labeled data,
spanning a broad range of variation. This data-labeling process can be both
labor- and cost-intensive. To achieve desirable predictive performance, a
trained model is typically applied to a domain where the data distribution is
similar to that of the training dataset. However, for many agricultural machine
learning problems, training datasets are collected at a specific location,
during a specific period of the growing season. Since agricultural
systems exhibit substantial variability in crop type, cultivar,
management, seasonal growth dynamics, lighting conditions, sensor type, etc., a
model trained from one dataset often does not generalize well across domains.
To enable more data efficient and generalizable neural network models in
agriculture, we propose a method that translates agricultural images from a
synthetic 3D crop model domain into photorealistic real-world crop domains. The
method uses a semantically constrained GAN (generative adversarial network) to
preserve fruit position and geometry. We observe that a baseline CycleGAN
method generates visually realistic target-domain images but does not preserve
fruit position information, while our method maintains fruit positions well.
Image generation results on vineyard grape day and night images show that the
visual outputs of our network are much better than those of a baseline network.
Incremental training experiments on vineyard grape detection tasks show that
the images generated by our method can significantly speed up the domain
adaptation process, increase performance for a given number of labeled images
(i.e., data efficiency), and decrease labeling requirements.
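The abstract gives no implementation details, but the core idea of a semantically constrained CycleGAN can be sketched as an extra consistency term on top of the usual adversarial and cycle losses. In the Python sketch below, the generator and discriminator modules, the frozen fruit-segmentation network seg_net, and the loss weights are all illustrative assumptions, not the paper's actual design:

import torch
import torch.nn.functional as F

def semantic_cyclegan_loss(g_syn2real, g_real2syn, d_real, seg_net, x_syn):
    """Generator-side loss for one translation direction (sketch).

    g_syn2real / g_real2syn : CycleGAN generators between the domains
    d_real                  : discriminator on the real-image domain
    seg_net                 : frozen fruit-segmentation network used to
                              penalize changes in fruit position/geometry
    x_syn                   : batch of synthetic (3D crop model) images
    """
    fake_real = g_syn2real(x_syn)    # synthetic -> "real" translation
    rec_syn = g_real2syn(fake_real)  # cycle back to the synthetic domain

    # Standard CycleGAN terms: least-squares adversarial loss plus
    # cycle-consistency loss.
    pred = d_real(fake_real)
    adv = F.mse_loss(pred, torch.ones_like(pred))
    cyc = F.l1_loss(rec_syn, x_syn)

    # Semantic constraint: fruit masks predicted on the source image and
    # on its translation should agree, so fruit position and geometry
    # survive the style change. A plain CycleGAN lacks this term and may
    # move, add, or remove fruit while still fooling the discriminator.
    with torch.no_grad():
        mask_src = torch.sigmoid(seg_net(x_syn))
    sem = F.binary_cross_entropy_with_logits(seg_net(fake_real), mask_src)

    # Loss weights are placeholders, not values from the paper.
    return adv + 10.0 * cyc + 5.0 * sem

The semantic term is the distinguishing piece: without it, the generator is free to move, add, or drop fruit as long as the output looks realistic, which is exactly the baseline failure mode the abstract reports.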
Related papers
- DataDream: Few-shot Guided Dataset Generation [90.09164461462365]
We propose a framework for synthesizing classification datasets that more faithfully represent the real data distribution.
DataDream fine-tunes LoRA weights for the image generation model on the few real images before generating the training data using the adapted model.
We then fine-tune LoRA weights for CLIP using the synthetic data to improve downstream image classification over previous approaches on a large variety of datasets.
arXiv Detail & Related papers (2024-07-15T17:10:31Z)
- Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models [36.59260354292177]
Recent advancements in text-to-image generation have inspired researchers to generate datasets tailored for perception models using generative models.
We aim to fine-tune vision-language models for a specific classification task without access to any real images.
Despite the high fidelity of generated images, we observed a significant performance degradation when fine-tuning the model using the generated datasets.
arXiv Detail & Related papers (2024-06-08T10:43:49Z)
- Few-Shot Fruit Segmentation via Transfer Learning [4.616529139444651]
We develop a few-shot semantic segmentation framework for infield fruits using transfer learning.
Motivated by similar success in urban scene parsing, we propose specialized pre-training.
We show that models with pre-training learn to distinguish between fruit still on the trees and fruit that have fallen on the ground.
arXiv Detail & Related papers (2024-05-04T04:05:59Z)
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers [50.576354045312115]
Direct image-to-graph transformation is a challenging task that solves object detection and relationship prediction in a single model.
We introduce a set of methods enabling cross-domain and cross-dimension transfer learning for image-to-graph transformers.
We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we pretrain our models on 2D satellite images before applying them to vastly different target domains in 2D and 3D.
arXiv Detail & Related papers (2024-03-11T10:48:56Z)
- Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation [66.6546668043249]
ALIA (Automated Language-guided Image Augmentation) is a method which utilizes large vision and language models to automatically generate natural language descriptions of a dataset's domains.
To maintain data integrity, a model trained on the original dataset filters out minimal image edits and those which corrupt class-relevant information.
We show that ALIA is able to surpass traditional data augmentation and text-to-image generated data on fine-grained classification tasks.
arXiv Detail & Related papers (2023-05-25T17:43:05Z)
- Inside Out: Transforming Images of Lab-Grown Plants for Machine Learning Applications in Agriculture [0.0]
We employ a contrastive unpaired translation (CUT) generative adversarial network (GAN) to translate indoor plant images to appear as field images.
While we train our network to translate an image containing only a single plant, we show that our method is easily extendable to produce multiple-plant field images.
We also use our synthetic multi-plant images to train several YoloV5 nano object detection models for plant detection (a minimal training sketch appears after this list).
arXiv Detail & Related papers (2022-11-05T20:51:45Z)
- End-to-end deep learning for directly estimating grape yield from ground-based imagery [53.086864957064876]
This study demonstrates the application of proximal imaging combined with deep learning for yield estimation in vineyards.
Three model architectures were tested: object detection, CNN regression, and transformer models.
The study showed the applicability of proximal imaging and deep learning for prediction of grapevine yield on a large scale.
arXiv Detail & Related papers (2022-08-04T01:34:46Z)
- Facilitated machine learning for image-based fruit quality assessment in developing countries [68.8204255655161]
Automated image classification is a common task for supervised machine learning in food science.
We propose an alternative method based on pre-trained vision transformers (ViTs).
It can be easily implemented with limited resources on a standard device.
arXiv Detail & Related papers (2022-07-10T19:52:20Z)
- Generative Adversarial Networks for Image Augmentation in Agriculture: A Systematic Review [5.639656362091594]
The generative adversarial network (GAN), invented in 2014 in the computer vision community, provides a suite of novel approaches that can learn good data representations.
This paper presents an overview of the evolution of GAN architectures followed by a systematic review of their application to agriculture.
arXiv Detail & Related papers (2022-04-10T15:33:05Z)
- Multi-Spectral Image Synthesis for Crop/Weed Segmentation in Precision Farming [3.4788711710826083]
We propose an alternative to common data augmentation methods, applied to the problem of crop/weed segmentation in precision farming.
We create semi-artificial samples by replacing the most relevant object classes (i.e., crop and weeds) with their synthesized counterparts.
In addition to RGB data, we also take into account near-infrared (NIR) information, generating four-channel multi-spectral synthetic images.
arXiv Detail & Related papers (2020-09-12T08:49:36Z)
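To make the detector-training step in the "Inside Out" entry above concrete, here is a minimal sketch using the ultralytics Python API. The checkpoint name, dataset config path, and hyperparameters are illustrative assumptions; the original work trained YOLOv5 nano models, possibly via the classic yolov5 repository rather than this API:

from ultralytics import YOLO

# "plants.yaml" is a hypothetical dataset config pointing at the
# GAN-translated multi-plant field images and their box labels.
model = YOLO("yolov5nu.pt")  # pretrained YOLOv5-nano-style checkpoint
model.train(data="plants.yaml", epochs=50, imgsz=640, batch=16)

# Validate on real field images held out from translation.
metrics = model.val(data="plants.yaml")
print(metrics.box.map50)  # mAP@0.5 on the plant-detection task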