FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework
- URL: http://arxiv.org/abs/2408.06190v2
- Date: Thu, 26 Sep 2024 07:56:50 GMT
- Title: FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework
- Authors: Lukas Meyer, Andreas Gilson, Ute Schmid, Marc Stamminger
- Abstract summary: We introduce FruitNeRF, a unified novel fruit counting framework.
We use state-of-the-art view synthesis methods to count any fruit type directly in 3D.
We evaluate our methodology using both real-world and synthetic datasets.
- Score: 5.363729942767801
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce FruitNeRF, a unified novel fruit counting framework that leverages state-of-the-art view synthesis methods to count any fruit type directly in 3D. Our framework takes an unordered set of posed images captured by a monocular camera and segments fruit in each image. To make our system independent of the fruit type, we employ a foundation model that generates binary segmentation masks for any fruit. Utilizing both modalities, RGB and semantic, we train a semantic neural radiance field. Through uniform volume sampling of the implicit Fruit Field, we obtain fruit-only point clouds. By applying cascaded clustering on the extracted point cloud, our approach achieves a precise fruit count. The use of neural radiance fields provides significant advantages over conventional methods such as object tracking or optical flow, as the counting itself is lifted into 3D. Our method prevents double counting fruit and avoids counting irrelevant fruit. We evaluate our methodology using both real-world and synthetic datasets. The real-world dataset consists of three apple trees with manually counted ground truths and a benchmark apple dataset with one row and ground-truth fruit locations, while the synthetic dataset comprises various fruit types including apple, plum, lemon, pear, peach, and mango. Additionally, we assess the performance of fruit counting using the foundation model compared to a U-Net.
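The counting stage reduces to clustering the fruit-only point cloud sampled from the implicit Fruit Field. Below is a minimal sketch of how such a cascaded clustering step could look; the DBSCAN parameters, the bounding-box volume estimate, and the split heuristic for merged blobs are illustrative assumptions, not FruitNeRF's exact procedure.

```python
# Hedged sketch of a cascaded clustering counter for a fruit-only point cloud;
# the DBSCAN parameters, bounding-box volume estimate, and split heuristic are
# illustrative assumptions, not FruitNeRF's exact procedure.
import numpy as np
from sklearn.cluster import DBSCAN

def count_fruits(points, eps=0.03, min_points=50, expected_radius=0.04):
    """points: (N, 3) array sampled from the fruit field, in metres."""
    labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(points)
    single_fruit_volume = 4.0 / 3.0 * np.pi * expected_radius ** 3
    count = 0
    for label in set(labels) - {-1}:                  # -1 marks DBSCAN noise
        cluster = points[labels == label]
        extent = cluster.max(axis=0) - cluster.min(axis=0)
        blob_volume = float(np.prod(extent))          # crude bounding-box volume
        # Second stage: split blobs of touching fruit by their volume ratio.
        count += max(1, int(round(blob_volume / single_fruit_volume)))
    return count
```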
Related papers
- Few-Shot Fruit Segmentation via Transfer Learning [4.616529139444651]
We develop a few-shot semantic segmentation framework for infield fruits using transfer learning.
Motivated by similar success in urban scene parsing, we propose specialized pre-training.
We show that models with pre-training learn to distinguish between fruit still on the trees and fruit that have fallen on the ground.
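As an illustration of the transfer-learning setup, the sketch below fine-tunes only the head of a generically pre-trained segmentation network on a handful of labelled fruit images; the DeepLabV3 backbone and the hyper-parameters are stand-ins, not the paper's specialized pre-training.

```python
# Hedged sketch of a transfer-learning setup for binary fruit segmentation;
# the generically pre-trained DeepLabV3 backbone and the hyper-parameters are
# stand-ins, not the paper's specialized pre-training.
import torch
from torch import nn
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT")            # pre-trained weights
model.classifier[4] = nn.Conv2d(256, 2, kernel_size=1)   # new head: fruit vs. background

# Freeze the backbone and fine-tune only the segmentation head on a few labelled images.
for param in model.backbone.parameters():
    param.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()
```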
arXiv Detail & Related papers (2024-05-04T04:05:59Z)
- A pipeline for multiple orange detection and tracking with 3-D fruit relocalization and neural-net based yield regression in commercial citrus orchards [0.0]
We propose a non-invasive alternative that utilizes fruit counting from videos, implemented as a pipeline.
To handle occluded and re-appeared fruit, we introduce a relocalization component that employs 3-D estimation of fruit locations.
Provided that at least 30% of the fruit is accurately detected, tracked, and counted, our yield regressor achieves a coefficient of determination of 0.85.
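A minimal sketch of the neural-net yield regression step is shown below, mapping per-row counts from the tracking pipeline to measured yield; the feature layout, network size, and the numbers are purely illustrative, not data from the paper.

```python
# Hedged sketch of a neural-net yield regressor that maps per-row video fruit
# counts to measured yield; the feature layout, network size, and the numbers
# below are made-up illustrative values, not data from the paper.
import numpy as np
from sklearn.metrics import r2_score
from sklearn.neural_network import MLPRegressor

# One sample per orchard row: [tracked fruit count, count after 3-D relocalization].
X = np.array([[310, 342], [275, 301], [420, 455], [198, 214], [365, 398]], dtype=float)
y = np.array([1020.0, 930.0, 1410.0, 680.0, 1230.0])   # harvested fruit per row

regressor = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
regressor.fit(X, y)
print("R^2 on the toy data:", r2_score(y, regressor.predict(X)))
```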
arXiv Detail & Related papers (2023-12-27T21:22:43Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
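The voting-based fusion can be sketched as a per-point majority vote over the labels that the individual 2D models project onto the cloud; the array layout below is an assumption for illustration.

```python
# Hedged sketch of voting-based semantic label fusion; the array layout
# (one label per point from each 2D model, after projection) is an assumption.
import numpy as np

def fuse_labels(predictions, num_classes, ignore=-1):
    """predictions: (num_models, num_points) integer labels; returns fused (num_points,) labels."""
    num_points = predictions.shape[1]
    votes = np.zeros((num_points, num_classes), dtype=np.int64)
    for model_pred in predictions:
        valid = model_pred != ignore
        np.add.at(votes, (np.nonzero(valid)[0], model_pred[valid]), 1)
    fused = votes.argmax(axis=1)
    fused[votes.sum(axis=1) == 0] = ignore    # points no model labelled stay ignored
    return fused
```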
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics [58.720142291102135]
We present a fully automated pipeline to generate a synthetic dataset for instance segmentation in four steps.
We first scrape images for the objects of interest from popular image search engines.
We compare three different methods for image selection: object-agnostic pre-processing, manual image selection, and CNN-based image selection.
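The cut-and-paste compositing step might look like the sketch below, where masked object crops are pasted at random positions onto a background image; the paths, placement strategy, and use of RGBA crops are simplifying assumptions.

```python
# Hedged sketch of the cut-and-paste compositing step; paths, placement, and
# the use of RGBA crops with an alpha mask are simplifying assumptions.
import random
from PIL import Image

def compose(background_path, object_paths, out_path):
    canvas = Image.open(background_path).convert("RGB")
    for path in object_paths:
        crop = Image.open(path).convert("RGBA")            # scraped object with transparent background
        x = random.randint(0, max(0, canvas.width - crop.width))
        y = random.randint(0, max(0, canvas.height - crop.height))
        canvas.paste(crop, (x, y), mask=crop)              # alpha channel acts as the paste mask
    canvas.save(out_path)
```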
arXiv Detail & Related papers (2022-10-18T12:49:04Z)
- Apple Counting using Convolutional Neural Networks [22.504279159923765]
Estimating accurate and reliable fruit and vegetable counts from images in real-world settings, such as orchards, is a challenging problem.
We formulate fruit counting from images as a multi-class classification problem and solve it by training a Convolutional Neural Network.
Our network outperforms a state-of-the-art baseline on three out of four datasets, reaching up to 94% accuracy.
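A minimal version of the counting-as-classification idea is sketched below, where each output class corresponds to a fruit count per image patch; the architecture and maximum count are illustrative assumptions.

```python
# Hedged sketch of counting as multi-class classification: each class is a
# fruit count for an image patch; the architecture and max count are assumptions.
import torch
from torch import nn

class CountNet(nn.Module):
    def __init__(self, max_count=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, max_count + 1)     # classes = counts 0..max_count

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# The predicted count of a patch is the arg-max class.
logits = CountNet()(torch.randn(1, 3, 64, 64))
print("predicted patch count:", logits.argmax(dim=1).item())
```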
arXiv Detail & Related papers (2022-08-24T14:13:40Z)
- Facilitated machine learning for image-based fruit quality assessment in developing countries [68.8204255655161]
Automated image classification is a common task for supervised machine learning in food science.
We propose an alternative method based on pre-trained vision transformers (ViTs).
It can be easily implemented with limited resources on a standard device.
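A sketch of such a low-resource setup is given below: the pre-trained ViT backbone stays frozen and only a small linear head is trained; the specific backbone and the number of quality classes are assumptions.

```python
# Hedged sketch of the low-resource ViT setup: the backbone stays frozen and
# only a small linear head is trained; the vit_b_16 backbone and the three
# quality classes are assumptions.
import torch
from torch import nn
from torchvision.models import ViT_B_16_Weights, vit_b_16

weights = ViT_B_16_Weights.DEFAULT
model = vit_b_16(weights=weights)
for param in model.parameters():
    param.requires_grad = False                       # no backbone fine-tuning
model.heads.head = nn.Linear(model.heads.head.in_features, 3)   # e.g. three quality grades

preprocess = weights.transforms()                     # preprocessing the backbone expects
optimizer = torch.optim.Adam(model.heads.parameters(), lr=1e-3)
```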
arXiv Detail & Related papers (2022-07-10T19:52:20Z)
- A methodology for detection and localization of fruits in apples orchards from aerial images [0.0]
This work presents a methodology for automated fruit counting from aerial images.
It includes algorithms based on multiple view geometry to perform fruit tracking.
Preliminary assessments show correlations above 0.8 between fruit counts and true yield for apples.
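The multiple-view-geometry component can be illustrated by triangulating matched fruit detections from two posed aerial views, as in the hedged sketch below; the interface is an assumption, not the paper's implementation.

```python
# Hedged sketch of triangulating matched fruit detections from two posed aerial
# views; the recovered 3-D positions let the same fruit be recognised across images.
import cv2
import numpy as np

def triangulate_fruit(P1, P2, pts1, pts2):
    """P1, P2: 3x4 projection matrices; pts1, pts2: 2xN matched pixel coordinates."""
    homogeneous = cv2.triangulatePoints(P1, P2, pts1.astype(np.float64), pts2.astype(np.float64))
    return (homogeneous[:3] / homogeneous[3]).T       # N x 3 fruit positions in world coordinates
```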
arXiv Detail & Related papers (2021-10-24T01:57:52Z)
- ShaRF: Shape-conditioned Radiance Fields from a Single View [54.39347002226309]
We present a method for estimating neural scene representations of objects given only a single image.
The core of our method is the estimation of a geometric scaffold for the object.
We demonstrate in several experiments the effectiveness of our approach in both synthetic and real images.
arXiv Detail & Related papers (2021-02-17T16:40:28Z)
- Shelf-Supervised Mesh Prediction in the Wild [54.01373263260449]
We propose a learning-based approach to infer the 3D shape and pose of an object from a single image.
We first infer a volumetric representation in a canonical frame, along with the camera pose.
The coarse volumetric prediction is then converted to a mesh-based representation, which is further refined in the predicted camera frame.
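The conversion from the coarse volumetric prediction to a mesh can be illustrated with marching cubes, as sketched below; the occupancy threshold and grid layout are assumptions.

```python
# Hedged sketch of converting a predicted occupancy volume to a mesh with
# marching cubes; the 0.5 threshold is an illustrative assumption.
from skimage import measure

def volume_to_mesh(occupancy, threshold=0.5):
    """occupancy: (D, H, W) array of predicted occupancy in [0, 1]."""
    verts, faces, normals, values = measure.marching_cubes(occupancy, level=threshold)
    return verts, faces
```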
arXiv Detail & Related papers (2021-02-11T18:57:10Z)
- Shape and Viewpoint without Keypoints [63.26977130704171]
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image.
We train on an image collection without any ground-truth 3D shape, multi-view, camera viewpoint, or keypoint supervision.
We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects.
arXiv Detail & Related papers (2020-07-21T17:58:28Z)
- Visual Perception and Modelling in Unstructured Orchard for Apple Harvesting Robots [6.634537400804884]
This paper develops a framework of visual perception and modelling for robotic harvesting of fruit in orchard environments.
The framework includes visual perception, scenario mapping, and fruit modelling.
Experimental results show that the visual perception and modelling algorithms can accurately detect and localise the fruits.
arXiv Detail & Related papers (2019-12-29T00:30:59Z)