Development and evaluation of automated localisation and reconstruction
of all fruits on tomato plants in a greenhouse based on multi-view perception
and 3D multi-object tracking
- URL: http://arxiv.org/abs/2211.02760v3
- Date: Tue, 28 Nov 2023 11:44:16 GMT
- Authors: David Rapado Rincon, Eldert J. van Henten, Gert Kootstra
- Abstract summary: This paper presents a novel approach for building generic representations in occluded agro-food environments.
It is based on a detection algorithm that generates partial point clouds for each detected object, followed by a 3D multi-object tracking algorithm.
The accuracy of the representation was evaluated in a real-world environment, where tomatoes on tomato plants were successfully represented and localised.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to accurately represent and localise relevant objects is
essential for robots to carry out tasks effectively. Traditional approaches,
where robots simply capture an image, process that image to take an action, and
then forget the information, have proven to struggle in the presence of
occlusions. Methods using multi-view perception, which have the potential to
address some of these problems, require a world model that guides the
collection, integration and extraction of information from multiple viewpoints.
Furthermore, constructing a generic representation that can be applied in
various environments and tasks is a difficult challenge. In this paper, a novel
approach for building generic representations in occluded agro-food
environments using multi-view perception and 3D multi-object tracking is
introduced. The method is based on a detection algorithm that generates partial
point clouds for each detected object, followed by a 3D multi-object tracking
algorithm that updates the representation over time. The accuracy of the
representation was evaluated in a real-world environment, where tomatoes on
tomato plants were successfully represented and localised despite high levels
of occlusion: the total count of tomatoes was estimated with a maximum error of
5.08%, and the tomatoes were tracked with an accuracy of up to 71.47%. Novel
tracking metrics were introduced whose use provides valuable insight into the
errors made in localising and representing the fruits. This approach presents a
novel solution for building
representations in occluded agro-food environments, demonstrating potential to
enable robots to perform tasks effectively in these challenging environments.
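As a concrete illustration of the pipeline the abstract describes (per-object partial point clouds from a detector, integrated across viewpoints by a 3D multi-object tracker), the sketch below implements a deliberately minimal tracker. Everything in it is an assumption for illustration: greedy nearest-neighbour association on point-cloud centroids with a running-mean position estimate, not the paper's actual algorithm.

```python
import numpy as np

# Hypothetical sketch only: associate per-frame 3D detections (centroids of
# partial point clouds) with persistent tracks by greedy nearest-neighbour
# matching. The paper's actual tracking algorithm is not reproduced here.

MAX_MATCH_DIST = 0.05  # metres; assumed association gate


class Track:
    """One tracked fruit: a running mean of its observed 3D centroids."""

    def __init__(self, track_id, centroid):
        self.id = track_id
        self.centroid = np.asarray(centroid, dtype=float)
        self.n_obs = 1

    def update(self, centroid):
        # Incremental mean: a cheap stand-in for a full Kalman update.
        self.n_obs += 1
        self.centroid += (np.asarray(centroid, dtype=float) - self.centroid) / self.n_obs


def update_tracks(tracks, detections, next_id):
    """Match new detections to tracks; unmatched detections open new tracks."""
    unmatched = list(range(len(detections)))
    for track in tracks:
        if not unmatched:
            break
        dists = [np.linalg.norm(track.centroid - detections[i]) for i in unmatched]
        j = int(np.argmin(dists))
        if dists[j] < MAX_MATCH_DIST:
            track.update(detections[unmatched.pop(j)])
    for i in unmatched:
        tracks.append(Track(next_id, detections[i]))
        next_id += 1
    return tracks, next_id
```

Iterated over successive viewpoints, the number of surviving tracks becomes the fruit-count estimate; the paper reports a maximum counting error of 5.08% with its full method.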
Related papers
- Markerless Multi-view 3D Human Pose Estimation: a survey [0.49157446832511503]
3D human pose estimation aims to reconstruct the skeleton of every individual in a scene by detecting several body joints.
No method is yet capable of solving all the challenges associated with the reconstruction of the 3D pose.
Further research is still required to develop an approach that can quickly infer a highly accurate 3D pose at a bearable computational cost.
arXiv Detail & Related papers (2024-07-04T10:44:35Z)
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction [51.3632308129838]
We present Total-Decom, a novel method for decomposed 3D reconstruction with minimal human interaction.
Our approach seamlessly integrates the Segment Anything Model (SAM) with hybrid implicit-explicit neural surface representations and a mesh-based region-growing technique for accurate 3D object decomposition.
We extensively evaluate our method on benchmark datasets and demonstrate its potential for downstream applications, such as animation and scene editing.
arXiv Detail & Related papers (2024-03-28T11:12:33Z)
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm.
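As a rough illustration of the dynamics-model half of this design, the toy sketch below runs one message-passing step of a graph network over object slots. The fully connected graph, feature size, and single-linear-layer "MLPs" are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

# Toy message-passing step over object slots (hypothetical; not the paper's
# architecture). Each slot receives messages from every other slot, then
# applies a residual update to predict its next-step state.

rng = np.random.default_rng(0)
D = 8                                           # slot feature size (assumed)
W_msg = rng.normal(scale=0.1, size=(2 * D, D))  # edge "MLP": one linear map
W_upd = rng.normal(scale=0.1, size=(2 * D, D))  # node "MLP": one linear map


def gnn_dynamics_step(slots):
    """Predict next object states from current slots, shape (N, D)."""
    n = slots.shape[0]
    messages = np.zeros_like(slots)
    for i in range(n):
        for j in range(n):
            if i != j:
                pair = np.concatenate([slots[i], slots[j]])
                messages[i] += np.tanh(pair @ W_msg)  # sum-aggregate messages
    return slots + np.tanh(np.concatenate([slots, messages], axis=1) @ W_upd)
```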
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
- Panoptic Mapping with Fruit Completion and Pose Estimation for Horticultural Robots [33.21287030243106]
Monitoring plants and fruits at high resolution plays a key role in the future of agriculture.
Accurate 3D information can pave the way for a wide range of robotic applications in agriculture, from autonomous harvesting to precise yield estimation.
We address the problem of jointly estimating complete 3D shapes of fruit and their pose in a 3D multi-resolution map built by a mobile robot.
arXiv Detail & Related papers (2023-03-15T20:41:24Z)
- RREx-BoT: Remote Referring Expressions with a Bag of Tricks [19.036557405184656]
We show how a vision-language scoring model can be used to locate objects in unobserved environments.
We demonstrate our model on a real-world TurtleBot platform, highlighting the simplicity and usefulness of the approach.
Our analysis outlines a "bag of tricks" essential for accomplishing this task, from utilizing 3D coordinates and context to generalizing vision-language models to large 3D search spaces.
arXiv Detail & Related papers (2023-01-30T02:19:19Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par or even better than single-task models.
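A skeletal PyTorch version of such a multi-modality input, multi-task output model is sketched below. The fusion scheme, layer sizes, and head shapes are illustrative assumptions rather than the published MMISM architecture.

```python
import torch
import torch.nn as nn

# Skeletal multi-modality input / multi-task output model in the spirit of
# MMISM. Fusion scheme, layer sizes, and head shapes are illustrative
# assumptions, not the published architecture.


class MultiTaskSceneModel(nn.Module):
    def __init__(self, feat=64, n_classes=21, n_joints=17):
        super().__init__()
        self.rgb_enc = nn.Conv2d(3, feat, 3, padding=1)    # RGB branch
        self.lidar_enc = nn.Conv2d(1, feat, 3, padding=1)  # sparse-depth branch
        self.fuse = nn.Conv2d(2 * feat, feat, 1)           # concat-then-project
        self.heads = nn.ModuleDict({
            "detection": nn.Conv2d(feat, 7, 1),     # per-pixel 3D box params
            "depth": nn.Conv2d(feat, 1, 1),         # depth completion
            "pose": nn.Conv2d(feat, n_joints, 1),   # human-joint heatmaps
            "segmentation": nn.Conv2d(feat, n_classes, 1),
        })

    def forward(self, rgb, sparse_depth):
        x = torch.cat([self.rgb_enc(rgb), self.lidar_enc(sparse_depth)], dim=1)
        shared = torch.relu(self.fuse(x))
        return {name: head(shared) for name, head in self.heads.items()}
```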
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
- Uncertainty Guided Policy for Active Robotic 3D Reconstruction using Neural Radiance Fields [82.21033337949757]
This paper introduces a ray-based volumetric uncertainty estimator, which computes the entropy of the weight distribution of the color samples along each ray of the object's implicit neural representation.
We show that it is possible to infer the uncertainty of the underlying 3D geometry given a novel view with the proposed estimator.
We present a next-best-view selection policy guided by the ray-based volumetric uncertainty in neural radiance fields-based representations.
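The quantity described here is concrete enough to sketch. Under standard NeRF volume rendering, each sample along a ray receives a weight w_i = T_i * (1 - exp(-sigma_i * delta_i)); normalising the weights into a distribution and taking its entropy yields a per-ray uncertainty score. A minimal version, assuming this standard weight formulation (the paper's exact estimator may differ):

```python
import numpy as np

# Ray-wise uncertainty as the entropy of the volume-rendering weight
# distribution. Standard NeRF weights are assumed; the paper's exact
# estimator may differ in its details.


def ray_weight_entropy(sigmas, deltas, eps=1e-10):
    """sigmas: sample densities along one ray; deltas: sample spacings."""
    alphas = 1.0 - np.exp(-sigmas * deltas)             # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1] + eps]))
    weights = trans * alphas                            # w_i = T_i * alpha_i
    p = weights / (weights.sum() + eps)                 # normalise to a pmf
    return float(-np.sum(p * np.log(p + eps)))          # low = confident ray
```

A ray whose weights concentrate on a single sample (low entropy) has a confidently located surface; high-entropy rays mark views worth capturing next.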
arXiv Detail & Related papers (2022-09-17T21:28:57Z)
- Lifelong Ensemble Learning based on Multiple Representations for Few-Shot Object Recognition [6.282068591820947]
We present a lifelong ensemble learning approach based on multiple representations to address the few-shot object recognition problem.
To facilitate lifelong learning, each approach is equipped with a memory unit for storing and retrieving object information instantly.
We have performed extensive sets of experiments to assess the performance of the proposed approach in offline and open-ended scenarios.
arXiv Detail & Related papers (2022-05-04T10:29:10Z)
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.