6D Strawberry Pose Estimation: Real-time and Edge AI Solutions Using Purely Synthetic Training Data
- URL: http://arxiv.org/abs/2511.11307v1
- Date: Fri, 14 Nov 2025 13:51:24 GMT
- Title: 6D Strawberry Pose Estimation: Real-time and Edge AI Solutions Using Purely Synthetic Training Data
- Authors: Saptarshi Neil Sinha, Julius Kühn, Mika Silvan Goschke, Michael Weinmann,
- Abstract summary: This paper focuses on 6D pose estimation of strawberries using purely synthetic data generated through a procedural pipeline for rendering.<n>We employ the YOLOX-6D-Pose algorithm, a single-shot approach that leverages the YOLOX backbone, known for its balance between speed and accuracy, and its support for edge inference.
- Score: 5.767786408394506
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Automated and selective harvesting of fruits has become an important area of research, particularly due to challenges such as high costs and a shortage of seasonal labor in advanced economies. This paper focuses on 6D pose estimation of strawberries using purely synthetic data generated through a procedural pipeline for photorealistic rendering. We employ the YOLOX-6D-Pose algorithm, a single-shot approach that leverages the YOLOX backbone, known for its balance between speed and accuracy, and its support for edge inference. To address the lacking availability of training data, we introduce a robust and flexible pipeline for generating synthetic strawberry data from various 3D models via a procedural Blender pipeline, where we focus on enhancing the realism of the synthesized data in comparison to previous work to make it a valuable resource for training pose estimation algorithms. Quantitative evaluations indicate that our models achieve comparable accuracy on both the NVIDIA RTX 3090 and Jetson Orin Nano across several ADD-S metrics, with the RTX 3090 demonstrating superior processing speed. However, the Jetson Orin Nano is particularly suited for resource-constrained environments, making it an excellent choice for deployment in agricultural robotics. Qualitative assessments further confirm the model's performance, demonstrating its capability to accurately infer the poses of ripe and partially ripe strawberries, while facing challenges in detecting unripe specimens. This suggests opportunities for future improvements, especially in enhancing detection capabilities for unripe strawberries (if desired) by exploring variations in color. Furthermore, the methodology presented could be adapted easily for other fruits such as apples, peaches, and plums, thereby expanding its applicability and impact in the field of agricultural automation.
Related papers
- Sparse 3D Perception for Rose Harvesting Robots: A Two-Stage Approach Bridging Simulation and Real-World Applications [0.5407319151576264]
medicinal plants, such as Damask roses, has surged with population growth, yet labor-intensive harvesting remains a bottleneck for scalability.<n>We propose a novel 3D perception pipeline tailored for flower-harvesting robots, focusing on sparse 3D localization of rose centers.<n>Our two-stage algorithm first performs 2D point-based detection on stereo images, followed by depth estimation using a lightweight deep neural network.
arXiv Detail & Related papers (2025-07-28T16:09:34Z) - Raspberry PhenoSet: A Phenology-based Dataset for Automated Growth Detection and Yield Estimation [1.2661567777618703]
We introduce Raspberry PhenoSet, a phenology-based dataset for detecting and segmenting raspberry fruit across seven developmental stages.
This dataset contains 1,853 high-resolution images, the highest quality in the literature, captured under controlled artificial lighting in a vertical farm.
We benchmarked several state-of-the-art deep learning models, including YOLOv8, YOLOv10, RT-DETR, and Mask R-CNN, to provide a comprehensive evaluation of their performance on the dataset.
arXiv Detail & Related papers (2024-11-01T18:34:26Z) - Key Point-based Orientation Estimation of Strawberries for Robotic Fruit
Picking [8.657107511095242]
We introduce a novel key-point-based fruit orientation estimation method allowing for the prediction of 3D orientation from 2D images directly.
Our proposed method achieves state-of-the-art performance with an average error as low as $8circ$, improving predictions by $sim30%$ compared to previous work presented incitewagnerefficient.
arXiv Detail & Related papers (2023-10-17T15:12:11Z) - Exploring the Effectiveness of Dataset Synthesis: An application of
Apple Detection in Orchards [68.95806641664713]
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection.
We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset.
Results demonstrate that the model trained on generated data is slightly underperforming compared to a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z) - Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction [63.3021778885906]
3D bounding boxes are a widespread intermediate representation in many computer vision applications.
We propose methods for leveraging our autoregressive model to make high confidence predictions and meaningful uncertainty measures.
We release a simulated dataset, COB-3D, which highlights new types of ambiguity that arise in real-world robotics applications.
arXiv Detail & Related papers (2022-10-13T23:57:40Z) - Learning 6D Pose Estimation from Synthetic RGBD Images for Robotic
Applications [0.6299766708197883]
The proposed pipeline can efficiently generate large amounts of photo-realistic RGBD images for the object of interest.
We develop a real-time two-stage 6D pose estimation approach by integrating the object detector YOLO-V4-tiny and the 6D pose estimation algorithm PVN3D.
The resulting network shows competitive performance compared to state-of-the-art methods when evaluated on LineMod dataset.
arXiv Detail & Related papers (2022-08-30T14:17:15Z) - Generative models-based data labeling for deep networks regression:
application to seed maturity estimation from UAV multispectral images [3.6868861317674524]
Monitoring seed maturity is an increasing challenge in agriculture due to climate change and more restrictive practices.
Traditional methods are based on limited sampling in the field and analysis in laboratory.
We propose a method for estimating parsley seed maturity using multispectral UAV imagery, with a new approach for automatic data labeling.
arXiv Detail & Related papers (2022-08-09T09:06:51Z) - Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for
Robotic Bin-picking [98.5984733963713]
We propose an iterative self-training framework for sim-to-real 6D object pose estimation to facilitate cost-effective robotic grasping.
We establish a photo-realistic simulator to synthesize abundant virtual data, and use this to train an initial pose estimation network.
This network then takes the role of a teacher model, which generates pose predictions for unlabeled real data.
arXiv Detail & Related papers (2022-04-14T15:54:01Z) - Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887]
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strength of both neural network-based and conventional scene synthesis approaches.
arXiv Detail & Related papers (2021-08-30T19:45:07Z) - AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50times$), with significantly reduced training data generation (up to $30times$) and better accuracy ($+8.7%$) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z) - Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D
Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train from scratch 3D pose regressor networks that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.