Related papers: Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping

URL: http://arxiv.org/abs/2507.11279v1
Date: Tue, 15 Jul 2025 12:56:13 GMT
Title: Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping
Authors: Yujie Zhang, Sabine Struckmeyer, Andreas Kolb, Sven Reichardt,
Abstract summary: TomatoMAP is a comprehensive dataset for Solanum lycopersicum. Our dataset contains 64,464 RGB images that capture 12 different plant poses from four camera elevation angles. We provide a subset of 3,616 high-resolution images with pixel-wise semantic and instance segmentation annotations for fine-grained phenotyping.
Score: 10.807010511060042
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Observer bias and inconsistencies in traditional plant phenotyping methods limit the accuracy and reproducibility of fine-grained plant analysis. To overcome these challenges, we developed TomatoMAP, a comprehensive dataset for Solanum lycopersicum using an Internet of Things (IoT) based imaging system with standardized data acquisition protocols. Our dataset contains 64,464 RGB images that capture 12 different plant poses from four camera elevation angles. Each image includes manually annotated bounding boxes for seven regions of interest (ROIs), including leaves, panicle, batch of flowers, batch of fruits, axillary shoot, shoot and whole plant area, along with 50 fine-grained growth stage classifications based on the BBCH scale. Additionally, we provide a subset of 3,616 high-resolution images with pixel-wise semantic and instance segmentation annotations for fine-grained phenotyping. We validated our dataset using a cascading model deep learning framework combining MobileNetv3 for classification, YOLOv11 for object detection, and MaskRCNN for segmentation. Through AI vs. Human analysis involving five domain experts, we demonstrate that the models trained on our dataset achieve accuracy and speed comparable to the experts. Cohen's Kappa and an inter-rater agreement heatmap confirm the reliability of automated fine-grained phenotyping using our approach.
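The abstract reports agreement between models and experts using Cohen's Kappa. As a rough illustration of that statistic only, the sketch below computes kappa for two raters from scratch. The function name, the example labels, and the choice to return 1.0 when chance agreement is already total are assumptions made for this example; none of it is taken from the TomatoMAP code or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal label frequencies, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Assumption: if both raters used a single identical label for everything,
    # kappa is undefined (0/0); this sketch reports perfect agreement instead.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical labels, invented for illustration (not from the dataset).
model_labels = ["leaf", "leaf", "fruit", "flower", "fruit", "leaf"]
expert_labels = ["leaf", "fruit", "fruit", "flower", "fruit", "leaf"]
kappa = cohens_kappa(model_labels, expert_labels)
print(f"kappa = {kappa:.4f}")
```

Here the raters agree on 5 of 6 items (observed agreement 5/6), chance agreement is 13/36, and kappa is therefore 17/23, about 0.74. Values near 1 indicate agreement well beyond chance, and values near 0 indicate agreement no better than chance.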

Related papers

MT-CYP-Net: Multi-Task Network for Pixel-Level Crop Yield Prediction Under Very Few Samples [5.547023223870711]
We propose a novel approach called the Multi-Task Crop Yield Prediction Network (MT-CYP-Net). This framework introduces an effective multi-task feature-sharing strategy, where features extracted from a shared backbone network are simultaneously utilized by both crop yield prediction decoders and crop classification decoders. This design allows MT-CYP-Net to be trained with extremely sparse crop yield point labels and crop type labels, while still generating detailed pixel-level crop yield maps.
arXiv Detail & Related papers (2025-05-17T16:20:44Z)
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision [72.84868704100595]
This paper presents a dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks. The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic. Accompanying the dataset is EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data.
arXiv Detail & Related papers (2025-01-14T13:42:22Z)
Tree Species Classification using Machine Learning and 3D Tomographic SAR -- a case study in Northern Europe [0.0]
Tree species classification plays an important role in nature conservation, forest inventories, forest management, and the protection of endangered species. In this study, we employed TomoSense, a 3D tomographic dataset, which utilizes a stack of single-look complex (SLC) images.
arXiv Detail & Related papers (2024-11-19T22:25:26Z)
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection. CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
arXiv Detail & Related papers (2024-07-29T18:00:10Z)
Classifying geospatial objects from multiview aerial imagery using semantic meshes [2.116528763953217]
We propose a new method to predict tree species based on aerial images of forests in the U.S. We show that our proposed multiview method improves classification accuracy from 53% to 75% relative to an orthomosaic baseline on a challenging cross-site tree classification task.
arXiv Detail & Related papers (2024-05-15T17:56:49Z)
The Canadian Cropland Dataset: A New Land Cover Dataset for Multitemporal Deep Learning Classification in Agriculture [0.8602553195689513]
A temporal patch-based dataset of Canadian croplands, enriched with labels retrieved from the Canadian Annual Crop Inventory. The dataset contains 78,536 manually verified high-resolution spatial images from 10 crop classes collected over four crop production years. As a benchmark, we provide models and source code that allow a user to predict the crop class using a single image (ResNet, DenseNet, EfficientNet) or a sequence of images (LRCN, 3D-CNN) from the same location.
arXiv Detail & Related papers (2023-05-31T18:40:15Z)
End-to-end deep learning for directly estimating grape yield from ground-based imagery [53.086864957064876]
This study demonstrates the application of proximal imaging combined with deep learning for yield estimation in vineyards. Three model architectures were tested: object detection, CNN regression, and transformer models. The study showed the applicability of proximal imaging and deep learning for prediction of grapevine yield on a large scale.
arXiv Detail & Related papers (2022-08-04T01:34:46Z)
Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes. Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
Unsupervised Domain Adaptation with Contrastive Learning for OCT Segmentation [49.59567529191423]
We propose a novel semi-supervised learning framework for segmentation of volumetric images from new unlabeled domains. We jointly use supervised and contrastive learning, also introducing a contrastive pairing scheme that leverages similarity between nearby slices in 3D.
arXiv Detail & Related papers (2022-03-07T19:02:26Z)
Classification of Seeds using Domain Randomization on Self-Supervised Learning Frameworks [0.0]
A key bottleneck is the need for an extensive amount of labelled data to train convolutional neural networks (CNNs). The work leverages the concepts of Contrastive Learning and Domain Randomization in order to achieve the same. The use of synthetic images generated from a representational sample crop of real-world images alleviates the need for a large volume of test subjects.
arXiv Detail & Related papers (2021-03-29T12:50:06Z)
Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training. We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.