GroMo: Plant Growth Modeling with Multiview Images
- URL: http://arxiv.org/abs/2503.06608v2
- Date: Fri, 06 Jun 2025 04:25:15 GMT
- Title: GroMo: Plant Growth Modeling with Multiview Images
- Authors: Ruchi Bhatt, Shreya Bansal, Amanpreet Chander, Rupinder Kaur, Malya Singh, Mohan Kankanhalli, Abdulmotaleb El Saddik, Mukesh Kumar Saini
- Abstract summary: We present the Growth Modelling (GroMo) challenge, which is designed for two primary tasks: plant age prediction and leaf count estimation. The GroMo Challenge aims to advance plant phenotyping research by encouraging innovative solutions for tracking and predicting plant growth.
- Score: 3.7287379829068805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding plant growth dynamics is essential for applications in agriculture and plant phenotyping. We present the Growth Modelling (GroMo) challenge, which is designed for two primary tasks: (1) plant age prediction and (2) leaf count estimation, both essential for crop monitoring and precision agriculture. For this challenge, we introduce GroMo25, a dataset with images of four crops: radish, okra, wheat, and mustard. Each crop consists of multiple plants (p1, p2, ..., pn) captured over different days (d1, d2, ..., dm) and categorized into five levels (L1, L2, L3, L4, L5). Each plant is captured from 24 different angles with a 15-degree gap between images. Participants are required to perform both tasks for all four crops with these multiview images. We propose a Multiview Vision Transformer (MVVT) model for the GroMo challenge and evaluate the crop-wise performance on GroMo25. MVVT reports an average MAE of 7.74 for age prediction and an MAE of 5.52 for leaf count. The GroMo Challenge aims to advance plant phenotyping research by encouraging innovative solutions for tracking and predicting plant growth. The GitHub repository is publicly available at https://github.com/mriglab/GroMo-Plant-Growth-Modeling-with-Multiview-Images.
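The capture geometry and evaluation metric described in the abstract can be sketched in a few lines. This is an illustrative sketch only: the helper names are not from the official repository, and MAE is the metric the abstract reports for both tasks.

```python
# Sketch of the GroMo25 view geometry (24 views, 15 degrees apart)
# and the MAE metric reported for age prediction and leaf count.
# Function names are illustrative, not the official challenge API.

def view_angles(num_views: int = 24, step_deg: float = 15.0) -> list[float]:
    """24 views spaced 15 degrees apart cover the full 360-degree circle."""
    return [i * step_deg for i in range(num_views)]

def mean_absolute_error(predictions: list[float], targets: list[float]) -> float:
    """MAE: the average absolute difference between predictions and ground truth."""
    assert len(predictions) == len(targets)
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)

angles = view_angles()
assert len(angles) == 24 and angles[-1] == 345.0  # last view sits at 345 degrees
```

With 24 views at 15-degree increments, the final view at 345 degrees is one step short of wrapping back to 0, so each plant is imaged once per angle with no duplicates.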
Related papers
- CLIP-Guided Multi-Task Regression for Multi-View Plant Phenotyping [43.24254323363639]
We propose a level-aware vision language framework that jointly predicts plant age and leaf count using a single multi-task model built on CLIP embeddings. Our method aggregates rotational views into angle-invariant representations and conditions visual features on lightweight text priors encoding viewpoint level for stable prediction under incomplete or unordered inputs. On the GroMo25 benchmark, our approach reduces mean age MAE from 7.74 to 3.91 and mean leaf-count MAE from 5.52 to 3.08 compared to the GroMo baseline, corresponding to improvements of 49.5% and 44.2%, respectively.
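The angle-invariant aggregation of rotational views can be sketched as simple mean pooling over per-view embeddings. This is a minimal sketch under assumed names and shapes; the actual paper additionally conditions on CLIP text priors, which is omitted here.

```python
import numpy as np

def aggregate_views(view_embeddings: np.ndarray) -> np.ndarray:
    """Mean-pool per-view embeddings of shape (num_views, dim) into a
    single (dim,) plant representation. The mean is permutation-invariant,
    so the result is stable under unordered or shuffled view inputs."""
    return view_embeddings.mean(axis=0)

# A shuffled set of views yields the same aggregated representation.
rng = np.random.default_rng(0)
emb = rng.normal(size=(24, 512))       # 24 views, 512-dim embeddings (illustrative)
shuffled = emb[rng.permutation(24)]
assert np.allclose(aggregate_views(emb), aggregate_views(shuffled))
```

Permutation invariance is the property that makes such pooling robust when views arrive incomplete or out of order, as the abstract describes.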
arXiv Detail & Related papers (2026-03-04T14:01:47Z)
- Modeling Time-Lapse Trajectories to Characterize Cranberry Growth [0.14658400971135646]
We introduce a method for modeling crop growth based on fine-tuning vision transformers (ViTs) using a self-supervised approach that avoids tedious image annotations. We use a two-fold pretext task (time regression and class prediction) to learn a latent space for the time-lapse evolution of plant and fruit appearance. The resulting 2D temporal tracks provide an interpretable time-series model of crop growth that can be used to: 1) predict growth over time and 2) distinguish temporal differences of cranberry varieties.
arXiv Detail & Related papers (2025-10-10T01:33:19Z)
- ViewSparsifier: Killing Redundancy in Multi-View Plant Phenotyping [8.348234911002821]
Plant phenotyping involves analyzing observable characteristics of plants to better understand their growth, health, and development. In the context of deep learning, this analysis is often approached through single-view classification or regression models. To address this, the Growth Modelling (GroMo) Grand Challenge at ACM Multimedia 2025 provides a multi-view dataset featuring multiple plants.
arXiv Detail & Related papers (2025-09-10T12:53:38Z)
- FoMo4Wheat: Toward reliable crop vision foundation models with globally curated data [16.598899500051946]
We present FoMo4Wheat, one of the first crop-domain vision foundation models pretrained with self-supervision. This wheat-specific pretraining yields representations that are robust for wheat and transferable to other crops and weeds.
arXiv Detail & Related papers (2025-09-08T17:23:28Z)
- Multi-Label Plant Species Prediction with Metadata-Enhanced Multi-Head Vision Transformers [0.0]
We present a multi-head vision transformer approach for multi-label plant species prediction in vegetation plot images. The task involves training models on single-species plant images while testing on multi-species quadrat images, creating a drastic domain shift. Our methodology leverages a pre-trained DINOv2 Vision Transformer Base (ViT-B/14) backbone with multiple classification heads for species, genus, and family prediction.
arXiv Detail & Related papers (2025-08-14T08:56:58Z)
- Agtech Framework for Cranberry-Ripening Analysis Using Vision Foundation Models [1.5728609542259502]
We develop a framework for characterizing the ripening process of cranberry crops using aerial and ground imaging. This work is the first of its kind and has future impact for cranberries and for other crops including wine grapes, olives, blueberries, and maize.
arXiv Detail & Related papers (2024-12-12T22:03:33Z)
- Generating Diverse Agricultural Data for Vision-Based Farming Applications [74.79409721178489]
This model is capable of simulating distinct growth stages of plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions.
Our dataset includes 12,000 images with semantic labels, offering a comprehensive resource for computer vision tasks in precision agriculture.
arXiv Detail & Related papers (2024-03-27T08:42:47Z)
- Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V [103.68138147783614]
We present Set-of-Mark (SoM), a new visual prompting method, to unleash the visual grounding abilities of large multimodal models.
We employ off-the-shelf interactive segmentation models, such as SEEM/SAM, to partition an image into regions, and overlay these regions with a set of marks.
Using the marked image as input, GPT-4V can answer the questions that require visual grounding.
arXiv Detail & Related papers (2023-10-17T17:51:31Z)
- Multi-growth stage plant recognition: a case study of Palmer amaranth (Amaranthus palmeri) in cotton (Gossypium hirsutum) [0.3441021278275805]
We investigate eight-class growth stage recognition of Amaranthus palmeri in cotton.
We compare 26 different architecture variants from YOLO v3, v5, v6, v6 3.0, v7, and v8.
The highest mAP@[0.5:0.95] across all growth stage classes was 47.34%, achieved by YOLOv8-X.
arXiv Detail & Related papers (2023-07-28T21:14:43Z)
- Semantic Image Segmentation with Deep Learning for Vine Leaf Phenotyping [59.0626764544669]
In this study, we use Deep Learning methods to semantically segment grapevine leaves images in order to develop an automated object detection system for leaf phenotyping.
Our work contributes to plant lifecycle monitoring through which dynamic traits such as growth and development can be captured and quantified.
arXiv Detail & Related papers (2022-10-24T14:37:09Z)
- MulT: An End-to-End Multitask Learning Transformer [66.52419626048115]
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks.
Our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads.
arXiv Detail & Related papers (2022-05-17T13:03:18Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic IB from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain the state-of-the-art classification performance, i.e., 88.5% Top-1 classification accuracy on ImageNet validation set and the best 91.2% Top-1 accuracy on ImageNet real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
- Multi-resolution Outlier Pooling for Sorghum Classification [4.434302808728865]
We introduce the Sorghum-100 dataset, a large dataset of RGB imagery of sorghum captured by a state-of-the-art gantry system.
A new global pooling strategy called Dynamic Outlier Pooling outperforms standard global pooling strategies on this task.
arXiv Detail & Related papers (2021-06-10T13:57:33Z)
- Temporal Prediction and Evaluation of Brassica Growth in the Field using Conditional Generative Adversarial Networks [1.2926587870771542]
The prediction of plant growth is a major challenge, as it is affected by numerous and highly variable environmental factors.
This paper proposes a novel monitoring approach that comprises high-throughput imaging sensor measurements and their automatic analysis.
Our approach's core is a novel machine learning-based growth model based on conditional generative adversarial networks.
arXiv Detail & Related papers (2021-05-17T13:00:01Z)
- Deep Multi-view Image Fusion for Soybean Yield Estimation in Breeding Applications [7.450586438835518]
The objective of this study is to develop a machine learning (ML) approach adept at soybean pod counting.
We developed a multi-view image-based yield estimation framework utilizing deep learning architectures.
Our results demonstrate the promise of ML models in making breeding decisions with significant reduction of time and human effort.
arXiv Detail & Related papers (2020-11-13T20:37:04Z)
- Two-View Fine-grained Classification of Plant Species [66.75915278733197]
We propose a novel method based on a two-view leaf image representation and a hierarchical classification strategy for fine-grained recognition of plant species.
A deep metric based on Siamese convolutional neural networks is used to reduce the dependence on a large number of training samples and make the method scalable to new plant species.
arXiv Detail & Related papers (2020-05-18T21:57:47Z)
- Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis [110.30849704592592]
We present Agriculture-Vision: a large-scale aerial farmland image dataset for semantic segmentation of agricultural patterns.
Each image consists of RGB and Near-infrared (NIR) channels with resolution as high as 10 cm per pixel.
We annotate nine types of field anomaly patterns that are most important to farmers.
arXiv Detail & Related papers (2020-01-05T20:19:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.