FoMo4Wheat: Toward reliable crop vision foundation models with globally curated data
- URL: http://arxiv.org/abs/2509.06907v1
- Date: Mon, 08 Sep 2025 17:23:28 GMT
- Title: FoMo4Wheat: Toward reliable crop vision foundation models with globally curated data
- Authors: Bing Han, Chen Zhu, Dong Han, Rui Yu, Songliang Cao, Jianhui Wu, Scott Chapman, Zijian Wang, Bangyou Zheng, Wei Guo, Marie Weiss, Benoit de Solan, Andreas Hund, Lukas Roth, Kirchgessner Norbert, Andrea Visioni, Yufeng Ge, Wenjuan Li, Alexis Comar, Dong Jiang, Dejun Han, Fred Baret, Yanfeng Ding, Hao Lu, Shouyang Liu
- Abstract summary: We present FoMo4Wheat, one of the first crop-domain vision foundation models pretrained with self-supervision. This wheat-specific pretraining yields representations that are robust for wheat and transferable to other crops and weeds.
- Score: 16.598899500051946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-driven field monitoring is central to digital agriculture, yet models built on general-domain pretrained backbones often fail to generalize across tasks, owing to the interaction of fine, variable canopy structures with fluctuating field conditions. We present FoMo4Wheat, one of the first crop-domain vision foundation models pretrained with self-supervision on ImAg4Wheat, the largest and most diverse wheat image dataset to date (2.5 million high-resolution images collected over a decade at 30 global sites, spanning >2,000 genotypes and >500 environmental conditions). This wheat-specific pretraining yields representations that are robust for wheat and transferable to other crops and weeds. Across ten in-field vision tasks at canopy and organ levels, FoMo4Wheat models consistently outperform state-of-the-art models pretrained on general-domain datasets. These results demonstrate the value of crop-specific foundation models for reliable in-field perception and chart a path toward a universal crop foundation model with cross-species and cross-task capabilities. FoMo4Wheat models and the ImAg4Wheat dataset are publicly available online: https://github.com/PheniX-Lab/FoMo4Wheat and https://huggingface.co/PheniX-Lab/FoMo4Wheat. The demonstration website is: https://fomo4wheat.phenix-lab.com/.
Related papers
- DepthCropSeg++: Scaling a Crop Segmentation Foundation Model With Depth-Labeled Data [8.868203469534269]
DepthCropSeg++ is a foundation model for crop segmentation, capable of segmenting different crop species in open in-field environments. We build upon the state-of-the-art ViT-Adapter semantic segmentation architecture, enhance it with dynamic upsampling, and train the model with a two-stage self-training pipeline. Results demonstrate that DepthCropSeg++ achieves 93.11% mIoU on a comprehensive testing set, outperforming both supervised baselines and general vision foundation models.
arXiv Detail & Related papers (2026-01-18T11:51:09Z) - Silhouette-based Gait Foundation Model [56.27974816297294]
Building a unified gait foundation model requires addressing two longstanding barriers: scalability and generalization. We introduce FoundationGait, the first scalable, self-supervised pretraining framework for gait understanding.
arXiv Detail & Related papers (2025-11-30T01:53:41Z) - GroMo: Plant Growth Modeling with Multiview Images [3.7287379829068805]
We present the Growth Modelling (GroMo) challenge, which is designed for two primary tasks: plant age prediction and leaf count estimation. The GroMo Challenge aims to advance plant phenotyping research by encouraging innovative solutions for tracking and predicting plant growth.
arXiv Detail & Related papers (2025-03-09T13:23:16Z) - On the Generalizability of Foundation Models for Crop Type Mapping [8.346555291145767]
Foundation models pre-trained using self-supervised learning have shown powerful transfer learning capabilities. We evaluate three popular EO foundation models, SSL4EO-S12, SatlasPretrain, and ImageNet, on five crop classification datasets.
arXiv Detail & Related papers (2024-09-14T14:43:57Z) - Generating Diverse Agricultural Data for Vision-Based Farming Applications [74.79409721178489]
This model is capable of simulating distinct growth stages of plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions.
Our dataset includes 12,000 images with semantic labels, offering a comprehensive resource for computer vision tasks in precision agriculture.
arXiv Detail & Related papers (2024-03-27T08:42:47Z) - Empirical Study of PEFT techniques for Winter Wheat Segmentation [6.110856077714895]
This study seeks to explore the feasibility of cross-area and cross-year out-of-distribution generalization using the State-of-the-Art (SOTA) wheat crop monitoring model.
We focus on adapting the SOTA TSViT model to address winter wheat field segmentation, a critical task for crop monitoring and food security.
Using PEFT techniques, we achieved results comparable to those of full fine-tuning while training a mere 0.7% of the parameters of the whole TSViT architecture.
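The parameter-efficiency idea above can be sketched without any deep learning framework. In a LoRA-style adapter (a common PEFT technique, used here purely as an illustration; the specific PEFT method and all shapes below are assumptions, not taken from the TSViT paper), the frozen weight matrix W receives a trainable low-rank update B @ A, so only r*(d_in + d_out) parameters are trained instead of d_out*d_in:

```python
# Minimal, framework-free sketch of a LoRA-style PEFT adapter.
# Shapes and values are illustrative only.

def matmul(X, Y):
    """Naive matrix product for small illustrative matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_effective_weight(W, A, B, alpha=1.0):
    """Frozen base weight W plus a scaled low-rank update alpha * (B @ A)."""
    BA = matmul(B, A)
    return [[W[i][j] + alpha * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

d_out, d_in, r = 8, 8, 1
W = [[0.0] * d_in for _ in range(d_out)]   # frozen pretrained weight (d_out x d_in)
A = [[1.0] * d_in]                         # trainable, r x d_in
B = [[0.5] for _ in range(d_out)]          # trainable, d_out x r

W_eff = lora_effective_weight(W, A, B)

full = d_out * d_in                        # parameters touched by full fine-tuning
adapter = r * d_in + d_out * r             # parameters actually trained
print(adapter / full)                      # 0.25 at this toy scale
```

At realistic transformer scales (large d_in and d_out, small r), the trained fraction drops well below 1%, which is the regime the 0.7% figure above refers to.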
arXiv Detail & Related papers (2023-10-03T06:42:28Z) - HarvestNet: A Dataset for Detecting Smallholder Farming Activity Using Harvest Piles and Remote Sensing [50.4506590177605]
HarvestNet is a dataset for mapping the presence of farms in the Ethiopian regions of Tigray and Amhara during 2020-2023.
We introduce a new approach based on the detection of harvest piles characteristic of many smallholder systems.
We conclude that remote sensing of harvest piles can contribute to more timely and accurate cropland assessments in food insecure regions.
arXiv Detail & Related papers (2023-08-23T11:03:28Z) - End-to-end deep learning for directly estimating grape yield from ground-based imagery [53.086864957064876]
This study demonstrates the application of proximal imaging combined with deep learning for yield estimation in vineyards.
Three model architectures were tested: object detection, CNN regression, and transformer models.
The study showed the applicability of proximal imaging and deep learning for prediction of grapevine yield on a large scale.
arXiv Detail & Related papers (2022-08-04T01:34:46Z) - EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm [111.17100512647619]
This paper explains the rationality of the Vision Transformer by analogy with the proven, practical evolutionary algorithm (EA).
We propose a novel pyramid EATFormer backbone that only contains the proposed EA-based transformer (EAT) block.
Extensive quantitative and qualitative experiments on image classification, downstream tasks, and explanatory experiments demonstrate the effectiveness and superiority of our approach.
arXiv Detail & Related papers (2022-06-19T04:49:35Z) - Classification of Seeds using Domain Randomization on Self-Supervised Learning Frameworks [0.0]
A key bottleneck is the need for an extensive amount of labelled data to train convolutional neural networks (CNNs).
The work leverages the concepts of Contrastive Learning and Domain Randomization to achieve this.
The use of synthetic images generated from a representational sample crop of real-world images alleviates the need for a large volume of test subjects.
arXiv Detail & Related papers (2021-03-29T12:50:06Z) - WheatNet: A Lightweight Convolutional Neural Network for High-throughput Image-based Wheat Head Detection and Counting [12.735055892742647]
We propose a novel deep learning framework to accurately and efficiently count wheat heads to aid in the gathering of real-time data for decision making.
We call our model WheatNet and show that our approach is robust and accurate for a wide range of environmental conditions of the wheat field.
Our proposed method achieves an MAE and RMSE of 3.85 and 5.19 in our wheat head counting task, respectively, while having significantly fewer parameters when compared to other state-of-the-art methods.
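The MAE and RMSE figures quoted above are the standard mean absolute error and root-mean-square error over per-image head counts. A minimal sketch (the counts below are made-up illustrative values, not data from the paper):

```python
# Counting metrics as typically reported for wheat head counting:
# MAE = mean of |true - predicted|, RMSE = sqrt of mean squared error.
import math

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

true_counts = [52, 47, 60, 38]   # ground-truth heads per image (illustrative)
pred_counts = [50, 49, 55, 40]   # model predictions (illustrative)

print(mae(true_counts, pred_counts))   # 2.75
print(rmse(true_counts, pred_counts))  # ~3.04
```

RMSE penalizes large per-image errors more heavily than MAE, which is why papers in this area usually report both.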
arXiv Detail & Related papers (2021-03-17T02:38:58Z) - Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis [110.30849704592592]
We present Agriculture-Vision: a large-scale aerial farmland image dataset for semantic segmentation of agricultural patterns.
Each image consists of RGB and Near-infrared (NIR) channels with resolution as high as 10 cm per pixel.
We annotate nine types of field anomaly patterns that are most important to farmers.
arXiv Detail & Related papers (2020-01-05T20:19:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.