Related papers: IPENS:Interactive Unsupervised Framework for Rapid Plant Phenotyping Extraction via NeRF-SAM2 Fusion

URL: http://arxiv.org/abs/2505.13633v1
Date: Mon, 19 May 2025 18:13:09 GMT
Title: IPENS:Interactive Unsupervised Framework for Rapid Plant Phenotyping Extraction via NeRF-SAM2 Fusion
Authors: Wentao Song, He Huang, Youqiang Sun, Fang Qu, Jiaqi Zhang, Longhui Fang, Yuwei Hao, Chenyang Peng,
Abstract summary: IPENS is an unsupervised multi-target point cloud extraction method. It achieves a grain-level segmentation accuracy (mIoU) of 63.72% on a rice dataset. It also improves segmentation accuracy to 89.68% (mIoU) on a wheat dataset.
Score: 7.103482669612749
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Advanced plant phenotyping technologies play a crucial role in targeted trait improvement and accelerating intelligent breeding. Due to the species diversity of plants, existing methods heavily rely on large-scale high-precision manually annotated data. For self-occluded objects at the grain level, unsupervised methods often prove ineffective. This study proposes IPENS, an interactive unsupervised multi-target point cloud extraction method. The method utilizes radiance field information to lift 2D masks, which are segmented by SAM2 (Segment Anything Model 2), into 3D space for target point cloud extraction. A multi-target collaborative optimization strategy is designed to effectively resolve the single-interaction multi-target segmentation challenge. Experimental validation demonstrates that IPENS achieves a grain-level segmentation accuracy (mIoU) of 63.72% on a rice dataset, with strong phenotypic estimation capabilities: grain volume prediction yields R² = 0.7697 (RMSE = 0.0025), leaf surface area R² = 0.84 (RMSE = 18.93), and leaf length and width predictions achieve R² = 0.97 and 0.87 (RMSE = 1.49 and 0.21). On a wheat dataset, IPENS further improves segmentation accuracy to 89.68% (mIoU), with equally outstanding phenotypic estimation performance: spike volume prediction achieves R² = 0.9956 (RMSE = 0.0055), leaf surface area R² = 1.00 (RMSE = 0.67), and leaf length and width predictions reach R² = 0.99 and 0.92 (RMSE = 0.23 and 0.15). This method provides a non-invasive, high-quality phenotyping extraction solution for rice and wheat. Without requiring annotated data, it rapidly extracts grain-level point clouds within 3 minutes through simple single-round interactions on images for multiple targets, demonstrating significant potential to accelerate intelligent breeding efficiency.
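For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how mIoU, RMSE, and the coefficient of determination are commonly computed. It is not taken from the paper: the function names, the per-point label representation, and the choice of the standard 1 - SS_res/SS_tot definition are all assumptions made here for illustration, and the authors may define or aggregate these metrics differently.

```python
import math


def miou(pred, truth, labels):
    """Mean intersection-over-union across `labels` for two per-point label lists.

    Hypothetical helper: assumes one integer label per point in each list.
    """
    ious = []
    for lab in labels:
        inter = sum(1 for p, t in zip(pred, truth) if p == lab and t == lab)
        union = sum(1 for p, t in zip(pred, truth) if p == lab or t == lab)
        if union:  # skip labels that appear in neither list
            ious.append(inter / union)
    if not ious:
        raise ValueError("none of the labels occur in pred or truth")
    return sum(ious) / len(ious)


def rmse(pred, truth):
    """Root-mean-square error between predicted and measured values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def r2(pred, truth):
    """Coefficient of determination, using the 1 - SS_res / SS_tot definition."""
    mean_t = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, truth))
    ss_tot = sum((t - mean_t) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot
```

With this definition, a perfect prediction gives an R² of 1 and a predictor that always returns the mean of the measurements gives 0; mIoU averages the per-label overlap, so a small, poorly segmented class weighs as much as a large one.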

Related papers

An Improved YOLOv8 Approach for Small Target Detection of Rice Spikelet Flowering in Field Environments [1.0288898584996287]
This study proposes a rice spikelet flowering recognition method based on an improved YOLOv8 object detection model. BiFPN replaces the original PANet structure to enhance feature fusion and improve multi-scale feature utilization. Given the lack of publicly available datasets for rice spikelet flowering in field conditions, a high-resolution RGB camera and data augmentation techniques are used.
arXiv Detail & Related papers (2025-07-28T04:01:29Z)
Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout [62.73150122809138]
Federated Learning (FL) is a promising distributed machine learning approach that enables collaborative training of a global model using multiple edge devices. We propose the FedDHAD FL framework, which comes with two novel methods: Dynamic Heterogeneous model aggregation (FedDH) and Adaptive Dropout (FedAD). The combination of these two methods makes FedDHAD significantly outperform state-of-the-art solutions in terms of accuracy (up to 6.7% higher), efficiency (up to 2.02 times faster), and cost (up to 15.0% smaller).
arXiv Detail & Related papers (2025-07-14T16:19:00Z)
Wheat3DGS: In-field 3D Reconstruction, Instance Segmentation and Phenotyping of Wheat Heads with Gaussian Splatting [1.4100451538155885]
We present Wheat3DGS, a novel approach that leverages 3DGS and the Segment Anything Model (SAM) for precise 3D instance segmentation and morphological measurement of hundreds of wheat heads automatically. We validate the accuracy of wheat breeding head extraction against high-resolution laser scan data, obtaining per-instance mean absolute percentage errors of 15.1%, 18.3%, and 40.2% for length, width, and volume.
arXiv Detail & Related papers (2025-04-09T15:31:42Z)
Deep Learning-Based Direct Leaf Area Estimation using Two RGBD Datasets for Model Development [6.663132872468536]
Estimation of a single leaf area can be a measure of crop growth and a phenotypic trait to breed new varieties. This work investigates deep learning-based leaf area estimation for RGBD images taken using a mobile camera setup in real-world scenarios.
arXiv Detail & Related papers (2025-03-13T07:39:09Z)
PanicleNeRF: low-cost, high-precision in-field phenotyping of rice panicles with smartphone [4.441945709704536]
PanicleNeRF is a novel method that enables high-precision and low-cost reconstruction of rice panicle models in the field using a smartphone. Result: PanicleNeRF effectively addressed the 2D image segmentation task, achieving a mean F1 Score of 86.9% and a mean Intersection over Union (IoU) of 79.8%. This method provides a low-cost solution for high-cost phenotyping of rice panicles, accelerating the efficiency of rice breeding.
arXiv Detail & Related papers (2024-08-04T15:01:16Z)
SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets from clutter backgrounds. With the development of Transformer, the scale of SIRST models is constantly increasing. With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z)
KECOR: Kernel Coding Rate Maximization for Active 3D Object Detection [48.66703222700795]
We resort to a novel kernel strategy to identify the most informative point clouds to acquire labels. To accommodate both one-stage (i.e., SECOND) and two-stage detectors, we incorporate the classification entropy tangent and balance the trade-off between detection performance and the total number of bounding boxes selected for annotation. Our results show that approximately 44% of box-level annotation costs and 26% of computational time are saved compared to the state-of-the-art method.
arXiv Detail & Related papers (2023-07-16T04:27:03Z)
Detecting Rotated Objects as Gaussian Distributions and Its 3-D Generalization [81.29406957201458]
Existing detection methods commonly use a parameterized bounding box (BBox) to model and detect (horizontal) objects. We argue that such a mechanism has fundamental limitations in building an effective regression loss for rotation detection. We propose to model the rotated objects as Gaussian distributions. We extend our approach from 2-D to 3-D with a tailored algorithm design to handle the heading estimation.
arXiv Detail & Related papers (2022-09-22T07:50:48Z)
Generative models-based data labeling for deep networks regression: application to seed maturity estimation from UAV multispectral images [3.6868861317674524]
Monitoring seed maturity is an increasing challenge in agriculture due to climate change and more restrictive practices. Traditional methods are based on limited sampling in the field and analysis in the laboratory. We propose a method for estimating parsley seed maturity using multispectral UAV imagery, with a new approach for automatic data labeling.
arXiv Detail & Related papers (2022-08-09T09:06:51Z)
Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector. The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference. Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
Robust Representation via Dynamic Feature Aggregation [44.927408735490005]
Deep convolutional neural network (CNN) based models are vulnerable to adversarial attacks. We propose a method, denoted as Dynamic Feature Aggregation, to compress the embedding space with a novel regularization. An averaging accuracy of 56.91% is achieved by our method on CIFAR-10 against various attack methods.
arXiv Detail & Related papers (2022-05-16T06:22:15Z)
A CNN Approach to Simultaneously Count Plants and Detect Plantation-Rows from UAV Imagery [56.10033255997329]
We propose a novel deep learning method based on a Convolutional Neural Network (CNN). It simultaneously detects and geolocates plantation-rows while counting their plants, considering highly dense plantation configurations. The proposed method achieved state-of-the-art performance for counting and geolocating plants and plant-rows in UAV images from different types of crops.
arXiv Detail & Related papers (2020-12-31T18:51:17Z)
Fusing Optical and SAR time series for LAI gap filling with multioutput Gaussian processes [6.0122901245834015]
Persistent clouds over agricultural fields can mask key stages of crop growth, leading to unreliable yield predictions. Synthetic Aperture Radar (SAR) provides all-weather imagery which can potentially overcome this limitation. We propose the use of Multi-Output Gaussian Process (MOGP) regression, a machine learning technique that learns automatically the statistical relationships among multisensor time series.
arXiv Detail & Related papers (2020-12-05T10:36:45Z)
ROAM: Random Layer Mixup for Semi-Supervised Learning in Medical Imaging [43.26668942258135]
Medical image segmentation is one of the major challenges addressed by machine learning methods. We propose ROAM, a RandOm lAyer Mixup, which generates more data points that have never been seen before. ROAM achieves state-of-the-art (SOTA) results in fully supervised (89.5%) and semi-supervised (87.0%) settings, with a relative improvement of up to 2.40% and 16.50%, respectively, for whole-brain segmentation.
arXiv Detail & Related papers (2020-03-20T18:07:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.