Related papers: PlantSAM: An Object Detection-Driven Segmentation Pipeline for Herbarium Specimens

URL: http://arxiv.org/abs/2507.16506v1
Date: Tue, 22 Jul 2025 12:02:39 GMT
Title: PlantSAM: An Object Detection-Driven Segmentation Pipeline for Herbarium Specimens
Authors: Youcef Sklab, Florian Castanet, Hanane Ariouat, Souhila Arib, Jean-Daniel Zucker, Eric Chenin, Edi Prifti
Abstract summary: We introduce PlantSAM, an automated segmentation pipeline that integrates YOLOv10 for plant region detection and the Segment Anything Model (SAM2) for segmentation. YOLOv10 generates bounding box prompts to guide SAM2, enhancing segmentation accuracy. PlantSAM achieved state-of-the-art segmentation performance, with an IoU of 0.94 and a Dice coefficient of 0.97.
Score: 0.5339846068056558
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Deep learning-based classification of herbarium images is hampered by background heterogeneity, which introduces noise and artifacts that can potentially mislead models and reduce classification accuracy. Addressing these background-related challenges is critical to improving model performance. We introduce PlantSAM, an automated segmentation pipeline that integrates YOLOv10 for plant region detection and the Segment Anything Model (SAM2) for segmentation. YOLOv10 generates bounding box prompts to guide SAM2, enhancing segmentation accuracy. Both models were fine-tuned on herbarium images and evaluated using Intersection over Union (IoU) and Dice coefficient metrics. PlantSAM achieved state-of-the-art segmentation performance, with an IoU of 0.94 and a Dice coefficient of 0.97. Incorporating segmented images into classification models led to consistent performance improvements across five tested botanical traits, with accuracy gains of up to 4.36% and F1-score improvements of 4.15%. Our findings highlight the importance of background removal in herbarium image analysis, as it significantly enhances classification accuracy by allowing models to focus more effectively on the foreground plant structures.
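The abstract reports IoU and Dice for the segmentation masks and describes feeding background-removed images to the classifiers. The following is a minimal sketch of those two metrics and of mask-based background removal, not the authors' implementation. It assumes binary masks stored as nested lists (1 = plant, 0 = background); the function names, the `fill` parameter, and the convention of scoring two empty masks as 1.0 are illustrative choices that do not come from the paper.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # assumed layout: rows of 0/1 values


def overlap_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Return (intersection, predicted area, ground-truth area) in pixels."""
    inter = pred_area = truth_area = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            pred_area += 1 if p else 0
            truth_area += 1 if t else 0
    return inter, pred_area, truth_area


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over Union: |A and B| / |A or B|."""
    inter, pred_area, truth_area = overlap_counts(pred, truth)
    union = pred_area + truth_area - inter
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def dice(pred: Mask, truth: Mask) -> float:
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    inter, pred_area, truth_area = overlap_counts(pred, truth)
    total = pred_area + truth_area
    # Assumption: two empty masks count as a perfect match.
    return 2 * inter / total if total else 1.0


def remove_background(image: Sequence[Sequence[int]], mask: Mask,
                      fill: int = 0) -> List[List[int]]:
    """Keep pixels where the mask is 1 and replace the rest with `fill`."""
    return [
        [px if m else fill for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy example: the prediction misses one of four ground-truth pixels.
truth = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
pred = [[1, 1, 0], [1, 0, 0], [0, 0, 0]]
image = [[5, 6, 7], [8, 9, 1], [2, 3, 4]]

print(iou(pred, truth))    # 3 shared pixels / 4 pixels in the union
print(dice(pred, truth))   # 2 * 3 / (3 + 4)
print(remove_background(image, pred))
```

For a single pair of masks the two metrics are linked by Dice = 2 * IoU / (1 + IoU), so an IoU of 0.94 corresponds to a Dice of about 0.97. The reported figures are presumably averages over many images, in which case the identity holds only approximately.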

Related papers

Analysis of Plant Nutrient Deficiencies Using Multi-Spectral Imaging and Optimized Segmentation Model [1.4172975813702]
This study presents a deep learning framework for leaf anomaly segmentation using multispectral imaging and an enhanced YOLOv5 model. The model is tailored for processing nine-channel multispectral input and uses self-attention mechanisms to better capture subtle, spatially-distributed symptoms.
arXiv Detail & Related papers (2025-07-18T15:25:36Z)
ZS-VCOS: Zero-Shot Video Camouflaged Object Segmentation By Optical Flow and Open Vocabulary Object Detection [7.457821910654639]
This work studies how to avoid training by integrating large pre-trained models like SAM-2 and Owl-v2 with temporal information into a modular pipeline. Our approach also surpasses supervised methods, increasing the F-measure from 0.476 to 0.628.
arXiv Detail & Related papers (2025-04-10T06:24:54Z)
A Lightweight and Extensible Cell Segmentation and Classification Model for Whole Slide Images [0.0]
We propose a solution that enhances data quality, model performance, and usability by creating a lightweight cell segmentation and classification model. We update data labels through cross-relabeling to refine annotations of PanNuke and MoNuSAC, producing a unified dataset with seven distinct cell types. To address foundation models' computational demands, we distill knowledge to reduce model size and complexity while maintaining comparable performance.
arXiv Detail & Related papers (2025-02-26T15:19:52Z)
Dataset Distillation for Histopathology Image Classification [46.04496989951066]
We introduce a novel dataset distillation algorithm tailored for histopathology image datasets (Histo-DD). We conduct a comprehensive evaluation of the effectiveness of the proposed algorithm and the generated histopathology samples in both patch-level and slide-level classification tasks.
arXiv Detail & Related papers (2024-08-19T05:53:38Z)
Predictive Analytics of Varieties of Potatoes [2.336821989135698]
We explore the application of machine learning algorithms specifically to enhance the selection process of Russet potato clones in breeding trials. This study addresses the challenge of efficiently identifying high-yield, disease-resistant, and climate-resilient potato varieties.
arXiv Detail & Related papers (2024-04-04T00:49:05Z)
Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis. We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z)
Sub-token ViT Embedding via Stochastic Resonance Transformers [51.12001699637727]
Vision Transformer (ViT) architectures represent images as collections of high-dimensional vectorized tokens, each corresponding to a rectangular non-overlapping patch. We propose a training-free method inspired by "stochastic resonance". The resulting "Stochastic Resonance Transformer" (SRT) retains the rich semantic information of the original representation, but grounds it on a finer-scale spatial domain, partly mitigating the coarse effect of spatial tokenization.
arXiv Detail & Related papers (2023-10-06T01:53:27Z)
Aphid Cluster Recognition and Detection in the Wild Using Deep Learning Models [17.65292847038642]
Aphid infestation poses a significant threat to crop production, rural communities, and global food security. This paper primarily focuses on using deep learning models for detecting aphid clusters. We propose a novel approach for estimating infection levels by detecting aphid clusters.
arXiv Detail & Related papers (2023-08-10T23:53:07Z)
A quality assurance framework for real-time monitoring of deep learning segmentation models in radiotherapy [3.5752677591512487]
This work uses cardiac substructure segmentation as an example task to establish a quality assurance framework. A benchmark dataset consisting of Computed Tomography (CT) images along with manual cardiac delineations of 241 patients was collected. An image domain shift detector was developed by utilizing a trained denoising autoencoder (DAE) and two hand-engineered features. A regression model was trained to predict the per-patient segmentation accuracy, measured by Dice similarity coefficient (DSC).
arXiv Detail & Related papers (2023-05-19T14:51:05Z)
Reliable Joint Segmentation of Retinal Edema Lesions in OCT Images [55.83984261827332]
In this paper, we propose a novel reliable multi-scale wavelet-enhanced transformer network. We develop a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network and a multi-scale transformer module. Our proposed method achieves better segmentation accuracy with a high degree of reliability as compared to other state-of-the-art segmentation approaches.
arXiv Detail & Related papers (2022-12-01T07:32:56Z)
Comparative analysis of deep learning approaches for AgNOR-stained cytology samples interpretation [52.77024349608834]
This paper provides a way to analyze argyrophilic nucleolar organizer regions (AgNOR) stained slides using deep learning approaches. Our results show that semantic segmentation using U-Net with ResNet-18 or ResNet-34 as the backbone yields similar results. The best model shows an IoU for nucleus, cluster, and satellites of 0.83, 0.92, and 0.99 respectively.
arXiv Detail & Related papers (2022-10-19T15:15:32Z)
A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes [58.633364000258645]
We call this dataset RIVAL10; it consists of roughly $26k$ instances over $10$ classes. We evaluate the sensitivity of a broad set of models to noise corruptions in foregrounds, backgrounds and attributes. In our analysis, we consider diverse state-of-the-art architectures (ResNets, Transformers) and training procedures (CLIP, SimCLR, DeiT, Adversarial Training).
arXiv Detail & Related papers (2022-01-26T06:31:28Z)
Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
Dense Contrastive Learning for Self-Supervised Visual Pre-Training [102.15325936477362]
We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images. Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only 1% slower).
arXiv Detail & Related papers (2020-11-18T08:42:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.