Related papers: GMT: Guided Mask Transformer for Leaf Instance Segmentation

GMT: Guided Mask Transformer for Leaf Instance Segmentation

URL: http://arxiv.org/abs/2406.17109v1
Date: Mon, 24 Jun 2024 19:52:27 GMT
Title: GMT: Guided Mask Transformer for Leaf Instance Segmentation
Authors: Feng Chen, Sotirios A. Tsaftaris, Mario Valerio Giuffrida,
Abstract summary: Leaf instance segmentation is a challenging task, aiming to separate and delineate each leaf in an image of a plant. We propose Guided Mask (GMT), which contains three key components, namely Guided Positional Transformer (GPE), Guided Embedding Fusion Module (GEFM) and Guided Dynamic Positional Queries (GDPQ) The proposed GMT consistently outperforms State-of-the-Art models on three public plant datasets.
Score: 14.458970589296554
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Leaf instance segmentation is a challenging multi-instance segmentation task, aiming to separate and delineate each leaf in an image of a plant. The delineation of each leaf is a necessary prerequisite task for several biology-related applications such as the fine-grained monitoring of plant growth, and crop yield estimation. The task is challenging because self-similarity of instances is high (similar shape and colour) and instances vary greatly in size under heavy occulusion. We believe that the key to overcoming the aforementioned challenges lies in the specific spatial patterns of leaf distribution. For example, leaves typically grow around the plant's center, with smaller leaves clustering and overlapped near this central point. In this paper, we propose a novel approach named Guided Mask Transformer (GMT), which contains three key components, namely Guided Positional Encoding (GPE), Guided Embedding Fusion Module (GEFM) and Guided Dynamic Positional Queries (GDPQ), to extend the meta-architecture of Mask2Former and incorporate with a set of harmonic guide functions. These guide functions are tailored to the pixel positions of instances and trained to separate distinct instances in an embedding space. The proposed GMT consistently outperforms State-of-the-Art models on three public plant datasets.

Related papers

Comprehensive Generative Replay for Task-Incremental Segmentation with Concurrent Appearance and Semantic Forgetting [49.87694319431288]
Generalist segmentation models are increasingly favored for diverse tasks involving various objects from different image sources. We propose a Comprehensive Generative (CGR) framework that restores appearance and semantic knowledge by synthesizing image-mask pairs. Experiments on incremental tasks (cardiac, fundus and prostate segmentation) show its clear advantage for alleviating concurrent appearance and semantic forgetting.
arXiv Detail & Related papers (2024-06-28T10:05:58Z)
Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers [55.475142494272724]
Time series prediction is crucial for understanding and forecasting complex dynamics in various domains. We introduce GridTST, a model that combines the benefits of two approaches using innovative multi-directional attentions. The model consistently delivers state-of-the-art performance across various real-world datasets.
arXiv Detail & Related papers (2024-05-22T16:41:21Z)
Unsupervised Pre-Training for 3D Leaf Instance Segmentation [34.122575664767915]
This paper addresses the problem of reducing the labeling effort required to perform leaf instance segmentation on 3D point clouds. We propose a novel self-supervised task-specific pre-training approach to initialize the backbone of a network for leaf instance segmentation. We also introduce a novel automatic postprocessing that considers the difficulty of correctly segmenting the points close to the stem.
arXiv Detail & Related papers (2024-01-16T08:11:08Z)
HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation [113.6560373226501]
This work studies semantic segmentation under the domain generalization setting. We propose a novel hierarchical grouping transformer (HGFormer) to explicitly group pixels to form part-level masks and then whole-level masks. Experiments show that HGFormer yields more robust semantic segmentation results than per-pixel classification methods and flat grouping transformers.
arXiv Detail & Related papers (2023-05-22T13:33:41Z)
Hierarchical Approach for Joint Semantic, Plant Instance, and Leaf Instance Segmentation in the Agricultural Domain [29.647846446064992]
Plant phenotyping is a central task in agriculture, as it describes plants' growth stage, development, and other relevant quantities. In this paper, we address the problem of joint semantic, plant instance, and leaf instance segmentation of crop fields from RGB data. We propose a single convolutional neural network that addresses the three tasks simultaneously, exploiting their underlying hierarchical structure.
arXiv Detail & Related papers (2022-10-14T15:01:08Z)
Statistical shape representations for temporal registration of plant components in 3D [5.349852254138086]
We demonstrate how using shape features improves temporal organ matching. This is essential for robotic crop monitoring, which enables whole-of-lifecycle phenotyping.
arXiv Detail & Related papers (2022-09-23T11:11:10Z)
CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
We set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation. We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration. The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation. It simultaneously learns global semantic information and local spatial-detailed features. Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
GrowliFlower: An image time series dataset for GROWth analysis of cauLIFLOWER [2.8247971782279615]
This article presents GrowliFlower, an image-based UAV time series dataset of two monitored cauliflower fields of size 0.39 and 0.60 ha acquired in 2020 and 2021. The dataset contains RGB and multispectral orthophotos from which about 14,000 individual plant coordinates are derived and provided. The dataset contains collected phenotypic traits of 740 plants, including the developmental stage as well as plant and cauliflower size.
arXiv Detail & Related papers (2022-04-01T08:56:59Z)
Grasp-Oriented Fine-grained Cloth Segmentation without Real Supervision [66.56535902642085]
This paper tackles the problem of fine-grained region detection in deformed clothes using only a depth image. We define up to 6 semantic regions of varying extent, including edges on the neckline, sleeve cuffs, and hem, plus top and bottom grasping points. We introduce a U-net based network to segment and label these parts. We show that training our network solely with synthetic data and the proposed DA yields results competitive with models trained on real data.
arXiv Detail & Related papers (2021-10-06T16:31:20Z)
Semantic Attention and Scale Complementary Network for Instance Segmentation in Remote Sensing Images [54.08240004593062]
We propose an end-to-end multi-category instance segmentation model, which consists of a Semantic Attention (SEA) module and a Scale Complementary Mask Branch (SCMB) SEA module contains a simple fully convolutional semantic segmentation branch with extra supervision to strengthen the activation of interest instances on the feature map. SCMB extends the original single mask branch to trident mask branches and introduces complementary mask supervision at different scales.
arXiv Detail & Related papers (2021-07-25T08:53:59Z)
CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for nowadays 3D medical image segmentation. We propose a novel framework that efficiently bridges a bf Convolutional neural network and a bf Transformer bf (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)
Unsupervised Domain Adaptation For Plant Organ Counting [12.424350934766704]
Counting plant organs for image-based plant phenotyping falls within this category. In this paper, we propose a domain-adrial learning approach for domain adaptation of density map estimation.
arXiv Detail & Related papers (2020-09-02T13:57:09Z)
Spatial Pyramid Based Graph Reasoning for Semantic Segmentation [67.47159595239798]
We apply graph convolution into the semantic segmentation task and propose an improved Laplacian. The graph reasoning is directly performed in the original feature space organized as a spatial pyramid. We achieve comparable performance with advantages in computational and memory overhead.
arXiv Detail & Related papers (2020-03-23T12:28:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.