Tighnari: Multi-modal Plant Species Prediction Based on Hierarchical Cross-Attention Using Graph-Based and Vision Backbone-Extracted Features
- URL: http://arxiv.org/abs/2501.02649v1
- Date: Sun, 05 Jan 2025 20:30:07 GMT
- Title: Tighnari: Multi-modal Plant Species Prediction Based on Hierarchical Cross-Attention Using Graph-Based and Vision Backbone-Extracted Features
- Authors: Haixu Liu, Penghao Jiang, Zerui Tao, Muyan Wan, Qiuzhuang Sun,
- Abstract summary: We train a model to predict the outcomes of 4,716 plant surveys in Europe.
We build a network based on the backbone of the Swin-Transformer Block for extracting temporal Cubes features.
We then design a hierarchical cross-attention mechanism capable of fusing features from multiple modalities.
- Score: 1.5495593104596397
- License:
- Abstract: Predicting plant species composition in specific spatiotemporal contexts plays an important role in biodiversity management and conservation, as well as in improving species identification tools. Our work utilizes 88,987 plant survey records conducted in specific spatiotemporal contexts across Europe. We also use the corresponding satellite images, time series data, climate time series, and other rasterized environmental data such as land cover, human footprint, bioclimatic, and soil variables as training data to train the model to predict the outcomes of 4,716 plant surveys. We propose a feature construction and result correction method based on the graph structure. Through comparative experiments, we select the best-performing backbone networks for feature extraction in both temporal and image modalities. In this process, we built a backbone network based on the Swin-Transformer Block for extracting temporal Cubes features. We then design a hierarchical cross-attention mechanism capable of robustly fusing features from multiple modalities. During training, we adopt a 10-fold cross-fusion method based on fine-tuning and use a Threshold Top-K method for post-processing. Ablation experiments demonstrate the improvements in model performance brought by our proposed solution pipeline.
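The cross-attention fusion named in the abstract can be illustrated with a minimal sketch of single-head scaled dot-product cross-attention, where one modality supplies the queries and another supplies the keys and values. This is not the authors' hierarchical design, which the abstract does not specify. The function names `softmax` and `cross_attention` are hypothetical, and the learned projection matrices a real model would use are omitted for brevity.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (illustrative only).

    queries: token vectors from modality A (e.g. image features).
    keys, values: token vectors from modality B (e.g. time-series features).
    Assumption: no learned projections; inputs already share a dimension.
    Returns one fused vector per query.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors from the other modality.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused
```

Each output row is a weighted average of the other modality's value vectors, so a query that matches no key in particular falls back to their plain mean. A hierarchical scheme would presumably stack several such fusions across modalities, but that arrangement is an assumption here.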
Related papers
- Lincoln's Annotated Spatio-Temporal Strawberry Dataset (LAST-Straw) [7.13465721388535]
We present a dataset of 3D point clouds of strawberry plants for two varieties, totalling 84 individual point clouds.
We focus on the end use of such tools - the extraction of biologically relevant phenotypes - to demonstrate a phenotyping pipeline on the dataset.
This comprises several steps, including segmentation, skeletonisation and tracking, and we detail how each stage facilitates the extraction of different phenotypes or the provision of data insights.
arXiv Detail & Related papers (2024-03-01T14:44:05Z) - Improving Data Efficiency for Plant Cover Prediction with Label Interpolation and Monte-Carlo Cropping [7.993547048820065]
The plant community composition is an essential indicator of environmental changes and is usually analyzed in ecological field studies.
We introduce an approach to interpolate the sparse labels in the collected vegetation plot time series down to the intermediate dense and unlabeled images.
We also introduce a new method we call Monte-Carlo Cropping to deal with high-resolution images efficiently.
arXiv Detail & Related papers (2023-07-17T15:17:39Z) - Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z) - Multi-modal learning for geospatial vegetation forecasting [1.8180482634934092]
We introduce GreenEarthNet, the first dataset specifically designed for high-resolution vegetation forecasting.
We also present Contextformer, a novel deep learning approach for predicting vegetation greenness from Sentinel 2 satellite images.
To the best of our knowledge, this work presents the first models for continental-scale vegetation modeling at fine resolution able to capture anomalies beyond the seasonal cycle.
arXiv Detail & Related papers (2023-03-28T17:59:05Z) - Importance attribution in neural networks by means of persistence landscapes of time series [0.5156484100374058]
We include a gating layer in the network's architecture that is able to identify the most relevant landscape levels for the classification task.
We reconstruct an approximate shape of the time series that gives insight into the classification decision.
arXiv Detail & Related papers (2023-02-06T21:43:39Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery [74.82821342249039]
We present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on Masked Autoencoder (MAE).
To leverage temporal information, we include a temporal embedding along with independently masking image patches across time.
arXiv Detail & Related papers (2022-07-17T01:35:29Z) - TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
The estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z) - An Effective Leaf Recognition Using Convolutional Neural Networks Based Features [1.137457877869062]
In this paper, we propose an effective method for the leaf recognition problem.
A leaf goes through some pre-processing to extract its refined color image, vein image, xy-projection histogram, handcrafted shape, texture features, and Fourier descriptors.
These attributes are then transformed into a better representation by neural network-based encoders before a support vector machine (SVM) model is utilized to classify different leaves.
arXiv Detail & Related papers (2021-08-04T02:02:22Z) - A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z) - Two-View Fine-grained Classification of Plant Species [66.75915278733197]
We propose a novel method based on a two-view leaf image representation and a hierarchical classification strategy for fine-grained recognition of plant species.
A deep metric based on Siamese convolutional neural networks is used to reduce the dependence on a large number of training samples and make the method scalable to new plant species.
arXiv Detail & Related papers (2020-05-18T21:57:47Z)