Adaptive Fusion of Multi-view Remote Sensing data for Optimal Sub-field
Crop Yield Prediction
- URL: http://arxiv.org/abs/2401.11844v1
- Date: Mon, 22 Jan 2024 11:01:52 GMT
- Title: Adaptive Fusion of Multi-view Remote Sensing data for Optimal Sub-field
Crop Yield Prediction
- Authors: Francisco Mena, Deepak Pathak, Hiba Najjar, Cristhian Sanchez, Patrick
Helber, Benjamin Bischke, Peter Habelitz, Miro Miranda, Jayanth Siddamsetty,
Marlon Nuske, Marcela Charfuelan, Diego Arenas, Michaela Vollmer, Andreas
Dengel
- Abstract summary: We present a novel multi-view learning approach to predict crop yield for different crops (soybean, wheat, rapeseed) and regions (Argentina, Uruguay, and Germany).
Our input data includes multi-spectral optical images from Sentinel-2 satellites and weather data as dynamic features during the crop growing season, complemented by static features like soil properties and topographic information.
To effectively fuse the data, we introduce a Multi-view Gated Fusion (MVGF) model, comprising dedicated view-encoders and a Gated Unit (GU) module.
The MVGF model is trained at sub-field level with 10 m resolution pixels.
- Score: 24.995959334158986
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurate crop yield prediction is of utmost importance for informed
decision-making in agriculture, aiding farmers and industry stakeholders.
However, this task is complex and depends on multiple factors, such as
environmental conditions, soil properties, and management practices. Combining
heterogeneous data views poses a fusion challenge, such as identifying the
view-specific contribution to the predictive task. We present a novel
multi-view learning approach to predict crop yield for different crops
(soybean, wheat, rapeseed) and regions (Argentina, Uruguay, and Germany). Our
multi-view input data includes multi-spectral optical images from Sentinel-2
satellites and weather data as dynamic features during the crop growing season,
complemented by static features like soil properties and topographic
information. To effectively fuse the data, we introduce a Multi-view Gated
Fusion (MVGF) model, comprising dedicated view-encoders and a Gated Unit (GU)
module. The view-encoders handle the heterogeneity of data sources with varying
temporal resolutions by learning a view-specific representation. These
representations are adaptively fused via a weighted sum. The fusion weights are
computed for each sample by the GU using a concatenation of the
view-representations. The MVGF model is trained at sub-field level with 10 m
resolution pixels. Our evaluations show that the MVGF outperforms conventional
models on the same task, achieving the best results by incorporating all the
data sources, unlike the usual fusion results in the literature. For Argentina,
the MVGF model achieves an R2 value of 0.68 at sub-field yield prediction,
while at field level evaluation (comparing field averages), it reaches around
0.80 across different countries. The GU module learned different weights based
on the country and crop-type, aligning with the variable significance of each
data source to the prediction task.
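The abstract outlines the fusion mechanism: view-encoders produce view-specific representations, and the GU computes per-sample fusion weights from their concatenation for an adaptive weighted sum. Below is a minimal PyTorch sketch of that gating idea; the encoder choices, the softmax normalization, and all layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch of MVGF-style adaptive fusion: a Gated Unit (GU) computes
    per-sample weights from the concatenated view representations, which
    are then combined by a weighted sum. Sizes are illustrative."""

    def __init__(self, num_views: int, dim: int):
        super().__init__()
        # GU: concatenated view representations -> one weight per view.
        self.gate = nn.Sequential(
            nn.Linear(num_views * dim, num_views),
            nn.Softmax(dim=-1),  # normalized weights (an assumption)
        )
        self.head = nn.Linear(dim, 1)  # regression head for yield

    def forward(self, views: list[torch.Tensor]) -> torch.Tensor:
        # views: per-view (batch, dim) representations, e.g. from recurrent
        # encoders over Sentinel-2 and weather time series plus MLPs over
        # static soil/topographic features (encoder choices assumed).
        z = torch.stack(views, dim=1)             # (batch, V, dim)
        w = self.gate(z.flatten(start_dim=1))     # (batch, V) fusion weights
        fused = (w.unsqueeze(-1) * z).sum(dim=1)  # adaptive weighted sum
        return self.head(fused).squeeze(-1)       # per-pixel yield estimate

# Example with three hypothetical views (optical, weather, static):
model = GatedFusion(num_views=3, dim=64)
views = [torch.randn(8, 64) for _ in range(3)]
pred = model(views)  # shape (8,)
```

Because the weights are recomputed for every sample, such a model can shift its reliance between optical, weather, and static views across countries and crop types, consistent with the GU behavior the abstract reports.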
Related papers
- Fields of The World: A Machine Learning Benchmark Dataset For Global Agricultural Field Boundary Segmentation [12.039406240082515]
Fields of The World (FTW) is a novel benchmark dataset for agricultural field instance segmentation.
With 70,462 samples, FTW is an order of magnitude larger than previous datasets.
We show that models trained on FTW have better zero-shot and fine-tuning performance in held-out countries.
arXiv Detail & Related papers (2024-09-24T17:20:58Z)
- FissionVAE: Federated Non-IID Image Generation with Latent Space and Decoder Decomposition [9.059664504170287]
Federated learning enables decentralized clients to collaboratively learn a shared model while keeping all the training data local.
We introduce a novel approach, FissionVAE, which decomposes the latent space and constructs decoder branches tailored to individual client groups.
To evaluate our approach, we assemble two composite datasets: the first combines MNIST and FashionMNIST; the second comprises RGB datasets of cartoon and human faces, wild animals, marine vessels, and remote sensing images of Earth.
arXiv Detail & Related papers (2024-08-30T08:22:30Z)
- A Framework for Fine-Tuning LLMs using Heterogeneous Feedback [69.51729152929413]
We present a framework for fine-tuning large language models (LLMs) using heterogeneous feedback.
First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF.
Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases.
arXiv Detail & Related papers (2024-08-05T23:20:32Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generation.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities [69.16656086708291]
Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces.
We propose a new model comprising a view-wise sampling algorithm to focus on local structure learning.
The model can be scaled to generate high-resolution data while unifying multiple modalities.
arXiv Detail & Related papers (2023-05-24T03:32:03Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- Local Manifold Augmentation for Multiview Semantic Consistency [40.28906509638541]
We propose to extract the underlying data variation from datasets and construct a novel augmentation operator, named local manifold augmentation (LMA).
LMA can create an infinite number of data views, preserve semantics, and simulate complicated variations in object pose, viewpoint, lighting condition, background, etc.
arXiv Detail & Related papers (2022-11-05T02:00:13Z)
- MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text [58.655375327681774]
We propose the first Multimodal Retrieval-Augmented Transformer (MuRAG).
MuRAG accesses an external non-parametric multimodal memory to augment language generation.
Our results show that MuRAG achieves state-of-the-art accuracy, outperforming existing models by 10-20% absolute on both datasets.
arXiv Detail & Related papers (2022-10-06T13:58:03Z)
- Aggregated Multi-output Gaussian Processes with Knowledge Transfer Across Domains [39.25639417233822]
This article offers a multi-output Gaussian process (MoGP) model that infers functions for attributes using multiple aggregate datasets of respective granularities.
Experiments demonstrate that the proposed model outperforms existing approaches in the task of refining coarse-grained aggregate data on real-world datasets.
arXiv Detail & Related papers (2022-06-24T08:07:20Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose ViTAE, a Vision Transformer Advanced by Exploring the intrinsic inductive bias (IB) of convolutions.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain state-of-the-art classification performance, i.e., 88.5% Top-1 accuracy on the ImageNet validation set and the best Top-1 accuracy of 91.2% on the ImageNet real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
- Meta-Learning for Few-Shot Land Cover Classification [3.8529010979482123]
We evaluate the model-agnostic meta-learning (MAML) algorithm on classification and segmentation tasks.
We find that few-shot model adaptation outperforms pre-training with regular gradient descent.
This indicates that model optimization with meta-learning may benefit tasks in the Earth sciences; a minimal sketch of the MAML update follows this entry.
arXiv Detail & Related papers (2020-04-28T09:42:41Z)
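For reference, here is a minimal, generic sketch of one MAML meta-update (a single inner step and a cross-entropy loss are assumed); it illustrates the algorithm named in the entry above, not that paper's land-cover setup.

```python
import torch
import torch.nn as nn

def maml_task_loss(model: nn.Module, support, query, inner_lr: float = 0.01):
    """One MAML meta-objective for a single task: adapt on the support set
    with one gradient step, then score the adapted weights on the query set.
    Loss choice, step count, and learning rate are illustrative assumptions."""
    loss_fn = nn.CrossEntropyLoss()
    (x_s, y_s), (x_q, y_q) = support, query

    # Inner loop: one adaptation step on the support set.
    params = dict(model.named_parameters())
    support_out = torch.func.functional_call(model, params, (x_s,))
    grads = torch.autograd.grad(loss_fn(support_out, y_s),
                                tuple(params.values()), create_graph=True)
    adapted = {name: p - inner_lr * g
               for (name, p), g in zip(params.items(), grads)}

    # Outer objective: query loss under the adapted parameters; backprop
    # through the inner step reaches the original (meta) parameters.
    query_out = torch.func.functional_call(model, adapted, (x_q,))
    return loss_fn(query_out, y_q)

# Usage: sum maml_task_loss over a batch of tasks, then step a meta-optimizer.
```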
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.