Transformers vs. Recurrent Models for Estimating Forest Gross Primary Production
- URL: http://arxiv.org/abs/2511.11880v1
- Date: Fri, 14 Nov 2025 21:18:01 GMT
- Title: Transformers vs. Recurrent Models for Estimating Forest Gross Primary Production
- Authors: David Montero, Miguel D. Mahecha, Francesco Martinuzzi, César Aybar, Anne Klosterhalfen, Alexander Knohl, Jesús Anaya, Clemens Mosig, Sebastian Wieneke
- Abstract summary: Remote sensing offers a scalable alternative, yet most approaches rely on single-sensor spectral indices and statistical models. Recent advances in deep learning (DL) and data fusion offer new opportunities to better represent the temporal dynamics of vegetation processes. Here, we explore the performance of two representative models for predicting Gross Primary Production.
- Score: 32.81344785160551
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monitoring the spatiotemporal dynamics of forest CO$_2$ uptake (Gross Primary Production, GPP) remains a central challenge in terrestrial ecosystem research. While Eddy Covariance (EC) towers provide high-frequency estimates, their limited spatial coverage constrains large-scale assessments. Remote sensing offers a scalable alternative, yet most approaches rely on single-sensor spectral indices and statistical models that are often unable to capture the complex temporal dynamics of GPP. Recent advances in deep learning (DL) and data fusion offer new opportunities to better represent the temporal dynamics of vegetation processes, but comparative evaluations of state-of-the-art DL models for multimodal GPP prediction remain scarce. Here, we explore the performance of two representative models for predicting GPP: 1) GPT-2, a transformer architecture, and 2) Long Short-Term Memory (LSTM), a recurrent neural network, using multivariate inputs. Both models achieve similar overall accuracy, but while LSTM performs slightly better on average, GPT-2 excels during extreme events. Analysis of temporal context length further reveals that LSTM attains similar accuracy using substantially shorter input windows than GPT-2, highlighting an accuracy-efficiency trade-off between the two architectures. Feature importance analysis reveals radiation as the dominant predictor, followed by Sentinel-2, MODIS land surface temperature, and Sentinel-1 contributions. Our results demonstrate how model architecture, context length, and multimodal inputs jointly determine performance in GPP prediction, guiding future developments of DL frameworks for monitoring terrestrial carbon dynamics.
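To make the compared setup concrete, here is a minimal PyTorch sketch of the two model families discussed above: an LSTM regressor and a small GPT-2-style decoder-only transformer with causal self-attention, each mapping a window of multivariate daily inputs to a GPP estimate. This is not the authors' code; the feature count, context length, and layer sizes are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code) of the two compared
# model families: an LSTM and a GPT-2-style causal transformer, both mapping
# a multivariate input window to a daily GPP estimate.
import torch
import torch.nn as nn

N_FEATURES = 12   # assumed count: radiation, Sentinel-1/2, MODIS LST, ...
CONTEXT = 64      # assumed temporal context length (days)

class LSTMRegressor(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(N_FEATURES, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # GPP estimate for the last day

class GPT2StyleRegressor(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=4):
        super().__init__()
        self.proj = nn.Linear(N_FEATURES, d_model)
        self.pos = nn.Parameter(torch.zeros(1, CONTEXT, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):
        t = x.size(1)
        causal = nn.Transformer.generate_square_subsequent_mask(t)  # no look-ahead
        h = self.blocks(self.proj(x) + self.pos[:, :t], mask=causal)
        return self.head(h[:, -1])

x = torch.randn(8, CONTEXT, N_FEATURES)   # dummy batch of input windows
for model in (LSTMRegressor(), GPT2StyleRegressor()):
    print(type(model).__name__, model(x).shape)   # -> torch.Size([8, 1])
```

The context-length trade-off noted in the abstract could then be probed by simply retraining both sketches with different CONTEXT values.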
Related papers
- MambaTAD: When State-Space Models Meet Long-Range Temporal Action Detection [94.12444452690329]
This paper presents MambaTAD, a new state-space temporal action detection (TAD) model that introduces long-range modeling and global feature detection capabilities. MambaTAD achieves superior TAD performance consistently across multiple public benchmarks.
arXiv Detail & Related papers (2025-11-22T06:04:29Z) - A Hybrid PCA-PR-Seq2Seq-Adam-LSTM Framework for Time-Series Power Outage Prediction [5.657115189763182]
This paper introduces a hybrid deep learning framework, termed PCA-PR-Seq2Seq-Adam-LSTM. It integrates Principal Component Analysis (PCA), Poisson Regression (PR), a Sequence-to-Sequence (Seq2Seq) architecture, and an Adam-optimized LSTM. Results indicate that the proposed approach significantly improves forecasting accuracy and robustness compared to existing methods.
arXiv Detail & Related papers (2025-09-20T17:13:25Z) - DIFFUMA: High-Fidelity Spatio-Temporal Video Prediction via Dual-Path Mamba and Diffusion Enhancement [5.333662480077316]
We release the Chip Dicing Lane dataset (CHDL), the first public temporal image dataset dedicated to the semiconductor wafer dicing process. We propose DIFFUMA, an innovative dual-path prediction architecture specifically designed for such fine-grained dynamics. Experiments demonstrate that DIFFUMA significantly outperforms existing methods, reducing the Mean Squared Error (MSE) by 39% and improving the Structural Similarity (SSIM) from 0.926 to a near-perfect 0.988.
arXiv Detail & Related papers (2025-07-09T10:51:54Z) - Recurrent Neural Networks for Modelling Gross Primary Production [34.819587029115205]
Gross Primary Production is the largest atmosphere-to-land CO$_2$ flux, especially significant for forests.
Deep learning offers novel perspectives, and the potential of neural network architectures for estimating daily GPP remains underexplored.
This study presents a comparative analysis of three architectures: Recurrent Neural Networks (RNNs), Gated Recurrent Units (GRUs), and Long Short-Term Memory (LSTM) networks.
arXiv Detail & Related papers (2024-04-19T09:46:45Z) - Adapting to Length Shift: FlexiLength Network for Trajectory Prediction [53.637837706712794]
Trajectory prediction plays an important role in various applications, including autonomous driving, robotics, and scene understanding.
Existing approaches mainly focus on developing compact neural networks to increase prediction precision on public datasets, typically employing a standardized input duration.
We introduce a general and effective framework, the FlexiLength Network (FLN), to enhance the robustness of existing trajectory prediction methods against varying observation periods.
arXiv Detail & Related papers (2024-03-31T17:18:57Z) - S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistic cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S^2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z) - Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting [46.63798583414426]
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis.
Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation.
Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks.
arXiv Detail & Related papers (2024-01-22T13:15:40Z) - Upscaling Global Hourly GPP with Temporal Fusion Transformer (TFT) [0.0]
Gross Primary Productivity is crucial for evaluating climate change initiatives.
Estimates are currently only available from sparsely distributed eddy covariance tower sites.
This research explored a novel upscaling solution using the Temporal Fusion Transformer (TFT).
arXiv Detail & Related papers (2023-06-23T23:29:05Z) - Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation [53.04781510348416]
Video-based 3D human pose and shape estimation is evaluated by intra-frame accuracy and inter-frame smoothness.
We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, the Global-to-Local Transformer (GLoT).
Our GLoT surpasses previous state-of-the-art methods with the fewest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-03-26T14:57:49Z) - PhysFormer++: Facial Video-based Physiological Measurement with SlowFast Temporal Difference Transformer [76.40106756572644]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited temporal receptive fields.
In this paper, we propose two end-to-end video transformer architectures, PhysFormer and PhysFormer++, to adaptively aggregate both local and global features for rPPG representation enhancement.
Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra-temporal and cross-dataset testing.
arXiv Detail & Related papers (2023-02-07T15:56:03Z) - PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited temporal receptive fields.
In this paper, we propose PhysFormer, an end-to-end video-transformer-based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z) - Greenhouse Gas Emission Prediction on Road Network using Deep Sequence Learning [4.814071726181215]
We develop a deep learning framework to predict link-level GHG emission rate (ER) based on the most representative predictors, such as speed, density, and the GHG ER of previous time steps.
The downtown Toronto road network is used as the case study and highly detailed data are synthesized using a calibrated traffic microsimulation and MOVES.
arXiv Detail & Related papers (2020-04-16T14:25:32Z)
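As a concrete, assumption-laden illustration of the last entry, the sketch below shows a small sequence model for link-level ER prediction: each time step carries a link's speed, density, and previous-step ER, and the model regresses the next ER. The GRU choice and all sizes are illustrative guesses, not the paper's implementation.

```python
# Hypothetical sketch of link-level GHG emission-rate (ER) prediction from
# per-time-step [speed, density, previous ER] features; not the paper's code.
import torch
import torch.nn as nn

class LinkERPredictor(nn.Module):
    def __init__(self, n_features=3, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time, 3)
        out, _ = self.gru(x)
        return self.head(out[:, -1])      # next-step ER for each link

model = LinkERPredictor()
x = torch.randn(4, 24, 3)                 # 4 road links, 24 time steps (dummy)
print(model(x).shape)                      # torch.Size([4, 1])
```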