Related papers: Kriging and Gaussian Process Interpolation for Georeferenced Data Augmentation

Kriging and Gaussian Process Interpolation for Georeferenced Data Augmentation

URL: http://arxiv.org/abs/2501.07183v1
Date: Mon, 13 Jan 2025 10:29:09 GMT
Title: Kriging and Gaussian Process Interpolation for Georeferenced Data Augmentation
Authors: Frédérick Fabre Ferber, Dominique Gay, Jean-Christophe Soulié, Jean Diatta, Odalric-Ambrym Maillard,
Abstract summary: This study explores techniques for the augmentation of geo-referenced data, with the aim of predicting the presence of Commelina benghalensis L. in sugarcane plots in La R'eunion.<n>Given the spatial nature of the data and the high cost of collection data, we evaluated two approaches: Gaussian processes (GPs) with different kernels and kriging with various variograms.<n>The results show that GP-based methods, in particular with combined kernels (GP-COMB), significantly improve the performance of regression algorithms while requiring less additional data.
Score: 10.945947159224302
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Data augmentation is a crucial step in the development of robust supervised learning models, especially when dealing with limited datasets. This study explores interpolation techniques for the augmentation of geo-referenced data, with the aim of predicting the presence of Commelina benghalensis L. in sugarcane plots in La R{\'e}union. Given the spatial nature of the data and the high cost of data collection, we evaluated two interpolation approaches: Gaussian processes (GPs) with different kernels and kriging with various variograms. The objectives of this work are threefold: (i) to identify which interpolation methods offer the best predictive performance for various regression algorithms, (ii) to analyze the evolution of performance as a function of the number of observations added, and (iii) to assess the spatial consistency of augmented datasets. The results show that GP-based methods, in particular with combined kernels (GP-COMB), significantly improve the performance of regression algorithms while requiring less additional data. Although kriging shows slightly lower performance, it is distinguished by a more homogeneous spatial coverage, a potential advantage in certain contexts.

Related papers

Understanding Data Influence with Differential Approximation [63.817689230826595]
We introduce a new formulation to approximate a sample's influence by accumulating the differences in influence between consecutive learning steps, which we term Diff-In.<n>By employing second-order approximations, we approximate these difference terms with high accuracy while eliminating the need for model convexity required by existing methods.<n>Our theoretical analysis demonstrates that Diff-In achieves significantly lower approximation error compared to existing influence estimators.
arXiv Detail & Related papers (2025-08-20T11:59:32Z)
Exploring Generalized Gait Recognition: Reducing Redundancy and Noise within Indoor and Outdoor Datasets [24.242460774158463]
Generalized gait recognition aims to achieve robust performance across diverse domains.<n>Mixed-dataset training is widely used to enhance generalization.<n>We propose a unified framework that systematically improves cross-domain gait recognition.
arXiv Detail & Related papers (2025-05-21T06:46:09Z)
Graph Neural Network-Driven Hierarchical Mining for Complex Imbalanced Data [0.8246494848934447]
This study presents a hierarchical mining framework for high-dimensional imbalanced data. By constructing a structured graph representation of the dataset and integrating graph neural network embeddings, the proposed method effectively captures global interdependencies among samples. Empirical evaluations across multiple experimental scenarios validate the efficacy of the proposed approach.
arXiv Detail & Related papers (2025-02-06T06:26:41Z)
Interpolation pour l'augmentation de donnees : Application à la gestion des adventices de la canne a sucre a la Reunion [10.945947159224302]
This study explores techniques for the augmentation of geo-referenced data. The aim is to predict the presence of Commelina benghalensis L. in sugarcane plots in La R'eunion.
arXiv Detail & Related papers (2025-01-10T11:02:13Z)
Interpetable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance. We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features. In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z)
Fractal interpolation in the context of prediction accuracy optimization [44.99833362998488]
This paper focuses on the hypothesis of optimizing time series predictions using fractal techniques. Prediction results obtained with the LSTM model showed a significant accuracy improvement compared to the raw datasets.
arXiv Detail & Related papers (2024-03-01T09:49:53Z)
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator. This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z)
Interpolation-based Correlation Reduction Network for Semi-Supervised Graph Learning [49.94816548023729]
We propose a novel graph contrastive learning method, termed Interpolation-based Correlation Reduction Network (ICRN) In our method, we improve the discriminative capability of the latent feature by enlarging the margin of decision boundaries. By combining the two settings, we extract rich supervision information from both the abundant unlabeled nodes and the rare yet valuable labeled nodes for discnative representation learning.
arXiv Detail & Related papers (2022-06-06T14:26:34Z)
Scalable Regularised Joint Mixture Models [2.0686407686198263]
In many applications, data can be heterogeneous in the sense of spanning latent groups with different underlying distributions. We propose an approach for heterogeneous data that allows joint learning of (i) explicit multivariate feature distributions, (ii) high-dimensional regression models and (iii) latent group labels. The approach is demonstrably effective in high dimensions, combining data reduction for computational efficiency with a re-weighting scheme that retains key signals even when the number of features is large.
arXiv Detail & Related papers (2022-05-03T13:38:58Z)
Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation. We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z)
BCDAG: An R package for Bayesian structure and Causal learning of Gaussian DAGs [77.34726150561087]
We introduce the R package for causal discovery and causal effect estimation from observational data. Our implementation scales efficiently with the number of observations and, whenever the DAGs are sufficiently sparse, the number of variables in the dataset. We then illustrate the main functions and algorithms on both real and simulated datasets.
arXiv Detail & Related papers (2022-01-28T09:30:32Z)
Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms. The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm. As results show, the proposed strategies perform better than the classification based on observed data and allow to keep a high accuracy even when the missing data ratio increases.
arXiv Detail & Related papers (2021-10-19T14:24:50Z)
DoGR: Disaggregated Gaussian Regression for Reproducible Analysis of Heterogeneous Data [4.720638420461489]
We introduce DoGR, a method that discovers latent confounders by simultaneously partitioning the data into overlapping clusters (disaggregation) and modeling the behavior within them (regression) When applied to real-world data, our method discovers meaningful clusters and their characteristic behaviors, thus giving insight into group differences and their impact on the outcome of interest. By accounting for latent confounders, our framework facilitates exploratory analysis of noisy, heterogeneous data and can be used to learn predictive models that better generalize to new data.
arXiv Detail & Related papers (2021-08-31T01:58:23Z)
Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image [88.60285937702304]
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering. The proposed SSCAG is competitive against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-24T08:09:27Z)
DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets. We propose an efficient and effective data augmentation method called DecAug for HOI detection. Experiments show that our method brings up to 3.3 mAP and 1.6 mAP improvements on V-COCO and HICODET dataset.
arXiv Detail & Related papers (2020-10-02T13:59:05Z)
Hierarchical regularization networks for sparsification based learning on noisy datasets [0.0]
hierarchy follows from approximation spaces identified at successively finer scales. For promoting model generalization at each scale, we also introduce a novel, projection based penalty operator across multiple dimension. Results show the performance of the approach as a data reduction and modeling strategy on both synthetic and real datasets.
arXiv Detail & Related papers (2020-06-09T18:32:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.