Active Learning Methods for Efficient Hybrid Biophysical Variable
Retrieval
- URL: http://arxiv.org/abs/2012.04468v1
- Date: Mon, 7 Dec 2020 08:56:40 GMT
- Title: Active Learning Methods for Efficient Hybrid Biophysical Variable
Retrieval
- Authors: Jochem Verrelst, Sara Dethier, Juan Pablo Rivera, Jordi Muñoz-Marí,
Gustau Camps-Valls, José Moreno
- Abstract summary: Kernel-based machine learning regression algorithms (MLRAs) are potentially powerful methods for operational biophysical variable retrieval schemes.
They face difficulties in coping with large training datasets.
Active learning (AL) methods enable the selection of the most informative samples in a dataset.
This letter introduces six AL methods for achieving optimized biophysical variable estimation with a manageable training dataset.
- Score: 6.093845877765489
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Kernel-based machine learning regression algorithms (MLRAs) are potentially
powerful methods for implementation in operational biophysical variable
retrieval schemes. However, they face difficulties in coping with large
training datasets. With the increasing amount of optical remote sensing data
made available for analysis and the possibility of using a large amount of
simulated data from radiative transfer models (RTMs) to train kernel MLRAs,
efficient data reduction techniques will need to be implemented. Active
learning (AL) methods enable the selection of the most informative samples in a
dataset. This letter introduces six AL methods for achieving optimized
biophysical variable estimation with a manageable training dataset, and their
implementation into a Matlab-based MLRA toolbox for semi-automatic use. The AL
methods were analyzed for their efficiency in improving the estimation accuracy
of leaf area index and chlorophyll content based on PROSAIL simulations. Each
of the implemented methods outperformed random sampling, improving retrieval
accuracy with lower sampling rates. Practically, AL methods open opportunities
to feed advanced MLRAs with RTM-generated training data for development of
operational retrieval models.
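The abstract does not spell out the six AL criteria, but the general pool-based loop they plug into is simple to illustrate. Below is a minimal Python sketch, assuming a synthetic pool standing in for RTM (e.g., PROSAIL) simulations and scikit-learn's Gaussian process regressor; the variance-based selection rule is just one generic uncertainty criterion, not a reproduction of the letter's methods (which are implemented in the Matlab toolbox).

```python
# Minimal sketch of pool-based active learning for kernel regression.
# The synthetic pool stands in for RTM (e.g., PROSAIL) simulations; the
# variance-based criterion is one example of an uncertainty-style AL
# heuristic, not a reproduction of the letter's six methods.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Toy "simulated" pool: 5-band reflectance -> biophysical variable.
X_pool = rng.uniform(0.0, 1.0, size=(2000, 5))
y_pool = np.sin(3 * X_pool[:, 0]) + X_pool[:, 1] ** 2 + 0.05 * rng.normal(size=2000)

# Start from a small random seed set and grow it actively.
labeled = list(rng.choice(len(X_pool), size=20, replace=False))
candidates = [i for i in range(len(X_pool)) if i not in labeled]

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)

for _ in range(30):                         # 30 AL iterations
    gpr.fit(X_pool[labeled], y_pool[labeled])
    _, std = gpr.predict(X_pool[candidates], return_std=True)
    pick = candidates[int(np.argmax(std))]  # most uncertain candidate
    labeled.append(pick)
    candidates.remove(pick)

print(f"Final training set size: {len(labeled)}")
```

Random sampling of the same budget is the natural baseline each AL criterion is compared against, mirroring the comparison reported in the letter.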
Related papers
- Physics-informed and Unsupervised Riemannian Domain Adaptation for Machine Learning on Heterogeneous EEG Datasets [53.367212596352324]
We propose an unsupervised approach leveraging EEG signal physics.
We map EEG channels to fixed positions using field-based interpolation, enabling source-free domain adaptation.
Our method demonstrates robust performance in brain-computer interface (BCI) tasks and potential biomarker applications.
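The summary does not say how the physics-based mapping is done; as an illustration only, the sketch below shows a common source-free step in EEG transfer learning, re-centering each subject's trials by whitening with the subject's mean covariance (so-called alignment). All names and shapes are placeholders, not the paper's method.

```python
# Illustrative sketch of per-subject covariance re-centering ("alignment"),
# a common source-free adaptation step for EEG; not the paper's exact method.
import numpy as np

def inv_sqrtm(mat):
    """Inverse matrix square root via eigendecomposition (SPD input)."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def align_trials(trials):
    """trials: (n_trials, n_channels, n_samples) EEG array for one subject."""
    covs = np.array([t @ t.T / t.shape[1] for t in trials])
    ref = covs.mean(axis=0)                   # subject-level reference covariance
    w = inv_sqrtm(ref)
    return np.array([w @ t for t in trials])  # whitened trials, comparable across subjects

rng = np.random.default_rng(1)
raw = rng.normal(size=(50, 8, 250))           # 50 trials, 8 channels, 250 samples
aligned = align_trials(raw)
print(aligned.shape)
```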
arXiv Detail & Related papers (2024-03-07T16:17:33Z) - AI enhanced data assimilation and uncertainty quantification applied to
Geological Carbon Storage [0.0]
We introduce the Surrogate-based hybrid ESMDA (SH-ESMDA), an adaptation of the traditional Ensemble Smoother with Multiple Data Assimilation (ESMDA).
We also introduce the Surrogate-based Hybrid RML (SH-RML), a variational data assimilation approach that relies on randomized maximum likelihood (RML).
Our comparative analyses show that SH-RML offers better uncertainty quantification than conventional ESMDA for the case study.
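For context, a single ESMDA-style ensemble update on a toy linear forward model is sketched below; per the summary, SH-ESMDA would replace the forward model call with a trained surrogate, which is not shown. Ensemble size, inflation factors, and the toy model are arbitrary choices.

```python
# Sketch of ESMDA-style assimilation on a toy linear forward model.
# In SH-ESMDA (per the summary) the forward model g() would be a trained
# surrogate rather than the full simulator; that substitution is not shown.
import numpy as np

rng = np.random.default_rng(2)

n_ens, n_param, n_obs = 100, 4, 6
G = rng.normal(size=(n_obs, n_param))          # toy linear "simulator"
def forward(m):                                # m: (n_param, n_ens)
    return G @ m

m_true = rng.normal(size=(n_param, 1))
obs_std = 0.1
d_obs = forward(m_true) + obs_std * rng.normal(size=(n_obs, 1))

M = rng.normal(size=(n_param, n_ens))          # prior ensemble
alphas = [4.0, 4.0, 4.0, 4.0]                  # sum of 1/alpha = 1
C_D = (obs_std ** 2) * np.eye(n_obs)

for alpha in alphas:
    D = forward(M)
    # Perturb observations with inflated noise.
    D_obs = d_obs + np.sqrt(alpha) * obs_std * rng.normal(size=(n_obs, n_ens))
    dM = M - M.mean(axis=1, keepdims=True)
    dD = D - D.mean(axis=1, keepdims=True)
    C_MD = dM @ dD.T / (n_ens - 1)
    C_DD = dD @ dD.T / (n_ens - 1)
    K = C_MD @ np.linalg.inv(C_DD + alpha * C_D)
    M = M + K @ (D_obs - D)                    # ensemble update

print("posterior mean:", M.mean(axis=1))
print("true params:   ", m_true.ravel())
```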
arXiv Detail & Related papers (2024-02-09T00:24:46Z) - Simulation-Enhanced Data Augmentation for Machine Learning Pathloss
Prediction [9.664420734674088]
This paper introduces a novel simulation-enhanced data augmentation method for machine learning pathloss prediction.
Our method integrates synthetic data generated from a cellular coverage simulator and independently collected real-world datasets.
The integration of synthetic data significantly improves the generalizability of the model in different environments.
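The integration step described amounts to pooling simulator-generated samples with the scarce measured ones before training; a minimal sketch with placeholder features and a gradient-boosting regressor follows. The toy pathloss-like function, feature set, and model choice are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch: augment scarce real pathloss measurements with
# simulator-generated samples before training. Data and model choice
# are placeholders, not the paper's setup.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(3)

def fake_dataset(n, noise):
    # features: [log10 distance (m), frequency (GHz), clutter height (m)]
    X = rng.uniform([1.0, 0.7, 0.0], [4.0, 3.5, 30.0], size=(n, 3))
    y = 28.0 + 22.0 * X[:, 0] + 20.0 * np.log10(X[:, 1]) + 0.3 * X[:, 2]
    return X, y + noise * rng.normal(size=n)   # pathloss-like target in dB

X_real, y_real = fake_dataset(200, noise=6.0)   # scarce "measured" data
X_sim, y_sim = fake_dataset(5000, noise=1.0)    # abundant "simulator" output
X_test, y_test = fake_dataset(500, noise=6.0)

for name, (X_tr, y_tr) in {
    "real only": (X_real, y_real),
    "real + simulated": (np.vstack([X_real, X_sim]), np.concatenate([y_real, y_sim])),
}.items():
    model = GradientBoostingRegressor().fit(X_tr, y_tr)
    print(name, "MAE:", round(mean_absolute_error(y_test, model.predict(X_test)), 2))
```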
arXiv Detail & Related papers (2024-02-03T00:38:08Z) - Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data; a minimal number of available labeled data points are then assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
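A minimal sketch of this idea follows, assuming the third-party `minisom` package: train the SOM on unlabeled data, attach the few labels to their best matching units, and predict from the nearest labeled unit on the map grid. The propagation rule here is a simplification, not the paper's exact procedure.

```python
# Sketch of minimally supervised regression with a SOM: train unsupervised,
# attach the few labels to their best matching units (BMUs), then predict
# from the nearest labeled unit on the map. Uses the third-party `minisom`
# package; the propagation rule is a simplification, not the paper's.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(4)

X = rng.uniform(-1, 1, size=(1000, 3))                      # unlabeled pool
y = X[:, 0] ** 2 + 0.5 * X[:, 1]                            # hidden target
labeled_idx = rng.choice(len(X), size=20, replace=False)    # tiny labeled subset

som = MiniSom(10, 10, 3, sigma=1.5, learning_rate=0.5, random_seed=4)
som.train_random(X, 5000)                                   # purely unsupervised training

# Attach labels to BMUs (average if several labels hit the same unit).
bmu_values = {}
for i in labeled_idx:
    bmu_values.setdefault(som.winner(X[i]), []).append(y[i])
bmu_values = {k: float(np.mean(v)) for k, v in bmu_values.items()}

def predict(x):
    """Value of the labeled unit closest (on the map grid) to x's BMU."""
    r, c = som.winner(x)
    nearest = min(bmu_values, key=lambda rc: (rc[0] - r) ** 2 + (rc[1] - c) ** 2)
    return bmu_values[nearest]

x_new = rng.uniform(-1, 1, size=3)
print("prediction:", predict(x_new), "truth:", x_new[0] ** 2 + 0.5 * x_new[1])
```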
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes [57.62036621319563]
We introduce CLLM, which leverages the prior knowledge of Large Language Models (LLMs) for data augmentation in the low-data regime.
We demonstrate the superior performance of CLLM in the low-data regime compared to conventional generators.
arXiv Detail & Related papers (2023-12-19T12:34:46Z) - Enhancing Multi-Objective Optimization through Machine Learning-Supported Multiphysics Simulation [1.6685829157403116]
This paper presents a methodological framework for training, self-optimising, and self-organising surrogate models.
We show that surrogate models can be trained on relatively small amounts of data to approximate the underlying simulations accurately.
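As a bare-bones illustration of the surrogate idea, the sketch below fits a Gaussian process to a small design of experiments on a stand-in "expensive" objective and then optimizes the cheap surrogate; the framework's self-optimising and self-organising aspects are not represented.

```python
# Sketch: train a surrogate on few evaluations of an "expensive" objective,
# then optimize the cheap surrogate instead. The expensive function is a
# stand-in; the framework's self-optimising aspects are not represented.
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(5)

def expensive_simulation(x):               # placeholder for a multiphysics run
    return np.sin(3 * x[0]) + (x[1] - 0.5) ** 2

# Small design of experiments.
X_doe = rng.uniform(0, 1, size=(25, 2))
y_doe = np.array([expensive_simulation(x) for x in X_doe])

surrogate = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
surrogate.fit(X_doe, y_doe)

# Optimize the surrogate (cheap) rather than the simulation (expensive).
res = minimize(lambda x: surrogate.predict(x.reshape(1, -1))[0],
               x0=np.array([0.5, 0.5]), bounds=[(0, 1), (0, 1)])
print("surrogate optimum:", res.x, "true value there:", expensive_simulation(res.x))
```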
arXiv Detail & Related papers (2023-09-22T20:52:50Z) - Generalized Low-Rank Update: Model Parameter Bounds for Low-Rank
Training Data Modifications [16.822770693792823]
We have developed an incremental machine learning (ML) method that efficiently obtains the optimal model when a small number of instances or features are added or removed.
This problem holds practical importance in model selection, such as cross-validation (CV) and feature selection.
We introduce a method called the Generalized Low-Rank Update (GLRU) which extends the low-rank update framework of linear estimators to ML methods formulated as a certain class of regularized empirical risk minimization.
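The GLRU construction itself is not given in the summary; as background, the sketch below shows the classical rank-one (Sherman-Morrison) update for ridge regression when a single instance is added, i.e., the kind of linear-estimator update the summary says GLRU generalizes.

```python
# Sketch of the classical rank-one update for ridge regression when one
# training instance is added: the kind of linear-estimator low-rank update
# that the summary says GLRU generalizes to a broader class of ERM problems.
import numpy as np

rng = np.random.default_rng(6)
n, d, lam = 200, 10, 1.0

X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Fit on the first n-1 rows and keep the inverse regularized Gram matrix.
A_inv = np.linalg.inv(X[:-1].T @ X[:-1] + lam * np.eye(d))
b = X[:-1].T @ y[:-1]

# Add the last instance via Sherman-Morrison instead of refitting from scratch.
x_new, y_new = X[-1], y[-1]
Au = A_inv @ x_new
A_inv_new = A_inv - np.outer(Au, Au) / (1.0 + x_new @ Au)
w_updated = A_inv_new @ (b + y_new * x_new)

# Check against a full refit on all n rows.
w_full = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print("max abs difference:", np.abs(w_updated - w_full).max())
```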
arXiv Detail & Related papers (2023-06-22T05:00:11Z) - Decision Forest Based EMG Signal Classification with Low Volume Dataset
Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience.
We rely on a set of more elementary methods, such as the use of random bounds on a signal, and aim to show the power these methods can carry in an online setting.
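A minimal sketch of the augmentation idea follows: a small synthetic six-class feature set is expanded with Gaussian-noise copies whose scale is drawn at random, then a decision forest is trained. Features, class structure, and noise range are placeholder assumptions.

```python
# Sketch: augment a small EMG-style feature set with Gaussian noise of
# randomly drawn scale, then train a decision-forest classifier.
# Features and class structure are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n_classes, n_per_class, n_feat = 6, 30, 16       # six gestures, few samples each

centers = rng.normal(scale=2.0, size=(n_classes, n_feat))
X = np.vstack([c + rng.normal(size=(n_per_class, n_feat)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=7, stratify=y)

# Random-scale Gaussian-noise augmentation of the training split.
aug_X, aug_y = [X_tr], [y_tr]
for _ in range(5):                               # five noisy copies
    sigma = rng.uniform(0.05, 0.5)               # noise scale drawn at random per copy
    aug_X.append(X_tr + rng.normal(scale=sigma, size=X_tr.shape))
    aug_y.append(y_tr)
X_aug, y_aug = np.vstack(aug_X), np.concatenate(aug_y)

clf = RandomForestClassifier(n_estimators=200, random_state=7).fit(X_aug, y_aug)
print("test accuracy:", clf.score(X_te, y_te))
```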
arXiv Detail & Related papers (2022-06-29T23:22:18Z) - Active Learning-Based Optimization of Scientific Experimental Design [1.9705094859539976]
Active learning (AL) is a machine learning approach that can achieve greater accuracy with fewer labeled training instances.
This article performs a retrospective study on a drug response dataset using the proposed AL scheme.
It shows that scientific experimental design, instead of being manually set, can be optimized by AL.
arXiv Detail & Related papers (2021-12-29T20:02:35Z) - AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50\times$), with significantly reduced training data generation (up to $30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z) - Transfer Learning without Knowing: Reprogramming Black-box Machine
Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
arXiv Detail & Related papers (2020-07-17T01:52:34Z)
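As an illustration of the zeroth-order ingredient, the sketch below estimates gradients of a loss through a query-only "black box" with a two-point finite-difference estimator and updates an additive input program; the black-box model and task are toy stand-ins, not BAR itself.

```python
# Sketch of the zeroth-order ingredient behind input reprogramming: estimate
# gradients of a loss through a query-only black box with two-point finite
# differences and update an additive input perturbation ("program").
# The black-box model and task are toy stand-ins, not BAR itself.
import numpy as np

rng = np.random.default_rng(8)
d = 20

w_hidden = rng.normal(size=d)                    # pretend we cannot see this
def black_box(x):                                # query-only access: scores in [0, 1]
    return 1.0 / (1.0 + np.exp(-x @ w_hidden))

# Target task: make the black box output 1 for these new inputs.
X_task = rng.normal(size=(64, d))

def loss(theta):
    return np.mean((black_box(X_task + theta) - 1.0) ** 2)

theta, mu, lr, q = np.zeros(d), 1e-2, 0.5, 10    # program, smoothing, step, #directions
for _ in range(300):
    grad = np.zeros(d)
    for _ in range(q):                           # averaged two-point estimator
        u = rng.normal(size=d)
        grad += (loss(theta + mu * u) - loss(theta - mu * u)) / (2 * mu) * u
    theta -= lr * grad / q
print("final loss:", loss(theta))
```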
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.