Impact Assessment of Missing Data in Model Predictions for Earth Observation Applications
- URL: http://arxiv.org/abs/2403.14297v2
- Date: Mon, 13 May 2024 09:29:28 GMT
- Title: Impact Assessment of Missing Data in Model Predictions for Earth Observation Applications
- Authors: Francisco Mena, Diego Arenas, Marcela Charfuelan, Marlon Nuske, Andreas Dengel
- Abstract summary: We assess the impact of missing temporal and static EO sources in trained models across four datasets with classification and regression tasks.
We find that some methods are naturally more robust to missing data.
The optical view is the most critical view when it is missing individually.
- Score: 4.388282062290401
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Earth observation (EO) applications involving complex and heterogeneous data sources are commonly approached with machine learning models. However, there is a common assumption that data sources will be persistently available. Different situations could affect the availability of EO sources, like noise, clouds, or satellite mission failures. In this work, we assess the impact of missing temporal and static EO sources in trained models across four datasets with classification and regression tasks. We compare the predictive quality of different methods and find that some are naturally more robust to missing data. The Ensemble strategy, in particular, achieves a prediction robustness of up to 100%. We show that missing-data scenarios are significantly more challenging in regression than in classification tasks. Finally, we find that the optical view is the most critical view when it is missing individually.
Related papers
- A Study on Bias Detection and Classification in Natural Language Processing [2.908482270923597]
The aim of our work is to determine how to better combine publicly-available datasets to train models in the task of hate speech detection and classification.
We discuss these issues in tandem with the development of our experiments, in which we show that the combinations of different datasets greatly impact the models' performance.
arXiv Detail & Related papers (2024-08-14T11:49:24Z) - Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies [0.5439020425818999]
This study introduces a benchmarking framework utilizing the YOLOv5 single-stage detector to address the problem of foreground-foreground class imbalance.
We scrutinized three established techniques: sampling, loss weighting, and data augmentation.
Our comparative analysis reveals that sampling and loss reweighting methods, while shown to be beneficial in two-stage detector settings, do not translate as effectively in improving YOLOv5's performance.
arXiv Detail & Related papers (2024-03-11T19:06:04Z) - Spatial-temporal Forecasting for Regions without Observations [13.805203053973772]
We study spatial-temporal forecasting for a region of interest without any historical observations.
We propose a model named STSM for the task.
Our key insight is to learn from the locations that resemble those in the region of interest.
arXiv Detail & Related papers (2024-01-19T06:26:05Z) - Mobile Internet Quality Estimation using Self-Tuning Kernel Regression [7.6449549886709764]
We look into estimating mobile (cellular) internet quality at the scale of a state in the United States.
Most of the samples are concentrated in limited areas, while very few are available in the rest.
We propose a new adaptive kernel regression approach that employs self-tuning kernels to alleviate the adverse effects of data imbalance.
arXiv Detail & Related papers (2023-11-04T21:09:46Z) - Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z) - Time Series Anomaly Detection via Reinforcement Learning-Based Model Selection [3.1692938090731584]
Time series anomaly detection is of critical importance for the reliable and efficient operation of real-world systems.
In this work, we assume that a pool of anomaly detection models is accessible and propose to utilize reinforcement learning to dynamically select a candidate model.
It is demonstrated that the proposed strategy can outperform all baseline models in terms of overall performance.
arXiv Detail & Related papers (2022-05-19T22:10:35Z) - General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z) - Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
arXiv Detail & Related papers (2021-12-15T18:56:39Z) - OR-Net: Pointwise Relational Inference for Data Completion under Partial Observation [51.083573770706636]
This work uses relational inference to fill in the incomplete data.
We propose Omni-Relational Network (OR-Net) to model the pointwise relativity in two aspects.
arXiv Detail & Related papers (2021-05-02T06:05:54Z) - Hidden Biases in Unreliable News Detection Datasets [60.71991809782698]
We show that selection bias during data collection leads to undesired artifacts in the datasets.
We observed a significant drop (>10%) in accuracy for all models tested in a clean split with no train/test source overlap.
We suggest future dataset creation include a simple model as a difficulty/bias probe and future model development use a clean non-overlapping site and date split.
arXiv Detail & Related papers (2021-04-20T17:16:41Z) - TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs).
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.