Graph Embedding with Data Uncertainty
- URL: http://arxiv.org/abs/2009.00505v1
- Date: Tue, 1 Sep 2020 15:08:23 GMT
- Title: Graph Embedding with Data Uncertainty
- Authors: Firas Laakom, Jenni Raitoharju, Nikolaos Passalis, Alexandros
Iosifidis, Moncef Gabbouj
- Abstract summary: Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
- Score: 113.39838145450007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spectral-based subspace learning is a common data preprocessing step in many
machine learning pipelines. The main aim is to learn a meaningful low
dimensional embedding of the data. However, most subspace learning methods do
not take into consideration possible measurement inaccuracies or artifacts that
can lead to data with high uncertainty. Thus, learning directly from raw data
can be misleading and can negatively impact the accuracy. In this paper, we
propose to model artifacts in training data using probability distributions;
each data point is represented by a Gaussian distribution centered at the
original data point and having a variance modeling its uncertainty. We
reformulate the Graph Embedding framework to make it suitable for learning from
distributions and we study as special cases the Linear Discriminant Analysis
and the Marginal Fisher Analysis techniques. Furthermore, we propose two
schemes for modeling data uncertainty based on pairwise distances in
unsupervised and supervised contexts.
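The reformulation sketched in the abstract hinges on replacing pairwise distances between points with expected distances between the Gaussians that represent them. For independent Gaussians with means mu_i, mu_j and diagonal covariances, E||x_i - x_j||^2 = ||mu_i - mu_j||^2 + tr(Sigma_i) + tr(Sigma_j). A minimal illustrative sketch (not the paper's exact formulation; the heat-kernel affinity and the function names are assumptions for illustration):

```python
import numpy as np

def expected_sq_distance(mu_i, var_i, mu_j, var_j):
    """Expected squared Euclidean distance between two independent Gaussians
    with means mu_* and diagonal covariances given as variance vectors var_*:
    E||x_i - x_j||^2 = ||mu_i - mu_j||^2 + sum(var_i) + sum(var_j)."""
    return float(np.sum((mu_i - mu_j) ** 2) + np.sum(var_i) + np.sum(var_j))

def uncertainty_graph_weights(mus, vars_, sigma=1.0):
    """Heat-kernel affinity matrix built from expected distances, so that
    points with high uncertainty (large variance) get weaker graph edges."""
    n = len(mus)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d2 = expected_sq_distance(mus[i], vars_[i], mus[j], vars_[j])
            W[i, j] = W[j, i] = np.exp(-d2 / (2.0 * sigma ** 2))
    return W
```

Note how the variance terms enter additively: two certain points at the same means always receive a larger affinity than two uncertain ones, which is the mechanism by which unreliable measurements are down-weighted in the embedding.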
Related papers
- MissDiff: Training Diffusion Models on Tabular Data with Missing Values [29.894691645801597]
This work presents a unified and principled diffusion-based framework for learning from data with missing values.
We first observe that the widely adopted "impute-then-generate" pipeline may lead to a biased learning objective.
We prove the proposed method is consistent in learning the score of data distributions, and the proposed training objective serves as an upper bound for the negative likelihood in certain cases.
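The bias of "impute-then-generate" can be avoided by evaluating the training objective only on observed entries. A minimal sketch of such a masked denoising loss (an illustration of the general idea, not MissDiff's exact objective):

```python
import numpy as np

def masked_denoising_loss(eps_pred, eps_true, observed_mask):
    """Squared error between predicted and true noise, averaged only over
    observed entries, so missing values never act as imputation targets."""
    mask = observed_mask.astype(float)
    err = (eps_pred - eps_true) ** 2 * mask
    return float(err.sum() / mask.sum())
```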
arXiv Detail & Related papers (2023-07-02T03:49:47Z)
- PCENet: High Dimensional Surrogate Modeling for Learning Uncertainty [15.781915567005251]
We present a novel surrogate model for representation learning and uncertainty quantification.
The proposed model combines a neural network approach for dimensionality reduction of the (potentially high-dimensional) data, with a surrogate model method for learning the data distribution.
Our model enables us to (a) learn a representation of the data, (b) estimate uncertainty in the high-dimensional data system, and (c) match high order moments of the output distribution.
arXiv Detail & Related papers (2022-02-10T14:42:51Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
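The ATC idea above admits a very compact sketch: calibrate a confidence threshold on labeled source data so that the fraction of confidences above it matches source accuracy, then report the fraction of unlabeled target confidences above that threshold. A minimal illustration (function names and the quantile-based calibration are assumptions, not the paper's exact procedure):

```python
import numpy as np

def fit_atc_threshold(source_conf, source_correct):
    """Pick a threshold t so that the fraction of source confidences above t
    equals the source accuracy (the calibration step of ATC)."""
    acc = source_correct.mean()
    # The (1 - acc)-quantile leaves a fraction acc of confidences above it.
    return float(np.quantile(source_conf, 1.0 - acc))

def predict_target_accuracy(target_conf, threshold):
    """ATC estimate: fraction of unlabeled target examples whose model
    confidence exceeds the learned threshold."""
    return float((target_conf > threshold).mean())
```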
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Distributionally Robust Semi-Supervised Learning Over Graphs [68.29280230284712]
Semi-supervised learning (SSL) over graph-structured data emerges in many network science applications.
To efficiently manage learning over graphs, variants of graph neural networks (GNNs) have been developed recently.
Despite their success in practice, most of existing methods are unable to handle graphs with uncertain nodal attributes.
Challenges also arise due to distributional uncertainties associated with data acquired by noisy measurements.
A distributionally robust learning framework is developed, where the objective is to train models that exhibit quantifiable robustness against perturbations.
arXiv Detail & Related papers (2021-10-20T14:23:54Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Incorporating Causal Graphical Prior Knowledge into Predictive Modeling via Simple Data Augmentation [92.96204497841032]
Causal graphs (CGs) are compact representations of the knowledge of the data generating processes behind the data distributions.
We propose a model-agnostic data augmentation method that allows us to exploit the prior knowledge of the conditional independence (CI) relations.
We experimentally show that the proposed method is effective in improving the prediction accuracy, especially in the small-data regime.
arXiv Detail & Related papers (2021-02-27T06:13:59Z)
- Testing for Typicality with Respect to an Ensemble of Learned Distributions [5.850572971372637]
One-sample approaches to the goodness-of-fit problem offer significant computational advantages for online testing.
The ability to correctly reject anomalous data in this setting hinges on the accuracy of the model of the base distribution.
Existing methods for the one-sample goodness-of-fit problem do not account for the fact that a model of the base distribution is learned.
We propose training an ensemble of density models, considering data to be anomalous if the data is anomalous with respect to any member of the ensemble.
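The ensemble decision rule just described reduces to a disjunction over per-model goodness-of-fit tests. A minimal sketch under the assumption that each density model reports a log-likelihood and has its own rejection threshold (the function name is illustrative):

```python
def ensemble_anomaly_flag(log_likelihoods, thresholds):
    """Flag a sample as anomalous if its log-likelihood falls below the
    rejection threshold of ANY density model in the ensemble."""
    return any(ll < t for ll, t in zip(log_likelihoods, thresholds))
```

This makes the test conservative: disagreement among learned base-distribution models is enough to reject, which compensates for any single model being inaccurate.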
arXiv Detail & Related papers (2020-11-11T19:47:46Z)
- Linear Tensor Projection Revealing Nonlinearity [0.294944680995069]
Dimensionality reduction is an effective method for learning high-dimensional data.
We propose a method that searches for a subspace that maximizes the prediction accuracy while retaining as much of the original data information as possible.
arXiv Detail & Related papers (2020-07-08T06:10:39Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.