Artificial Neural Networks to Impute Rounded Zeros in Compositional Data
- URL: http://arxiv.org/abs/2012.10300v1
- Date: Fri, 18 Dec 2020 15:31:23 GMT
- Title: Artificial Neural Networks to Impute Rounded Zeros in Compositional Data
- Authors: Matthias Templ
- Abstract summary: Methods of deep learning have become increasingly popular in recent years, but they have not yet arrived in compositional data analysis.
This paper presents a new method for imputing rounded zeros based on artificial neural networks.
It can be shown that ANNs are competitive or even perform better when imputing rounded zeros in data sets of moderate size.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Methods of deep learning have become increasingly popular in recent
years, but they have not yet arrived in compositional data analysis. Imputation
methods for compositional data are typically applied to additive, centered or
isometric log-ratio representations of the data. Generally, methods for
compositional data analysis can only be applied to observed positive entries in
a data matrix. Therefore one tries to impute missing values or measurements
that were below a detection limit. In this paper, a new method for imputing
rounded zeros based on artificial neural networks is presented and compared
with conventional methods. We are also interested in whether, for ANNs, a
log-ratio representation of the data is relevant for imputation purposes. It
can be shown that ANNs are competitive with, or even outperform, conventional
methods when imputing rounded zeros in data sets of moderate size, and they
deliver better results when data sets are big. We also see that log-ratio
transformations within the artificial neural network imputation procedure help
to improve the results. This demonstrates that the theory of compositional data
analysis, and the fulfillment of all its properties, remains important in the
age of deep learning.
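The abstract names the ingredients (log-ratio representations, an ANN, values censored below a detection limit) without spelling out a procedure. As a minimal sketch of the idea, not the paper's exact architecture or evaluation, the following Python code (assuming numpy and scikit-learn; the Dirichlet toy data, the reference part, and the network size are all illustrative) imputes rounded zeros in one part from the log-ratios of the others:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy composition: 4 parts per row, rows sum to 1; the smallest part
# falls below a detection limit in some rows and is reported as 0.
X = rng.dirichlet(alpha=[5.0, 4.0, 3.0, 1.0], size=300)
dl = 0.02                                   # illustrative detection limit
X_obs = X.copy()
zero_mask = X_obs[:, 3] < dl
X_obs[zero_mask, 3] = 0.0

# Additive log-ratio (alr) coordinates with part 0 as reference; the
# features avoid the censored part, the target is its log-ratio.
features = np.log(X_obs[:, 1:3] / X_obs[:, :1])
target = np.log(X_obs[~zero_mask, 3] / X_obs[~zero_mask, 0])

net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)
net.fit(features[~zero_mask], target)

# Back-transform the predicted log-ratios and cap at the detection
# limit, since a rounded zero is known to lie below it.
pred = np.exp(net.predict(features[zero_mask])) * X_obs[zero_mask, 0]
pred = np.minimum(pred, 0.99 * dl)

X_imp = X_obs.copy()
X_imp[zero_mask, 3] = pred
X_imp /= X_imp.sum(axis=1, keepdims=True)   # re-close rows to sum to 1
```

Capping at the detection limit encodes the one thing a rounded zero does tell us, and the final re-closure restores the unit-sum constraint that makes the data compositional.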
Related papers
- Approaching Metaheuristic Deep Learning Combos for Automated Data Mining [0.5419570023862531]
This work proposes a means of combining meta-heuristic methods with conventional classifiers and neural networks in order to perform automated data mining.
Experiments on the MNIST dataset for handwritten digit recognition were performed.
It was empirically observed that validation accuracy on a ground-truth-labeled dataset is inadequate for correcting the labels of previously unseen data instances.
arXiv Detail & Related papers (2024-10-16T10:28:22Z)
- Research and Implementation of Data Enhancement Techniques for Graph Neural Networks [10.575426305555538]
In practical engineering applications, some data are affected by conditions under which more data cannot be obtained or the cost of obtaining data is too high.
This paper first analyses the key points of data enhancement techniques for graph neural networks, and also introduces the composition of graph neural networks in depth.
arXiv Detail & Related papers (2024-06-18T14:07:38Z)
- Group Distributionally Robust Dataset Distillation with Risk Minimization [18.07189444450016]
We introduce an algorithm that combines clustering with the minimization of a risk measure on the loss to conduct DD.
We demonstrate its effective generalization and robustness across subgroups through numerical experiments.
arXiv Detail & Related papers (2024-02-07T09:03:04Z)
- Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z)
- Provable Data Subset Selection For Efficient Neural Network Training [73.34254513162898]
We introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network.
We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
arXiv Detail & Related papers (2023-03-09T10:08:34Z)
- Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning the lowest-magnitude neurons can reduce the sample complexity and improve convergence without compromising test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z)
- An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws [24.356906682593532]
We study the compute-optimal trade-off between model and training data set sizes for large neural networks.
Our result suggests a linear relation similar to that supported by the empirical analysis of Chinchilla; a quick numeric illustration follows this entry.
arXiv Detail & Related papers (2022-12-02T18:46:41Z)
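The linear relation mentioned above can be made concrete with the standard Chinchilla-style allocation; this is an illustration of that empirical rule, not the paper's information-theoretic derivation, and the 20-tokens-per-parameter ratio is the commonly quoted Chinchilla fit:

```python
# Chinchilla-style compute-optimal allocation (illustrative): with
# training compute C ~ 6*N*D and a roughly constant tokens-per-parameter
# ratio k, both N and D scale as sqrt(C), i.e., the optimal data size
# grows linearly in the optimal model size.
def compute_optimal(C, tokens_per_param=20.0):
    # Solve C = 6 * N * (k * N) for N, then set D = k * N.
    N = (C / (6.0 * tokens_per_param)) ** 0.5
    D = tokens_per_param * N
    return N, D

N, D = compute_optimal(1e21)   # ~2.9e9 parameters, ~5.8e10 tokens
```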
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes a Canonical/Polyadic (CP) decomposition on its parameters; a minimal sketch of such a layer follows this entry.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
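The CP-factored layer mentioned in the entry above can be sketched in a few lines of numpy; the dimensions, rank, and tanh nonlinearity are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Rank-R CP-factored layer for matrix-shaped inputs X of size I x J:
# instead of flattening X and learning a dense I*J weight per hidden
# unit, unit k uses W_k = sum_r u_{k,r} v_{k,r}^T, so that
# <W_k, X> = sum_r u_{k,r}^T X v_{k,r}.
# Parameter count: K*R*(I+J) instead of K*I*J.
I, J, R, K = 8, 6, 3, 4            # dims, CP rank, hidden units (illustrative)
U = rng.normal(size=(K, R, I))
V = rng.normal(size=(K, R, J))

def rank_r_layer(X):
    # einsum evaluates sum_r u^T X v for all K units at once
    return np.tanh(np.einsum('kri,ij,krj->k', U, X, V))

X = rng.normal(size=(I, J))        # e.g. an unflattened hyperspectral patch
h = rank_r_layer(X)                # hidden activations, shape (K,)
```

Because the input is never vectorized, the factors in U and V act separately along the rows and columns, which is what lets the model exploit structure along each data dimension.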
- HYDRA: Hypergradient Data Relevance Analysis for Interpreting Deep Neural Networks [51.143054943431665]
We propose Hypergradient Data Relevance Analysis, or HYDRA, which interprets predictions made by deep neural networks (DNNs) as effects of their training data.
HYDRA assesses the contribution of training data toward test data points throughout the training trajectory.
In addition, we quantitatively demonstrate that HYDRA outperforms influence functions in accurately estimating data contribution and detecting noisy data labels.
arXiv Detail & Related papers (2021-02-04T10:00:13Z)
- Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the transformation outcome is predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.