Reducing Data Complexity using Autoencoders with Class-informed Loss Functions
- URL: http://arxiv.org/abs/2111.06142v1
- Date: Thu, 11 Nov 2021 10:57:19 GMT
- Title: Reducing Data Complexity using Autoencoders with Class-informed Loss Functions
- Authors: David Charte and Francisco Charte and Francisco Herrera
- Abstract summary: This paper proposes an autoencoder-based approach to complexity reduction, using class labels in order to inform the loss function.
A thorough experimentation across a collection of 27 datasets shows that class-informed autoencoders perform better than 4 other popular unsupervised feature extraction techniques.
- Score: 14.541733758283355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Available data in machine learning applications is becoming increasingly
complex, due to higher dimensionality and difficult classes. There exists a
wide variety of approaches to measuring complexity of labeled data, according
to class overlap, separability or boundary shapes, as well as group morphology.
Many techniques can transform the data in order to find better features, but
few focus on specifically reducing data complexity. Most data transformation
methods mainly treat the dimensionality aspect, leaving aside the available
information within class labels which can be useful when classes are somehow
complex.
This paper proposes an autoencoder-based approach to complexity reduction,
using class labels in order to inform the loss function about the adequacy of
the generated variables. This leads to three different new feature learners,
Scorer, Skaler and Slicer. They are based on Fisher's discriminant ratio, the
Kullback-Leibler divergence and least-squares support vector machines,
respectively. They can be applied as a preprocessing stage for a binary
classification problem. A thorough experimentation across a collection of 27
datasets and a range of complexity and classification metrics shows that
class-informed autoencoders perform better than 4 other popular unsupervised
feature extraction techniques, especially when the final objective is using the
data for a classification task.
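Fisher's discriminant ratio, which Scorer builds on, is also the basis of the classic F1 class-overlap complexity measure. The sketch below is illustrative only and assumes a binary problem with labels 0/1: it computes the per-feature ratio, turns its maximum into a complexity score via 1 / (1 + max ratio) (one normalization used in the data-complexity literature), and adds that score to a reconstruction error. The function names and the `weight` parameter are hypothetical, and the combined loss is not the paper's exact Scorer objective; a real autoencoder would compute such a term inside a differentiable framework rather than in plain Python.

```python
from statistics import fmean, pvariance


def fisher_ratio(values, labels):
    """Fisher's discriminant ratio of a single feature for binary labels 0/1."""
    first = [v for v, y in zip(values, labels) if y == 0]
    second = [v for v, y in zip(values, labels) if y == 1]
    gap = (fmean(first) - fmean(second)) ** 2
    spread = pvariance(first) + pvariance(second)
    if spread == 0.0:
        # Both classes are constant: perfectly separable if the means differ.
        return float("inf") if gap > 0.0 else 0.0
    return gap / spread


def f1_complexity(codes, labels):
    """Overlap score in [0, 1]: 1 / (1 + largest Fisher ratio); lower means simpler."""
    n_features = len(codes[0])
    best = max(
        fisher_ratio([row[j] for row in codes], labels) for j in range(n_features)
    )
    return 1.0 / (1.0 + best)


def class_informed_loss(inputs, reconstructions, codes, labels, weight=1.0):
    """Hypothetical loss: reconstruction MSE plus a weighted class-overlap penalty."""
    squared = [
        (x - r) ** 2
        for row_x, row_r in zip(inputs, reconstructions)
        for x, r in zip(row_x, row_r)
    ]
    mse = sum(squared) / len(squared)
    # Penalize codes whose classes overlap, so minimizing favors separable codes.
    return mse + weight * f1_complexity(codes, labels)
```

For example, `f1_complexity([[0.0], [0.2], [1.0], [1.2]], [0, 0, 1, 1])` is close to 1/51 because the two classes are well separated (Fisher ratio of about 50), while heavily overlapping codes push the score toward 1.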
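Slicer is described as based on least-squares support vector machines, but this listing does not say how the LS-SVM enters the loss. The following sketch therefore shows only the underlying building block, under stated assumptions: fitting a linear LS-SVM classifier (Suykens' formulation, where squared errors on equality constraints replace the hinge loss) to encoded features by solving its linear KKT system, and evaluating the primal objective 0.5 * ||w||^2 + (gamma / 2) * sum(e_i^2). All names, the `gamma` default, and the plain-Python solver are hypothetical choices for illustration; labels are assumed to be -1/+1.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_linear_lssvm(codes, labels, gamma=1.0):
    """Fit a linear LS-SVM classifier; labels must be -1 or +1. Returns (w, b)."""
    n = len(codes)
    # KKT system: [[0, y^T], [y, Omega + I / gamma]] @ [b, alpha] = [0, 1, ..., 1],
    # with Omega[i][j] = y_i * y_j * <z_i, z_j> for the linear kernel.
    system = [[0.0] + [float(y) for y in labels]]
    for i in range(n):
        row = [float(labels[i])]
        for j in range(n):
            omega = labels[i] * labels[j] * dot(codes[i], codes[j])
            row.append(omega + (1.0 / gamma if i == j else 0.0))
        system.append(row)
    solution = solve(system, [0.0] + [1.0] * n)
    b, alpha = solution[0], solution[1:]
    # Recover primal weights: w = sum_i alpha_i * y_i * z_i.
    dim = len(codes[0])
    w = [sum(alpha[i] * labels[i] * codes[i][k] for i in range(n)) for k in range(dim)]
    return w, b


def lssvm_objective(w, b, codes, labels, gamma=1.0):
    """Primal LS-SVM objective: 0.5*||w||^2 + (gamma/2) * sum of squared errors."""
    errors = [1.0 - y * (dot(w, z) + b) for z, y in zip(codes, labels)]
    return 0.5 * dot(w, w) + 0.5 * gamma * sum(e * e for e in errors)
```

On the one-dimensional codes `[[-2.0], [-1.0], [1.0], [2.0]]` with labels `[-1, -1, 1, 1]` and `gamma=1.0`, symmetry gives `b = 0` and minimizing the primal by hand gives `w = 6/11`, so every point lies on the correct side of the boundary. Solving the dual system and recovering `w` reaches the same point, which is why the primal objective can be checked directly.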
Related papers
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled examples.
We show that NPC-LV outperforms supervised methods on all three image classification datasets in the low-data regime.
(2022-06-23T09:35:03Z)
- Deep invariant networks with differentiable augmentation layers [87.22033101185201]
Methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems.
We show that our approach is easier and faster to train than modern automatic data augmentation techniques.
(2022-02-04T14:12:31Z)
- Latent Vector Expansion using Autoencoder for Anomaly Detection [1.370633147306388]
We use the features of the autoencoder to train latent vectors from low to high dimensionality.
We propose a latent vector expansion autoencoder model that improves classification performance on imbalanced data.
(2022-01-05T02:28:38Z)
- CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems, showing that CvS achieves much better classification results than previous methods when given only a handful of examples.
(2021-10-29T18:41:15Z)
- Tensor feature hallucination for few-shot learning [17.381648488344222]
Few-shot classification addresses the challenge of classifying examples given limited supervision and limited data.
Previous works on synthetic data generation for few-shot classification focus on exploiting complex models.
We investigate how a simple and straightforward synthetic data generation method can be used effectively.
(2021-06-09T18:25:08Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first asymptotically precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
(2020-11-16T05:17:29Z)
- Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable, non-redundant representation of real-world data is one of the key problems in machine learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the outcome of a transformation is predictable by an auxiliary network.
(2020-10-10T14:04:44Z)
- Evaluating Nonlinear Decision Trees for Binary Classification Tasks with Other Existing Methods [8.870380386952993]
Classification of datasets into two or more distinct classes is an important machine learning task.
Many methods can solve binary classification tasks with very high accuracy on test data but cannot provide an easily interpretable explanation.
We highlight a recently proposed nonlinear decision tree approach and evaluate it against a number of commonly used classification methods on several datasets.
(2020-08-25T00:00:23Z)
- Revisiting Data Complexity Metrics Based on Morphology for Overlap and Imbalance: Snapshot, New Overlap Number of Balls Metrics and Singular Problems Prospect [9.666866159867444]
This research work focuses on revisiting complexity metrics based on data morphology.
Because they are based on covering each class with balls, the new metrics are named Overlap Number of Balls.
(2020-07-15T18:21:13Z)
- Unsupervised Person Re-identification via Softened Similarity Learning [122.70472387837542]
Person re-identification (re-ID) is an important topic in computer vision.
This paper studies the unsupervised setting of re-ID, which does not require any labeled information.
Experiments on two image-based and video-based datasets demonstrate state-of-the-art performance.
(2020-04-07T17:16:41Z)