Enhancing Representation Learning on High-Dimensional, Small-Size
Tabular Data: A Divide and Conquer Method with Ensembled VAEs
- URL: http://arxiv.org/abs/2306.15661v1
- Date: Tue, 27 Jun 2023 17:55:31 GMT
- Title: Enhancing Representation Learning on High-Dimensional, Small-Size
Tabular Data: A Divide and Conquer Method with Ensembled VAEs
- Authors: Navindu Leelarathna, Andrei Margeloiu, Mateja Jamnik, Nikola
Simidjievski
- Abstract summary: We present an ensemble of lightweight VAEs to learn posteriors over subsets of the feature-space, which get aggregated into a joint posterior in a novel divide-and-conquer approach.
We show that our approach is robust to partial features at inference, exhibiting little performance degradation even with most features missing.
- Score: 7.923088041693465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variational Autoencoders and their many variants have displayed impressive
ability to perform dimensionality reduction, often achieving state-of-the-art
performance. Many current methods, however, struggle to learn good
representations in High Dimensional, Low Sample Size (HDLSS) tasks, which is an
inherently challenging setting. We address this challenge by using an ensemble
of lightweight VAEs to learn posteriors over subsets of the feature-space,
which get aggregated into a joint posterior in a novel divide-and-conquer
approach. Specifically, we present an alternative factorisation of the joint
posterior that induces a form of implicit data augmentation that yields greater
sample efficiency. Through a series of experiments on eight real-world
datasets, we show that our method learns better latent representations in HDLSS
settings, which leads to higher accuracy in a downstream classification task.
Furthermore, we verify that our approach has a positive effect on
disentanglement and achieves a lower estimated Total Correlation on learnt
representations. Finally, we show that our approach is robust to partial
features at inference, exhibiting little performance degradation even with most
features missing.
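As an illustration of the aggregation step described in the abstract, here is a minimal sketch that combines per-subset Gaussian posteriors into one joint posterior. It assumes a product-of-experts rule over diagonal Gaussians with a zero-mean prior expert. The abstract does not state the paper's actual factorisation, so this rule, the function name `product_of_gaussians`, and its parameters are illustrative assumptions rather than the authors' method.

```python
from typing import List, Optional, Sequence, Tuple

# A diagonal Gaussian: (mean, variance), one value per latent dimension.
Gaussian = Tuple[List[float], List[float]]


def product_of_gaussians(
    experts: Sequence[Optional[Gaussian]],
    latent_dim: int,
    prior_var: float = 1.0,
) -> Gaussian:
    """Multiply diagonal Gaussian densities into one joint Gaussian.

    Hypothetical helper: each expert stands for the posterior produced by
    one lightweight encoder over one feature subset. A `None` entry marks
    a subset that is missing at inference and is skipped. A zero-mean
    prior expert with variance `prior_var` is always included, so the
    result is defined even when every subset is missing.
    """
    # Precisions (inverse variances) add up under a product of Gaussians.
    precision = [1.0 / prior_var] * latent_dim
    # Precision-weighted means add up as well (the prior mean is zero).
    weighted_mean = [0.0] * latent_dim

    for expert in experts:
        if expert is None:
            continue  # feature subset unavailable: drop this expert
        mean, var = expert
        for d in range(latent_dim):
            precision[d] += 1.0 / var[d]
            weighted_mean[d] += mean[d] / var[d]

    joint_var = [1.0 / p for p in precision]
    joint_mean = [m * v for m, v in zip(weighted_mean, joint_var)]
    return joint_mean, joint_var


# Two subset posteriors available, then one of them missing.
full = product_of_gaussians([([1.0], [1.0]), ([3.0], [1.0])], latent_dim=1)
partial = product_of_gaussians([([1.0], [1.0]), None], latent_dim=1)
```

Under this assumed rule, dropping a feature subset only removes one term from the sums, which is one simple way a joint posterior can remain defined when some features are absent at inference.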
Related papers
- Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges.
We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals.
Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
arXiv Detail & Related papers (2024-05-24T15:13:53Z)
- Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z)
- Semantic-Preserving Feature Partitioning for Multi-View Ensemble Learning [11.415864885658435]
We introduce the Semantic-Preserving Feature Partitioning (SPFP) algorithm, a novel method grounded in information theory.
The SPFP algorithm effectively partitions datasets into multiple semantically consistent views, enhancing the multi-view ensemble learning process.
It maintains model accuracy while significantly improving uncertainty measures in scenarios where high generalization performance is achievable.
arXiv Detail & Related papers (2024-01-11T20:44:45Z)
- Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
arXiv Detail & Related papers (2023-05-25T15:46:20Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- Adversarial Lagrangian Integrated Contrastive Embedding for Limited Size Datasets [8.926248371832852]
This study presents a novel adversarial Lagrangian integrated contrastive embedding (ALICE) method for small-sized datasets.
The accuracy improvement and training convergence of the proposed pre-trained adversarial transfer are shown.
A novel adversarial integrated contrastive model using various augmentation techniques is investigated.
arXiv Detail & Related papers (2022-10-06T23:59:28Z)
- Mean Embeddings with Test-Time Data Augmentation for Ensembling of Representations [8.336315962271396]
We look at the ensembling of representations and propose mean embeddings with test-time augmentation (MeTTA).
MeTTA significantly boosts the quality of linear evaluation on ImageNet for both supervised and self-supervised models.
We believe that extending the success of ensembles to the inference of higher-quality representations is an important step that will open up many new applications of ensembling.
arXiv Detail & Related papers (2021-06-15T10:49:46Z)
- Multi-Scale Positive Sample Refinement for Few-Shot Object Detection [61.60255654558682]
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances.
We propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD.
MPSR generates multi-scale positive samples as object pyramids and refines the prediction at various scales.
arXiv Detail & Related papers (2020-07-18T09:48:29Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Minimizing FLOPs to Learn Efficient Sparse Representations [36.24540913526988]
We learn high dimensional and sparse representations that have similar representational capacity as dense embeddings.
Our approach is competitive with the other baselines and yields a similar or better speed-vs-accuracy tradeoff on practical datasets.
arXiv Detail & Related papers (2020-04-12T18:09:02Z)
- Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection [33.15192824888279]
We propose a novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation.
Our method handles crowded, cluttered and occluded scenes well.
Comprehensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2020-03-20T08:33:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.