Hybrid Autoencoders for Tabular Data: Leveraging Model-Based Augmentation in Low-Label Settings
- URL: http://arxiv.org/abs/2511.06961v1
- Date: Mon, 10 Nov 2025 11:08:39 GMT
- Title: Hybrid Autoencoders for Tabular Data: Leveraging Model-Based Augmentation in Low-Label Settings
- Authors: Erel Naor, Ofir Lindenbaum,
- Abstract summary: We propose a hybrid autoencoder that combines a neural encoder with an oblivious soft decision tree (OSDT) encoder, each guided by its own gating network.<n>Our method achieves consistent gains in low-label classification and regression across diverse datasets, outperforming deep and tree-based supervised baselines.
- Score: 13.591018807414484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks often under-perform on tabular data due to their sensitivity to irrelevant features and a spectral bias toward smooth, low-frequency functions. These limitations hinder their ability to capture the sharp, high-frequency signals that often define tabular structure, especially under limited labeled samples. While self-supervised learning (SSL) offers promise in such settings, it remains challenging in tabular domains due to the lack of effective data augmentations. We propose a hybrid autoencoder that combines a neural encoder with an oblivious soft decision tree (OSDT) encoder, each guided by its own stochastic gating network that performs sample-specific feature selection. Together, these structurally different encoders and model-specific gating networks implement model-based augmentation, producing complementary input views tailored to each architecture. The two encoders, trained with a shared decoder and cross-reconstruction loss, learn distinct yet aligned representations that reflect their respective inductive biases. During training, the OSDT encoder (robust to noise and effective at modeling localized, high-frequency structure) guides the neural encoder toward representations more aligned with tabular data. At inference, only the neural encoder is used, preserving flexibility and SSL compatibility. Spectral analysis highlights the distinct inductive biases of each encoder. Our method achieves consistent gains in low-label classification and regression across diverse tabular datasets, outperforming deep and tree-based supervised baselines.
Related papers
- Self-Supervised Learning via Flow-Guided Neural Operator on Time-Series Data [57.85958428020496]
Flow-Guided Neural Operator (FGNO) is a novel framework combining operator learning with flow matching for SSL training.<n>FGNO learns mappings in functional spaces by using Short-Time Fourier Transform to unify different time resolutions.<n>Unlike prior generative SSL methods that use noisy inputs during inference, we propose using clean inputs for representation extraction while learning representations with noise.
arXiv Detail & Related papers (2026-02-12T18:54:57Z) - Diffusion Autoencoders with Perceivers for Long, Irregular and Multimodal Astronomical Sequences [47.1547360356314]
We introduce the Diffusion Autoencoder with Perceivers (daep)<n>daep tokenizes heterogeneous measurements, compresses them with a Perceiver encoder, and reconstructs them with a Perceiver-IO diffusion decoder.<n>Across diverse spectroscopic and photometric astronomical datasets, daep achieves lower reconstruction errors, produces more discriminative latent spaces, and better preserves fine-scale structure.
arXiv Detail & Related papers (2025-10-23T14:21:01Z) - Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary [1.4594704809280983]
Positional encoding is a high-dimensional representation of time indices on input data.
RNNs can encode the temporal information of data points on their own, rendering their use of positional encoding seemingly redundant/unnecessary.
arXiv Detail & Related papers (2024-01-31T23:32:20Z) - Locality-Aware Generalizable Implicit Neural Representation [54.93702310461174]
Generalizable implicit neural representation (INR) enables a single continuous function to represent multiple data instances.
We propose a novel framework for generalizable INR that combines a transformer encoder with a locality-aware INR decoder.
Our framework significantly outperforms previous generalizable INRs and validates the usefulness of the locality-aware latents for downstream tasks.
arXiv Detail & Related papers (2023-10-09T11:26:58Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-re (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z) - Dynamic Neural Representational Decoders for High-Resolution Semantic
Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD)
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z) - Latent Code-Based Fusion: A Volterra Neural Network Approach [21.25021807184103]
We propose a deep structure encoder using the recently introduced Volterra Neural Networks (VNNs)
We show that the proposed approach demonstrates a much-improved sample complexity over CNN-based auto-encoder with a superb robust classification performance.
arXiv Detail & Related papers (2021-04-10T18:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.