Indirectly Parameterized Concrete Autoencoders
- URL: http://arxiv.org/abs/2403.00563v2
- Date: Fri, 16 Aug 2024 09:53:46 GMT
- Title: Indirectly Parameterized Concrete Autoencoders
- Authors: Alfred Nilsson, Klas Wijk, Sai bharath chandra Gutha, Erik Englesson, Alexandra Hotti, Carlo Saccardi, Oskar Kviman, Jens Lagergren, Ricardo Vinuesa, Hossein Azizpour,
- Abstract summary: Recent developments in neural network-based embedded feature selection show promising results across a wide range of applications.
Recent developments in neural network-based embedded feature selection show promising results across a wide range of applications.
- Score: 40.35109085799772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature selection is a crucial task in settings where data is high-dimensional or acquiring the full set of features is costly. Recent developments in neural network-based embedded feature selection show promising results across a wide range of applications. Concrete Autoencoders (CAEs), considered state-of-the-art in embedded feature selection, may struggle to achieve stable joint optimization, hurting their training time and generalization. In this work, we identify that this instability is correlated with the CAE learning duplicate selections. To remedy this, we propose a simple and effective improvement: Indirectly Parameterized CAEs (IP-CAEs). IP-CAEs learn an embedding and a mapping from it to the Gumbel-Softmax distributions' parameters. Despite being simple to implement, IP-CAE exhibits significant and consistent improvements over CAE in both generalization and training time across several datasets for reconstruction and classification. Unlike CAE, IP-CAE effectively leverages non-linear relationships and does not require retraining the jointly optimized decoder. Furthermore, our approach is, in principle, generalizable to Gumbel-Softmax distributions beyond feature selection.
Related papers
- Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers [58.5711048151424]
We introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome computational and memory obstacles.
Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query.
Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods.
arXiv Detail & Related papers (2024-06-24T15:55:59Z) - Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO)
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z) - Federated Learning for Sparse Principal Component Analysis [0.0]
Federated learning is a decentralized approach where model training occurs on client sides, preserving privacy by keeping data localized.
We apply this framework to Sparse Principal Component Analysis (SPCA) in this work.
SPCA aims to attain sparse component loadings while maximizing data variance for improved interpretability.
arXiv Detail & Related papers (2023-11-15T03:55:28Z) - Align With Purpose: Optimize Desired Properties in CTC Models with a
General Plug-and-Play Framework [8.228892600588765]
Connectionist Temporal Classification ( CTC) is a widely used criterion for training sequence-to-sequence (seq2seq) models.
We propose $textitAlign With Purpose, a $textbfgeneral Plug-and-Play framework for enhancing a desired property in models trained with the CTC criterion.
We apply our framework in the domain of Automatic Speech Recognition (ASR) and show its generality in terms of property selection, architectural choice, and scale of training dataset.
arXiv Detail & Related papers (2023-07-04T13:34:47Z) - A Hybrid Framework for Sequential Data Prediction with End-to-End
Optimization [0.0]
We investigate nonlinear prediction in an online setting and introduce a hybrid model that effectively mitigates hand-designed features and manual model selection issues.
We employ a recurrent neural network (LSTM) for adaptive feature extraction from sequential data and a gradient boosting machinery (soft GBDT) for effective supervised regression.
We demonstrate the learning behavior of our algorithm on synthetic data and the significant performance improvements over the conventional methods over various real life datasets.
arXiv Detail & Related papers (2022-03-25T17:13:08Z) - DANCE: DAta-Network Co-optimization for Efficient Segmentation Model
Training and Inference [85.02494022662505]
DANCE is an automated simultaneous data-network co-optimization for efficient segmentation model training and inference.
It integrates automated data slimming which adaptively downsamples/drops input images and controls their corresponding contribution to the training loss guided by the images' spatial complexity.
Experiments and ablating studies demonstrate that DANCE can achieve "all-win" towards efficient segmentation.
arXiv Detail & Related papers (2021-07-16T04:58:58Z) - Adversarial Feature Augmentation and Normalization for Visual
Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z) - Direct Evolutionary Optimization of Variational Autoencoders With Binary
Latents [0.0]
We show that it is possible to train Variational Autoencoders (VAEs) with discrete latents without sampling-based approximation and re parameterization.
In contrast to large supervised networks, the here investigated VAEs can, e.g., denoise a single image without previous training on clean data and/or training on large image datasets.
arXiv Detail & Related papers (2020-11-27T12:42:12Z) - Self-Challenging Improves Cross-Domain Generalization [81.99554996975372]
Convolutional Neural Networks (CNN) conduct image classification by activating dominant features that correlated with labels.
We introduce a simple training, Self-Challenging Representation (RSC), that significantly improves the generalization of CNN to the out-of-domain data.
RSC iteratively challenges the dominant features activated on the training data, and forces the network to activate remaining features that correlates with labels.
arXiv Detail & Related papers (2020-07-05T21:42:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.