Stochastic gradient descent with gradient estimator for categorical
features
- URL: http://arxiv.org/abs/2209.03771v2
- Date: Tue, 18 Apr 2023 07:50:55 GMT
- Title: Stochastic gradient descent with gradient estimator for categorical
features
- Authors: Paul Peseux, Maxime Berar, Thierry Paquet, Victor Nicollet
- Abstract summary: Categorical data are present in key areas such as health and supply chain, and these data require specific treatment.
To apply recent machine learning models to such data, an encoding is needed.
To build interpretable models, one-hot encoding is still a very good solution, but such an encoding creates sparse data.
Because gradient estimators are not suited to sparse data, a novel gradient estimator is introduced.
We show what this estimator minimizes in theory and demonstrate its efficiency on different datasets with multiple model architectures.
- Score: 3.597778914286147
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Categorical data are present in key areas such as health and supply
chain, and these data require specific treatment. To apply recent machine
learning models to such data, an encoding is needed. To build interpretable
models, one-hot encoding is still a very good solution, but such an encoding
creates sparse data. Gradient estimators are not suited to sparse data: the
gradient is usually treated as zero when in fact it simply does not always
exist, so a novel gradient estimator is introduced. We show what this
estimator minimizes in theory and demonstrate its efficiency on different
datasets with multiple model architectures. This new estimator performs better
than common estimators under similar settings. A real-world retail dataset is
also released after anonymization. Overall, the aim of this paper is to
thoroughly consider categorical data and adapt models and optimizers to these
key features.
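To make the sparsity problem concrete, here is a minimal sketch of SGD on a linear model over a one-hot encoded feature: each sample activates exactly one category, so only that category's weight is informed by the sample. All names and values are illustrative, and this is not the estimator introduced in the paper.

```python
import numpy as np

# Toy linear model over a one-hot encoded categorical feature. Only the
# weight of the observed category can receive a meaningful gradient; the
# other partial derivatives are not informed by the sample at all, rather
# than being genuinely zero.
rng = np.random.default_rng(0)
n_categories = 5
w = rng.normal(size=n_categories)  # one weight per category
b = 0.0                            # shared bias
lr = 0.1

def sgd_step(cat_idx: int, y: float) -> None:
    """One SGD step on a squared loss for a single (category, target) pair."""
    global b
    err = (w[cat_idx] + b) - y
    w[cat_idx] -= lr * err  # sparse update: only the observed category moves
    b -= lr * err

for cat_idx, y in [(0, 1.0), (2, -0.5), (0, 1.2)]:
    sgd_step(cat_idx, y)
print(w, b)
```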
Related papers
- Efficient and Generalizable Certified Unlearning: A Hessian-free Recollection Approach [8.875278412741695]
Machine unlearning strives to uphold the data owners' right to be forgotten by enabling models to selectively forget specific data.
We develop an algorithm that achieves near-instantaneous unlearning as it only requires a vector addition operation.
arXiv Detail & Related papers (2024-04-02T07:54:18Z)
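To illustrate the vector-addition claim above: assuming a correction vector has been precomputed for every training sample (the paper's Hessian-free recollection step, not reproduced here), forgetting a sample reduces to a single addition. Everything below is a placeholder sketch.

```python
import numpy as np

# Hypothetical sketch: the random placeholders stand in for the paper's
# actual precomputed per-sample correction vectors.
rng = np.random.default_rng(0)
theta = rng.normal(size=10)                          # trained parameters
corrections = {i: rng.normal(scale=1e-3, size=10) for i in range(100)}

def unlearn(sample_id: int, theta: np.ndarray) -> np.ndarray:
    # Near-instantaneous: forgetting one sample is one vector addition.
    return theta + corrections.pop(sample_id)

theta = unlearn(42, theta)
```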
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Scalable Estimation for Structured Additive Distributional Regression [0.0]
We propose a novel backfitting algorithm, which is based on the ideas of gradient descent and can deal with virtually any amount of data on a conventional laptop.
Performance is evaluated using an extensive simulation study and an exceptionally challenging and unique example of lightning count prediction over Austria.
arXiv Detail & Related papers (2023-01-13T14:59:42Z)
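As a rough illustration of gradient-descent-style backfitting, here is a toy additive mean model with linear effects standing in for the paper's structured terms (the paper itself targets full distributional regression):

```python
import numpy as np

# Toy backfitting for y ≈ x1*b1 + x2*b2: cycle over the additive terms and
# apply a gradient step to each term on its partial residual.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=1000)

beta, lr = np.zeros(2), 0.5
for _ in range(25):
    for j in range(2):                               # one term at a time
        residual = y - X @ beta + X[:, j] * beta[j]  # exclude term j
        grad = -2.0 * X[:, j] @ (residual - X[:, j] * beta[j]) / len(y)
        beta[j] -= lr * grad
print(beta)  # ≈ [2.0, -1.0]
```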
- An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws [24.356906682593532]
We study the compute-optimal trade-off between model and training data set sizes for large neural networks.
Our result suggests a linear relation similar to that supported by the empirical analysis of Chinchilla.
arXiv Detail & Related papers (2022-12-02T18:46:41Z)
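The linear relation can be made concrete with the familiar Chinchilla-style allocation rule below; this reflects the commonly quoted empirical result (including the roughly 20 tokens-per-parameter constant), not this paper's derivation:

```python
# Under a compute budget C ≈ 6*N*D FLOPs (N parameters, D training tokens),
# compute-optimal scaling gives N* ∝ sqrt(C) and D* ∝ sqrt(C), so the ratio
# D*/N* is constant: a linear relation between data and model size.
def optimal_allocation(compute_flops: float, tokens_per_param: float = 20.0):
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

print(optimal_allocation(1e21))  # roughly a 3B-parameter model on 58B tokens
```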
- AdaCat: Adaptive Categorical Discretization for Autoregressive Models [84.85102013917606]
We propose an efficient, expressive, multimodal parameterization called Adaptive Categorical Discretization (AdaCat).
AdaCat discretizes each dimension of an autoregressive model adaptively, which allows the model to allocate density to fine intervals of interest.
arXiv Detail & Related papers (2022-08-03T17:53:46Z)
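A minimal one-dimensional sketch of adaptive discretization: a piecewise-uniform density whose bin widths and bin masses are both parameters. The values are fixed here for illustration, where AdaCat would learn them per dimension of an autoregressive model:

```python
import numpy as np

# Piecewise-uniform density on [0, 1): narrow bins can carry high mass,
# letting the model allocate density to fine intervals of interest.
widths = np.array([0.05, 0.05, 0.30, 0.60])  # adaptive bin widths, sum to 1
masses = np.array([0.40, 0.30, 0.20, 0.10])  # probability mass per bin
edges = np.concatenate([[0.0], np.cumsum(widths)])

def log_density(x: float) -> float:
    k = np.searchsorted(edges, x, side="right") - 1
    return float(np.log(masses[k]) - np.log(widths[k]))  # uniform in bin k

print(log_density(0.02))  # narrow, high-mass bin -> high density
```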
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim to improve data efficiency for both classification and regression setups in deep learning.
To harness the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
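The summary only states that a minimax game is played between the feature extractor and the task-specific heads, so the alternation below is a generic placeholder, not the X-model objective:

```python
import torch

# Generic minimax alternation between a shared feature extractor and two
# task heads; the disagreement loss and update directions are placeholders.
extractor = torch.nn.Linear(8, 4)
heads = torch.nn.ModuleList([torch.nn.Linear(4, 1) for _ in range(2)])
opt_min = torch.optim.SGD(extractor.parameters(), lr=1e-2)
opt_max = torch.optim.SGD(heads.parameters(), lr=1e-2)
x = torch.randn(32, 8)

for opt, sign in [(opt_min, 1.0), (opt_max, -1.0)]:
    z = extractor(x)
    disagreement = ((heads[0](z) - heads[1](z)) ** 2).mean()
    opt.zero_grad()
    (sign * disagreement).backward()  # one side minimizes, the other maximizes
    opt.step()
```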
- Graph-LDA: Graph Structure Priors to Improve the Accuracy in Few-Shot Classification [6.037383467521294]
We introduce a generic model where observed class signals are assumed to be degraded by two sources of noise.
We derive an optimal methodology to classify such signals.
This methodology includes a single parameter, making it particularly suitable for cases where available data is scarce.
arXiv Detail & Related papers (2021-08-23T21:55:45Z)
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z)
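Given exact class-conditional densities, which normalizing flows provide, the Bayes error is $E_x[1 - \max_k p(k \mid x)]$. Here is a Monte Carlo sketch with Gaussians standing in for the learned flows:

```python
import numpy as np
from scipy.stats import norm

# Two classes with known densities (Gaussians at ±1 stand in for learned
# normalizing flows). Bayes error = E_x[1 - max_k p(k | x)].
priors = np.array([0.5, 0.5])
densities = [norm(-1.0, 1.0), norm(1.0, 1.0)]

rng = np.random.default_rng(0)
labels = rng.choice(2, size=200_000, p=priors)
x = rng.normal(loc=np.where(labels == 0, -1.0, 1.0), scale=1.0)

joint = np.stack([priors[k] * densities[k].pdf(x) for k in range(2)])
bayes_error = float(np.mean(1.0 - joint.max(axis=0) / joint.sum(axis=0)))
print(bayes_error)  # ≈ 0.159 for this toy mixture (exactly Phi(-1))
```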
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
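One standard route to tractable marginal-likelihood estimates is the Laplace approximation sketched below; treating it as representative of this paper's estimator is an assumption, though the formula itself is standard:

```python
import numpy as np

# Laplace approximation to the log marginal likelihood around a mode theta*:
# log p(D) ≈ log p(D|theta*) + log p(theta*) + (d/2) log(2*pi) - (1/2) log|H|,
# where H is the Hessian of the negative log joint at theta*.
def laplace_log_marginal(loglik_mode: float, logprior_mode: float,
                         hessian: np.ndarray) -> float:
    d = hessian.shape[0]
    _, logdet = np.linalg.slogdet(hessian)
    return loglik_mode + logprior_mode + 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet

H = 10.0 * np.eye(3)  # toy Hessian of the negative log joint at the mode
print(laplace_log_marginal(-42.0, -3.0, H))
```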
- AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50\times$), with significantly reduced training data generation (up to $30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z)