An Empirical Study on Feature Discretization
- URL: http://arxiv.org/abs/2004.12602v1
- Date: Mon, 27 Apr 2020 06:50:17 GMT
- Title: An Empirical Study on Feature Discretization
- Authors: Qiang Liu and Zhaocheng Liu and Haoli Zhang
- Abstract summary: We propose a novel discretization method called Local Linear Encoding (LLE).
Experiments on two numeric datasets show that LLE can outperform conventional discretization methods with far fewer model parameters.
- Score: 8.900900745767869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When dealing with continuous numeric features, we usually adopt feature
discretization. In this work, to find the best way to conduct feature
discretization, we present some theoretical analysis, in which we focus on
analyzing correctness and robustness of feature discretization. Then, we
propose a novel discretization method called Local Linear Encoding (LLE).
Experiments on two numeric datasets show that LLE can outperform conventional
discretization methods with far fewer model parameters.
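For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of one conventional discretization scheme: equal-width binning followed by one-hot encoding. It is not the paper's LLE method, whose details are not given on this page. The function names, the sample `ages` values, and the choice of three bins are all assumptions made for illustration.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Return the n_bins - 1 interior cut points of equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def discretize(x, edges):
    """Map a continuous value to a bin index in [0, len(edges)]."""
    # A value equal to a cut point falls into the upper bin.
    return bisect_right(edges, x)


def one_hot(index, n_bins):
    """Encode a bin index as a 0/1 vector of length n_bins."""
    return [1 if i == index else 0 for i in range(n_bins)]


# Hypothetical continuous feature (illustrative values only).
ages = [18.0, 25.0, 40.0, 62.0, 78.0]
n_bins = 3

edges = equal_width_edges(ages, n_bins)
bin_ids = [discretize(a, edges) for a in ages]
encoded = [one_hot(b, n_bins) for b in bin_ids]

for a, b, e in zip(ages, bin_ids, encoded):
    print(a, b, e)
```

With one-hot encoding, a linear model learns one weight per bin for each feature, so the parameter count grows with the number of bins. This is one way to read the abstract's claim about model parameters, though the page itself does not spell out the comparison.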
Related papers
- Improving the Euclidean Diffusion Generation of Manifold Data by Mitigating Score Function Singularity [7.062379942776126]
We investigate direct sampling of Euclidean diffusion models for general manifold-constrained data.
We reveal the multiscale singularity of the score function in the embedded space of manifold, which hinders the accuracy of diffusion-generated samples.
We propose two novel methods to mitigate the singularity and improve the sampling accuracy.
arXiv Detail & Related papers (2025-05-15T03:12:27Z)
- Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]^d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z)
- On the Trajectory Regularity of ODE-based Diffusion Sampling [79.17334230868693]
Diffusion-based generative models use differential equations to establish a smooth connection between a complex data distribution and a tractable prior distribution.
In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models.
arXiv Detail & Related papers (2024-05-18T15:59:41Z)
- Convergence Analysis of Discrete Diffusion Model: Exact Implementation through Uniformization [17.535229185525353]
We introduce an algorithm leveraging the uniformization of continuous Markov chains, implementing transitions on random time points.
Our results align with state-of-the-art achievements for diffusion models in $\mathbb{R}^d$ and further underscore the advantages of discrete diffusion models in comparison to the $\mathbb{R}^d$ setting.
arXiv Detail & Related papers (2024-02-12T22:26:52Z)
- A Geometric Perspective on Diffusion Models [57.27857591493788]
We inspect the ODE-based sampling of a popular variance-exploding SDE.
We establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm.
arXiv Detail & Related papers (2023-05-31T15:33:16Z)
- Restoration-Degradation Beyond Linear Diffusions: A Non-Asymptotic Analysis For DDIM-Type Samplers [90.45898746733397]
We develop a framework for non-asymptotic analysis of deterministic samplers used for diffusion generative modeling.
We show that one step along the probability flow ODE can be expressed as two steps: 1) a restoration step that runs ascent on the conditional log-likelihood at some infinitesimally previous time, and 2) a degradation step that runs the forward process using noise pointing back towards the current gradient.
arXiv Detail & Related papers (2023-03-06T18:59:19Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Discrete-Continuous Smoothing and Mapping [8.90077503980675]
We describe a general approach to smoothing and mapping with a class of discrete-continuous factor graphs commonly encountered in robotics applications.
We provide a library, DC-SAM, extending existing tools for optimization problems defined in terms of factor graphs to the setting of discrete-continuous models.
arXiv Detail & Related papers (2022-04-25T19:31:44Z)
- Measuring dissimilarity with diffeomorphism invariance [94.02751799024684]
We introduce DID, a pairwise dissimilarity measure applicable to a wide range of data spaces.
We prove that DID enjoys properties which make it relevant for theoretical study and practical use.
arXiv Detail & Related papers (2022-02-11T13:51:30Z)
- Binary Independent Component Analysis via Non-stationarity [7.283533791778359]
We consider independent component analysis of binary data.
We start by assuming a linear mixing model in a continuous-valued latent space, followed by a binary observation model.
In stark contrast to the continuous-valued case, we prove non-identifiability of the model with few observed variables.
arXiv Detail & Related papers (2021-11-30T14:23:53Z)