Related papers: Kernel Density Estimation by Genetic Algorithm

Kernel Density Estimation by Genetic Algorithm

URL: http://arxiv.org/abs/2203.01535v1
Date: Thu, 3 Mar 2022 06:16:18 GMT
Title: Kernel Density Estimation by Genetic Algorithm
Authors: Kiheiji Nishida
Abstract summary: Genetic algorithm generates multiple subsamples of a given size with replacement from the original sample. dominant subsamples in terms of fitness values are inherited by the next generation.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This study proposes a data condensation method for multivariate kernel density estimation by genetic algorithm. First, our proposed algorithm generates multiple subsamples of a given size with replacement from the original sample. The subsamples and their constituting data points are regarded as $\it{chromosome}$ and $\it{gene}$, respectively, in the terminology of genetic algorithm. Second, each pair of subsamples breeds two new subsamples, where each data point faces either $\it{crossover}$, $\it{mutation}$, or $\it{reproduction}$ with a certain probability. The dominant subsamples in terms of fitness values are inherited by the next generation. This process is repeated generation by generation and brings the sparse representation of kernel density estimator in its completion. We confirmed from simulation studies that the resulting estimator can perform better than other well-known density estimators.

Related papers

Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination [35.67742880001828]
We give the first computationally efficient algorithm for high-dimensional robust mean estimation with mean-shift contamination. Our algorithm has near-optimal sample complexity, runs in sample-polynomial time, and approximates the target mean to any desired accuracy.
arXiv Detail & Related papers (2025-02-20T17:53:13Z)
Data Structures for Density Estimation [66.36971978162461]
Given a sublinear (in $n$) number of samples from $p$, our main result is the first data structure that identifies $v_i$ in time sublinear in $k$. We also give an improved version of the algorithm of Acharya et al. that reports $v_i$ in time linear in $k$.
arXiv Detail & Related papers (2023-06-20T06:13:56Z)
Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner. We also provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.
arXiv Detail & Related papers (2023-02-20T23:29:43Z)
DenseHybrid: Hybrid Anomaly Detection for Dense Open-set Recognition [1.278093617645299]
Anomaly detection can be conceived either through generative modelling of regular training data or by discriminating with respect to negative training data. This paper presents a novel hybrid anomaly score which allows dense open-set recognition on large natural images. Experiments evaluate our contributions on standard dense anomaly detection benchmarks as well as in terms of open-mIoU - a novel metric for dense open-set performance.
arXiv Detail & Related papers (2022-07-06T11:48:50Z)
Population based change-point detection for the identification of homozygosity islands [0.0]
We introduce a penalized maximum likelihood approach that can be efficiently computed by a dynamic programming algorithm or approximated by a fast greedy binary splitting algorithm. We prove both algorithms converge almost surely to the set of change-points under very general assumptions on the distribution and independent sampling of the random vector. This new approach is motivated by the problem of identifying homozygosity islands on the genome of individuals in a population.
arXiv Detail & Related papers (2021-11-19T12:53:41Z)
Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data [2.0236506875465863]
Missing genotypes can affect the efficacy of machine learning approaches to identify the risk genetic variants of common diseases and traits. The problem occurs when genotypic data are collected from different experiments with different DNA microarrays, each being characterised by its pattern of uncalled (missing) genotypes. We use well-developed notions of object-attribute biclusters and formal concepts that correspond to dense subrelations in the binary relation.
arXiv Detail & Related papers (2020-10-22T12:27:43Z)
A Method for Handling Multi-class Imbalanced Data by Geometry based Information Sampling and Class Prioritized Synthetic Data Generation (GICaPS) [15.433936272310952]
This paper looks into the problem of handling imbalanced data in a multi-label classification problem. Two novel methods are proposed that exploit the geometric relationship between the feature vectors. The efficacy of the proposed methods is analyzed by solving a generic multi-class recognition problem.
arXiv Detail & Related papers (2020-10-11T04:04:26Z)
Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for bound non-dimensional optimization. We demonstrate two popular deep learning methods on the empirical advantages over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
A Novel Granular-Based Bi-Clustering Method of Deep Mining the Co-Expressed Genes [76.84066556597342]
Bi-clustering methods are used to mine bi-clusters whose subsets of samples (genes) are co-regulated under their test conditions. Unfortunately, traditional bi-clustering methods are not fully effective in discovering such bi-clusters. We propose a novel bi-clustering method by involving here the theory of Granular Computing.
arXiv Detail & Related papers (2020-05-12T02:04:40Z)
Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems. We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms. Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
Optimal Iterative Sketching with the Subsampled Randomized Hadamard Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems. We show that the convergence rate for Haar and randomized Hadamard matrices are identical, andally improve upon random projections. These techniques may be applied to other algorithms that employ randomized dimension reduction.
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
Oracle Lower Bounds for Stochastic Gradient Sampling Algorithms [39.746670539407084]
We consider the problem of sampling from a strongly log-concave density in $bbRd$. We prove an information theoretic lower bound on the number of gradient queries of the log density needed.
arXiv Detail & Related papers (2020-02-01T23:46:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.