FunQuant: A R package to perform quantization in the context of rare
events and time-consuming simulations
- URL: http://arxiv.org/abs/2308.10871v1
- Date: Fri, 18 Aug 2023 08:34:45 GMT
- Title: FunQuant: A R package to perform quantization in the context of rare
events and time-consuming simulations
- Authors: Charlie Sire and Yann Richet and Rodolphe Le Riche and Didier
Rullière and Jérémy Rohmer and Lucie Pheulpin
- Abstract summary: Quantization summarizes continuous distributions by calculating a discrete approximation.
Among the widely adopted methods for data quantization is Lloyd's algorithm, which partitions the space into Voronoï cells, which can be seen as clusters.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantization summarizes continuous distributions by calculating a discrete
approximation. Among the widely adopted methods for data quantization is
Lloyd's algorithm, which partitions the space into Voronoï cells, which can be
seen as clusters, and constructs a discrete distribution based on their
centroids and probabilistic masses. Lloyd's algorithm estimates the optimal
centroids in a minimal expected distance sense, but this approach poses
significant challenges in scenarios where data evaluation is costly and
relates to rare events. In such cases, the single cluster associated with the
absence of any event captures the majority of the probability mass. In this
context, a metamodel is required, and adapted sampling methods are necessary
to increase the precision of the computations on the rare clusters.
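The rare-event issue is easy to see with a plain Lloyd iteration. The R sketch
below is illustrative only: it is not the FunQuant API, and the function name
lloyd is hypothetical. On a toy sample where 95% of the points correspond to
"no event", the no-event Voronoï cell ends up carrying almost all of the
probability mass, which is exactly the regime the package targets.

    # Minimal sketch of Lloyd's algorithm for quantization (illustrative;
    # not the FunQuant API). Alternate between assigning points to their
    # nearest prototype (Voronoi cells) and recomputing centroids and masses.
    lloyd <- function(X, prototypes, n_iter = 50) {
      for (it in seq_len(n_iter)) {
        # Squared Euclidean distance from every point to every prototype
        d2 <- outer(rowSums(X^2), rowSums(prototypes^2), "+") -
          2 * X %*% t(prototypes)
        cell <- max.col(-d2)  # index of the nearest prototype
        for (j in seq_len(nrow(prototypes))) {
          if (any(cell == j)) {  # leave a prototype in place if its cell is empty
            prototypes[j, ] <- colMeans(X[cell == j, , drop = FALSE])
          }
        }
      }
      masses <- tabulate(cell, nbins = nrow(prototypes)) / nrow(X)
      list(prototypes = prototypes, masses = masses)
    }

    # Toy rare-event sample: 95% of points are the "no event" outcome at the origin
    set.seed(1)
    X <- rbind(matrix(0, 950, 2), matrix(rnorm(100, mean = 5), 50, 2))
    res <- lloyd(X, prototypes = rbind(c(0, 0), c(4, 4), c(6, 6)))
    res$masses  # the no-event cell captures about 95% of the mass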
Related papers
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Spatial Process Approximations: Assessing Their Necessity
In spatial statistics and machine learning, the kernel matrix plays a pivotal role in prediction, classification, and maximum likelihood estimation, but it becomes increasingly ill-conditioned as data sets grow.
A review of current methodologies for managing large spatial data indicates that some fail to address this ill-conditioning problem.
This paper introduces various optimality criteria and provides solutions for each.
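The ill-conditioning is easy to reproduce in a few lines of R. The nugget
shown below is a generic stabilization device used here purely for
illustration; it is not necessarily one of the optimality criteria the paper
introduces.

    # Kernel-matrix ill-conditioning for densely sampled inputs, and the
    # common "nugget" (jitter) remedy.
    set.seed(2)
    x <- sort(runif(200))
    K <- exp(-outer(x, x, "-")^2 / (2 * 0.5^2))  # squared-exponential kernel
    kappa(K)                     # condition number: typically astronomical
    kappa(K + 1e-6 * diag(200))  # a tiny diagonal jitter restores stability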
arXiv Detail & Related papers (2023-11-06T15:46:03Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models
from Observational and Randomised Data
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the unidentifiability region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- Bregman Power k-Means for Clustering Exponential Family Data
We bridge algorithmic advances to classical work on hard clustering under Bregman divergences.
The elegant properties of Bregman divergences allow us to maintain closed form updates in a simple and transparent algorithm.
We conduct thorough empirical analyses on simulated experiments and a case study on rainfall data, finding that the proposed method outperforms existing peer methods in a variety of non-Gaussian data settings.
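To make the closed-form property concrete: for any Bregman divergence, the
best single representative of a cluster is its arithmetic mean, so only the
assignment step changes. The R sketch below is plain Bregman hard clustering
with the Poisson-generated divergence (generalized KL), not the paper's power
k-means annealing scheme.

    # Hard clustering under a Bregman divergence. Key property: the optimal
    # cluster representative is the plain mean, keeping updates closed form.
    bregman_kl <- function(x, mu) x * log(x / mu) - x + mu  # Poisson / gen. KL
    bregman_kmeans <- function(x, centers, n_iter = 25) {
      for (it in seq_len(n_iter)) {
        D <- sapply(centers, function(m) bregman_kl(x, m))  # n x k divergences
        z <- max.col(-D)               # assign each point to its nearest center
        centers <- tapply(x, z, mean)  # closed-form update: the arithmetic mean
      }                                # (assumes every cluster stays non-empty)
      list(centers = centers, assignments = z)
    }

    # Count-like data from two Poisson regimes (shifted to stay positive)
    set.seed(3)
    x <- c(rpois(100, 4), rpois(100, 30)) + 0.5
    bregman_kmeans(x, centers = c(1, 50))$centers  # roughly 4.5 and 30.5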
arXiv Detail & Related papers (2022-06-22T06:09:54Z)
- Lattice-Based Methods Surpass Sum-of-Squares in Clustering
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
- Density Ratio Estimation via Infinitesimal Classification
We propose DRE-∞, a divide-and-conquer approach that reduces density ratio estimation (DRE) to a series of easier subproblems.
Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions.
We show that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.
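The divide-and-conquer step can be seen in a one-dimensional toy (my reading
of the abstract, with exact Gaussian bridges standing in for the paper's
learned estimators): the hard log ratio log p1(x) - log p0(x) telescopes into
many easy log ratios between neighbouring bridge distributions.

    # Bridges between N(0, 1) and N(4, 1): p_alpha = N(4 * alpha, 1)
    log_p <- function(alpha, x) dnorm(x, mean = 4 * alpha, log = TRUE)
    alphas <- seq(0, 1, length.out = 101)  # fine grid of bridge distributions
    x <- 1.0
    sum(diff(sapply(alphas, log_p, x = x)))  # telescoped sum of easy log ratios
    log_p(1, x) - log_p(0, x)                # equals the direct log ratio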
arXiv Detail & Related papers (2021-11-22T06:26:29Z)
- Hyperdimensional Computing for Efficient Distributed Classification with
Randomized Neural Networks
We study distributed classification, which can be employed in situations where data can neither be stored at a central location nor shared.
We propose a more efficient solution for distributed classification by making use of a lossy compression approach applied when sharing the local classifiers with other agents.
arXiv Detail & Related papers (2021-06-02T01:33:56Z)
- Robust M-Estimation Based Bayesian Cluster Enumeration for Real
Elliptically Symmetric Distributions
Robustly determining the optimal number of clusters in a data set is an essential factor in a wide range of applications.
This article generalizes a Bayesian cluster enumeration criterion so that it can be used with any arbitrary Real Elliptically Symmetric (RES) distributed mixture model.
We derive a robust criterion for data sets with finite sample size, and also provide an approximation to reduce the computational cost at large sample sizes.
arXiv Detail & Related papers (2020-05-04T11:44:49Z)
- Scalable Distributed Approximation of Internal Measures for Clustering
Evaluation
A widely used internal measure for clustering evaluation is the silhouette coefficient, whose computation requires a quadratic number of distance calculations.
We present the first scalable algorithm to compute a rigorous approximation of this measure for the evaluation of clusterings based on any metric distance.
We also prove that the algorithm can be adapted to obtain rigorous approximations of other internal measures of clustering quality, such as cohesion and separation.
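For reference, the exact quadratic-cost computation that such approximations
target looks as follows in R (a textbook implementation, assuming every
cluster contains at least two points; the cluster package's silhouette()
provides the same exact computation).

    # Exact silhouette: for each point, a = mean distance to its own cluster,
    # b = smallest mean distance to another cluster, s = (b - a) / max(a, b).
    silhouette_exact <- function(X, labels) {
      D <- as.matrix(dist(X))  # all pairwise distances: the O(n^2) bottleneck
      sapply(seq_len(nrow(X)), function(i) {
        same <- labels == labels[i]
        a <- sum(D[i, same]) / (sum(same) - 1)  # self-distance D[i, i] is 0
        b <- min(tapply(D[i, !same], labels[!same], mean))  # nearest other cluster
        (b - a) / max(a, b)
      })
    }

    set.seed(4)
    X <- rbind(matrix(rnorm(100), 50, 2), matrix(rnorm(100, mean = 3), 50, 2))
    km <- kmeans(X, centers = 2)
    mean(silhouette_exact(X, km$cluster))  # overall silhouette of the clustering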
arXiv Detail & Related papers (2020-03-03T10:28:14Z)
- A General Method for Robust Learning from Batches
We consider a general framework of robust learning from batches, and determine the limits of both classification and distribution estimation over arbitrary, including continuous, domains.
We derive the first robust computationally-efficient learning algorithms for piecewise-interval classification, and for piecewise-polynomial, monotone, log-concave, and Gaussian-mixture distribution estimation.
arXiv Detail & Related papers (2020-02-25T18:53:25Z)
- Efficiently Sampling Functions from Gaussian Process Posteriors
We propose an easy-to-use and general-purpose approach for fast posterior sampling.
We demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
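A compact way to see the decoupled idea is Matheron's rule: draw a cheap
approximate prior path, then correct it with a data-dependent update in the
kernel basis. The R sketch below reflects this reading, with random Fourier
features for the prior; it is an illustration, not the authors' code.

    # Pathwise GP posterior sampling via Matheron's rule:
    # f_post(.) = f_prior(.) + k(., X) K^-1 (y - f_prior(X) - eps)
    set.seed(5)
    ls <- 0.5; noise <- 0.1
    k <- function(a, b) exp(-outer(a, b, "-")^2 / (2 * ls^2))
    X <- runif(20); y <- sin(2 * pi * X) + rnorm(20, sd = noise)  # training data
    xs <- seq(0, 1, length.out = 200)                             # test inputs

    # Approximate prior sample from random Fourier features of the SE kernel
    m <- 500
    w <- rnorm(m, sd = 1 / ls); b <- runif(m, 0, 2 * pi)
    phi <- function(x) sqrt(2 / m) * cos(outer(x, w) + rep(b, each = length(x)))
    theta <- rnorm(m)
    f_prior <- function(x) as.vector(phi(x) %*% theta)

    # Data-dependent pathwise update (one posterior sample path over xs)
    eps <- rnorm(20, sd = noise)
    v <- solve(k(X, X) + noise^2 * diag(20), y - f_prior(X) - eps)
    f_post <- f_prior(xs) + as.vector(k(xs, X) %*% v)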
arXiv Detail & Related papers (2020-02-21T14:03:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.