Sparse Density Trees and Lists: An Interpretable Alternative to
High-Dimensional Histograms
- URL: http://arxiv.org/abs/1510.06779v5
- Date: Wed, 15 Nov 2023 07:12:05 GMT
- Title: Sparse Density Trees and Lists: An Interpretable Alternative to
High-Dimensional Histograms
- Authors: Siong Thye Goh, Lesia Semenova, Cynthia Rudin
- Abstract summary: We present tree-based and list-based density estimation methods for binary/categorical data.
Our density estimation models are higher dimensional analogies to variable bin width histograms.
We present an application to crime analysis, where we estimate how unusual each type of modus operandi is for a house break-in.
- Score: 19.134568072720956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present sparse tree-based and list-based density estimation methods for
binary/categorical data. Our density estimation models are higher dimensional
analogies to variable bin width histograms. In each leaf of the tree (or list),
the density is constant, similar to the flat density within the bin of a
histogram. Histograms, however, cannot easily be visualized in more than two
dimensions, whereas our models can. The accuracy of histograms fades as
dimensions increase, whereas our models have priors that help with
generalization. Our models are sparse, unlike high-dimensional fixed-bin
histograms. We present three generative modeling methods, where the first one
allows the user to specify the preferred number of leaves in the tree within a
Bayesian prior. The second method allows the user to specify the preferred
number of branches within the prior. The third method returns density lists
(rather than trees) and allows the user to specify the preferred number of
rules and the length of rules within the prior. The new approaches often yield
a better balance between sparsity and accuracy of density estimates than other
methods for this task. We present an application to crime analysis, where we
estimate how unusual each type of modus operandi is for a house break-in.
Related papers
- Conditional Density Estimation with Histogram Trees [3.5297361401370044]
Conditional density estimation (CDE) goes beyond regression by modeling the full conditional distribution.
Current methods typically employ kernel-based approaches, using kernel functions directly for kernel density estimation or as basis functions in linear models.
We propose the Conditional Density Tree (CDTree), a fully non-parametric model consisting of a decision tree in which each leaf is formed by a histogram model.
arXiv Detail & Related papers (2024-10-15T09:53:24Z) - Sparse Training of Discrete Diffusion Models for Graph Generation [45.103518022696996]
We introduce SparseDiff, a novel diffusion model based on the observation that almost all large graphs are sparse.
By selecting a subset of edges, SparseDiff effectively leverages sparse graph representations both during the noising process and within the denoising network.
Our model demonstrates state-of-the-art performance across multiple metrics on both small and large datasets.
arXiv Detail & Related papers (2023-11-03T16:50:26Z) - Two-level histograms for dealing with outliers and heavy tail
distributions [0.0]
We focus on the G-Enum histogram method, which exploits the Minimum Description Length (MDL) principle to build histograms without any user parameter.
We investigate on the limits of this method in the case of outliers or heavy-tailed distributions.
The first level exploits a logarithmic transformation of the data to split the data set into a list of data subsets with a controlled range of values.
The second level builds a sub-histogram for each data subset and aggregates them to obtain a complete histogram.
arXiv Detail & Related papers (2023-06-09T09:57:18Z) - Contour Context: Abstract Structural Distribution for 3D LiDAR Loop
Detection and Metric Pose Estimation [31.968749056155467]
This paper proposes a simple, effective, and efficient topological loop closure detection pipeline with accurate 3-DoF metric pose estimation.
We interpret the Cartesian birds' eye view (BEV) image projected from 3D LiDAR points as layered distribution of structures.
A retrieval key is designed to accelerate the search of a database indexed by layered KD-trees.
arXiv Detail & Related papers (2023-02-13T07:18:24Z) - Unveiling the Sampling Density in Non-Uniform Geometric Graphs [69.93864101024639]
We consider graphs as geometric graphs: nodes are randomly sampled from an underlying metric space, and any pair of nodes is connected if their distance is less than a specified neighborhood radius.
In a social network communities can be modeled as densely sampled areas, and hubs as nodes with larger neighborhood radius.
We develop methods to estimate the unknown sampling density in a self-supervised fashion.
arXiv Detail & Related papers (2022-10-15T08:01:08Z) - Wasserstein Iterative Networks for Barycenter Estimation [80.23810439485078]
We present an algorithm to approximate the Wasserstein-2 barycenters of continuous measures via a generative model.
Based on the celebrity faces dataset, we construct Ave, celeba! dataset which can be used for quantitative evaluation of barycenter algorithms.
arXiv Detail & Related papers (2022-01-28T16:59:47Z) - Sketch-Based Anomaly Detection in Streaming Graphs [89.52200264469364]
Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges and subgraphs in an online manner?
Our method is the first streaming approach that incorporates dense subgraph search to detect graph anomalies in constant memory and time.
arXiv Detail & Related papers (2021-06-08T16:10:36Z) - Learning Optical Flow from a Few Matches [67.83633948984954]
We show that the dense correlation volume representation is redundant and accurate flow estimation can be achieved with only a fraction of elements in it.
Experiments show that our method can reduce computational cost and memory use significantly, while maintaining high accuracy.
arXiv Detail & Related papers (2021-04-05T21:44:00Z) - Unsupervised Discretization by Two-dimensional MDL-based Histogram [0.0]
Unsupervised discretization is a crucial step in many knowledge discovery tasks.
We propose an expressive model class that allows for far more flexible partitions of two-dimensional data.
We introduce a algorithm, named PALM, which Partitions each dimension ALternately and then Merges neighboring regions.
arXiv Detail & Related papers (2020-06-02T19:19:49Z) - On the Discrepancy between Density Estimation and Sequence Generation [92.70116082182076]
log-likelihood is highly correlated with BLEU when we consider models within the same family.
We observe no correlation between rankings of models across different families.
arXiv Detail & Related papers (2020-02-17T20:13:35Z) - Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.