Discovery data topology with the closure structure. Theoretical and
practical aspects
- URL: http://arxiv.org/abs/2010.02628v3
- Date: Tue, 30 Mar 2021 08:30:16 GMT
- Title: Discovery data topology with the closure structure. Theoretical and
practical aspects
- Authors: Tatiana Makhalova, Aleksey Buzmakov, Sergei O. Kuznetsov and Amedeo
Napoli
- Abstract summary: We introduce a concise representation -- the closure structure -- based on closed itemsets and their minimum generators.
We propose a formalization of the closure structure in terms of Formal Concept Analysis.
We present and demonstrate theoretical results, as well as practical results obtained with the GDPM algorithm.
- Score: 21.70710923045654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we revisit pattern mining and especially itemset
mining, which allows one to analyze binary datasets in search of interesting
and meaningful association rules and their respective itemsets in an
unsupervised way. Since a summarization of a dataset based on a set of patterns
does not provide a general and satisfying view of the dataset, we introduce a
concise representation -- the closure structure -- based on closed itemsets and
their minimum generators, for capturing the intrinsic content of a dataset. The
closure structure allows one to understand the topology of the dataset as a
whole, along with the inherent complexity of the data. We propose a
formalization of the closure structure in terms of Formal Concept Analysis,
which is well adapted to the study of this data topology. We present and
demonstrate theoretical results, as well as practical results obtained with the
GDPM algorithm. GDPM is rather unique in its functionality, as it returns a
characterization of the topology of a dataset in terms of complexity levels,
highlighting the diversity and the distribution of the itemsets. Finally, a
series of experiments shows how GDPM can be used in practice and what can be
expected from its output.
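To make the core notions concrete, the following sketch enumerates closed itemsets, their minimal generators, and the resulting complexity levels on a toy binary dataset. This is an illustration of the definitions only, not the GDPM algorithm; the transactions and item names are invented for the example.

```python
from collections import defaultdict
from itertools import combinations

# Toy binary dataset: each transaction is a set of items.
# The transactions are invented for this illustration.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
]
items = sorted(set().union(*transactions))

def extent(itemset):
    """Indices of the transactions that contain every item of `itemset`."""
    return frozenset(i for i, t in enumerate(transactions) if itemset <= t)

def closure(itemset):
    """Closed itemset: all items shared by the transactions in extent(itemset)."""
    ext = extent(itemset)
    if not ext:
        return frozenset(items)  # empty extent: closure is the full item set
    common = set(items)
    for i in ext:
        common &= transactions[i]
    return frozenset(common)

# Group every itemset by its closure; the inclusion-minimal itemsets in each
# group are the minimal generators of that closed itemset.  The size of the
# smallest generator gives the level of the closed itemset in the closure
# structure.
generators = defaultdict(list)
for r in range(len(items) + 1):
    for combo in combinations(items, r):
        s = frozenset(combo)
        generators[closure(s)].append(s)

for closed, gens in generators.items():
    minimal = [g for g in gens if not any(h < g for h in gens)]
    level = min(len(g) for g in minimal)
    print(sorted(closed), "level", level, "generators", [sorted(g) for g in minimal])
```

On this toy dataset the closure of {a} is {a} itself (it appears in transactions 0, 1, and 2, which share only a), while {a, b} is already closed with itself as its only minimal generator.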
Related papers
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well-established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
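A minimal sketch of the recommended variant, assuming invented 2-D points (this is not the authors' implementation): starting from singleton clusters, the pair with the highest average pairwise dot product is merged at each step, instead of the pair with the lowest distance.

```python
# Hypothetical 2-D points, chosen so that {0, 1} and {2, 3} form two groups.
points = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def avg_dot(ca, cb):
    """Average pairwise dot product between two clusters (lists of point indices)."""
    return sum(dot(points[i], points[j]) for i in ca for j in cb) / (len(ca) * len(cb))

# Agglomerative loop: merge the pair of clusters with the HIGHEST average
# dot product, rather than the lowest distance or within-cluster variance.
clusters = [[i] for i in range(len(points))]
merges = []
while len(clusters) > 1:
    a, b = max(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda ij: avg_dot(clusters[ij[0]], clusters[ij[1]]),
    )
    merges.append((clusters[a], clusters[b]))
    clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [clusters[a] + clusters[b]]

print(merges)  # the merge order defines the recovered tree
```

Here the two tight groups are merged first, and the root merge joins them, matching the intuition that the merge order encodes the hierarchical structure.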
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- Unsupervised hierarchical clustering using the learning dynamics of RBMs [0.0]
We present a new and general method for building relational data trees by exploiting the learning dynamics of the Restricted Boltzmann Machine (RBM).
Our method is based on the mean-field approach, derived from the Plefka expansion, and developed in the context of disordered systems.
We tested our method on an artificial hierarchical dataset and on three different real-world datasets (images of digits, mutations in the human genome, and a family of proteins).
arXiv Detail & Related papers (2023-02-03T16:53:32Z) - Topological Learning in Multi-Class Data Sets [0.3050152425444477]
We study the impact of topological complexity on learning in feedforward deep neural networks (DNNs).
We evaluate our topological classification algorithm on multiple constructed and open source data sets.
arXiv Detail & Related papers (2023-01-23T21:54:25Z) - Feature construction using explanations of individual predictions [0.0]
We propose a novel approach for reducing the search space based on aggregation of instance-based explanations of predictive models.
We empirically show that reducing the search to these groups significantly reduces the time of feature construction.
We show significant improvements in classification accuracy for several classifiers and demonstrate the feasibility of the proposed feature construction even for large datasets.
arXiv Detail & Related papers (2023-01-23T18:59:01Z) - Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z) - Bayesian Structure Learning with Generative Flow Networks [85.84396514570373]
In Bayesian structure learning, we are interested in inferring a distribution over directed acyclic graphs (DAGs) from data.
Recently, a class of probabilistic models, called Generative Flow Networks (GFlowNets), have been introduced as a general framework for generative modeling.
We show that our approach, called DAG-GFlowNet, provides an accurate approximation of the posterior over DAGs.
arXiv Detail & Related papers (2022-02-28T15:53:10Z) - Structural Learning of Probabilistic Sentential Decision Diagrams under
Partial Closed-World Assumption [127.439030701253]
Probabilistic sentential decision diagrams are a class of structured-decomposable circuits.
We propose a new scheme based on a partial closed-world assumption: data implicitly provide the logical base of the circuit.
Preliminary experiments show that the proposed approach might properly fit training data, and generalize well to test data, provided that these remain consistent with the underlying logical base.
arXiv Detail & Related papers (2021-07-26T12:01:56Z) - Clustering multivariate functional data using unsupervised binary trees [0.0]
We propose a model-based clustering algorithm for a general class of functional data.
The random functional data realizations could be measured with error at discrete, and possibly random, points in the domain of definition.
The new algorithm provides easily interpretable results and fast predictions for online data sets.
arXiv Detail & Related papers (2020-12-10T20:56:49Z) - CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural
Summarization Systems [121.78477833009671]
We investigate the performance of different summarization models under a cross-dataset setting.
A comprehensive study of 11 representative summarization systems on 5 datasets from different domains reveals the effect of model architectures and generation ways.
arXiv Detail & Related papers (2020-10-11T02:19:15Z) - Hierarchical regularization networks for sparsification based learning
on noisy datasets [0.0]
The hierarchy follows from approximation spaces identified at successively finer scales.
To promote model generalization at each scale, we also introduce a novel, projection-based penalty operator across multiple dimensions.
Results show the performance of the approach as a data reduction and modeling strategy on both synthetic and real datasets.
arXiv Detail & Related papers (2020-06-09T18:32:24Z) - New advances in enumerative biclustering algorithms with online
partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, retains the attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
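To make the "constant values on columns" (CVC) criterion from the biclustering entry above concrete, here is a small sketch, not the RIn-Close_CVC implementation, that checks whether a given submatrix qualifies as a CVC bicluster; the data matrix and tolerance are invented for the example.

```python
# Toy numerical dataset (rows x columns); values are invented.
data = [
    [1.0, 5.0, 2.0],
    [3.0, 5.0, 2.0],
    [1.0, 4.0, 2.0],
]

def is_cvc_bicluster(rows, cols, eps=0.0):
    """True if the submatrix data[rows][cols] has (near-)constant values
    on each column, within tolerance eps."""
    for j in cols:
        vals = [data[i][j] for i in rows]
        if max(vals) - min(vals) > eps:
            return False
    return True

print(is_cvc_bicluster([0, 1], [1, 2]))   # columns 1 and 2 are constant on rows 0, 1
print(is_cvc_bicluster([0, 1, 2], [1]))   # column 1 varies once row 2 is included
```

Enumerative algorithms such as RIn-Close_CVC search for maximal row/column sets satisfying this kind of per-column constancy check without redundancy; the sketch only verifies one candidate submatrix.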
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.