Decomposable Families of Itemsets
- URL: http://arxiv.org/abs/2006.09533v1
- Date: Tue, 16 Jun 2020 21:52:28 GMT
- Title: Decomposable Families of Itemsets
- Authors: Nikolaj Tatti, Hannes Heikinheimo
- Abstract summary: We present an approach to the problem of selecting a small yet high-quality subset of patterns from a larger collection of itemsets.
Such itemset families define a probabilistic model for the data from which the original collection of itemsets has been derived.
We provide an efficient algorithm to build decomposable itemset families, and give an application example with frequency bound querying.
- Score: 0.949290364986276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of selecting a small yet high-quality subset of patterns from a
larger collection of itemsets has recently attracted a lot of research. Here we
discuss an approach to this problem using the notion of decomposable families
of itemsets. Such itemset families define a probabilistic model for the data
from which the original collection of itemsets has been derived.
Furthermore, they induce a special tree structure, called a junction tree,
familiar from the theory of Markov Random Fields. The method has several
advantages. The junction trees provide an intuitive representation of the
mining results. From the computational point of view, the model provides
leverage for problems that could be intractable using the entire collection of
itemsets. We provide an efficient algorithm to build decomposable itemset
families, and give an application example with frequency bound querying using
the model. Empirical results show that our algorithm yields high-quality
results.
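
To make the frequency-querying idea concrete, here is a minimal sketch (not the paper's algorithm) of the factorization a decomposable family provides. Assume two mined itemsets A = {a, b} and B = {b, c} are the cliques of a junction tree with separator S = {b}; the decomposable model treats a and c as conditionally independent given b, so the modeled frequency of A ∪ B is freq(A) · freq(B) / freq(S). All itemsets and frequency values below are hypothetical.

```python
# Minimal sketch of a frequency query in a decomposable itemset model.
# Junction tree: clique A = {a, b} -- separator S = {b} -- clique B = {b, c}.
# Under the model, a and c are conditionally independent given b, so
#   freq(a, b, c) = freq(a, b) * freq(b, c) / freq(b).

def union_frequency(freq_a: float, freq_b: float, freq_sep: float) -> float:
    """Modeled frequency of the union of two cliques joined by one separator."""
    return freq_a * freq_b / freq_sep

# Hypothetical frequencies mined from the data:
freq = {
    ("a", "b"): 0.30,  # clique A
    ("b", "c"): 0.40,  # clique B
    ("b",): 0.50,      # separator S = A intersect B
}

est = union_frequency(freq[("a", "b")], freq[("b", "c")], freq[("b",)])
print(f"modeled freq(a, b, c) = {est:.3f}")  # 0.30 * 0.40 / 0.50 = 0.240
```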
Related papers
- Utilising Explainable Techniques for Quality Prediction in a Complex Textiles Manufacturing Use Case [0.0]
This paper develops an approach to classify instances of product failure in a complex textiles manufacturing dataset using explainable techniques.
In investigating the trade-off between accuracy and explainability, three different tree-based classification algorithms were evaluated.
arXiv Detail & Related papers (2024-07-26T06:50:17Z) - Tree Learning: Optimal Algorithms and Sample Complexity [10.638365461509]
We study the problem of learning a hierarchical tree representation of data from labeled samples taken from an arbitrary distribution.
We present optimal sample bounds for this problem in several learning settings, including (agnostic) learning and online learning.
arXiv Detail & Related papers (2023-02-09T08:35:17Z) - Frequent Itemset-driven Search for Finding Minimum Node Separators in Complex Networks [61.2383572324176]
We propose a frequent itemset-driven search approach, which integrates the concept of frequent itemset mining in data mining into the well-known memetic search framework.
It iteratively employs the frequent itemset recombination operator to generate promising offspring solutions based on itemsets that frequently occur in high-quality solutions (a minimal sketch of such an operator appears after this list).
In particular, it discovers 29 new upper bounds and matches 18 previous best-known bounds.
arXiv Detail & Related papers (2022-01-18T11:16:40Z) - Efficient Bayesian network structure learning via local Markov boundary search [17.1764265047364]
We analyze the complexity of learning directed acyclic graphical models from observational data without specific distributional assumptions.
We use a local Markov boundary search procedure in order to construct ancestral sets in the underlying graphical model.
Our approach is general, works for discrete or continuous distributions without distributional assumptions, and as such sheds light on the minimal assumptions required to efficiently learn the structure of directed graphical models from data.
arXiv Detail & Related papers (2021-10-12T15:33:59Z) - Probabilistic DAG Search [29.47649645431227]
We develop a probabilistic framework to exploit a search space's latent structure and share information across the search tree.
We empirically find our algorithm to compare favorably to existing non-probabilistic alternatives in Tic-Tac-Toe and a feature selection application.
arXiv Detail & Related papers (2021-06-16T11:35:19Z) - Sample Complexity of Tree Search Configuration: Cutting Planes and Beyond [98.92725321081994]
State-of-the-art solvers integrate a myriad of cutting-plane techniques to speed up the underlying tree-search algorithm.
We prove the first guarantees for learning high-performing cut-selection policies tailored to the instance distribution.
arXiv Detail & Related papers (2021-06-08T00:57:59Z) - Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z) - Adversarial Examples for $k$-Nearest Neighbor Classifiers Based on
Higher-Order Voronoi Diagrams [69.4411417775822]
Adversarial examples are a widely studied phenomenon in machine learning models.
We propose an algorithm for evaluating the adversarial robustness of $k$-nearest neighbor classification.
arXiv Detail & Related papers (2020-11-19T08:49:10Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z) - Robust Estimation of Tree Structured Ising Models [20.224160348675422]
We consider the task of learning Ising models when the signs of different random variables are flipped independently with possibly unequal, unknown probabilities.
We first prove that this problem is unidentifiable; however, the unidentifiability is limited to a small equivalence class of trees formed by leaf nodes exchanging positions with their neighbors.
arXiv Detail & Related papers (2020-06-10T01:32:45Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
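
As referenced in the frequent itemset-driven search entry above, here is a minimal sketch of a recombination operator in that spirit. It is not the authors' implementation; the elite solutions, support threshold, and offspring size are all invented for illustration. The operator mines element pairs that co-occur in several high-quality solutions and uses them to seed an offspring.

```python
import random
from collections import Counter
from itertools import combinations

def frequent_pairs(solutions, min_support):
    """Mine element pairs that occur in at least min_support elite solutions."""
    counts = Counter()
    for sol in solutions:
        counts.update(combinations(sorted(sol), 2))
    return {pair for pair, count in counts.items() if count >= min_support}

def recombine(parent_a, parent_b, pairs, size):
    """Seed an offspring with elements of frequent pairs contained in the
    parents' union, then fill the remaining slots at random."""
    pool = parent_a | parent_b
    seed = {x for pair in pairs if set(pair) <= pool for x in pair}
    offspring = set(sorted(seed)[:size])  # truncate arbitrarily if oversized
    rest = list(pool - offspring)
    random.shuffle(rest)
    while len(offspring) < size and rest:
        offspring.add(rest.pop())
    return offspring

# Hypothetical elite solutions (e.g. node separators) found during the search:
elite = [{1, 2, 3}, {2, 3, 4}, {1, 2, 3, 5}]
pairs = frequent_pairs(elite, min_support=2)        # {(1, 2), (1, 3), (2, 3)}
child = recombine({1, 2, 3}, {2, 3, 4}, pairs, size=3)
print(pairs, child)
```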