Parallel algorithms for mining of frequent itemsets
- URL: http://arxiv.org/abs/2108.05038v1
- Date: Wed, 11 Aug 2021 05:22:52 GMT
- Title: Parallel algorithms for mining of frequent itemsets
- Authors: Robert Kessl
- Abstract summary: We develop a parallel method for mining of frequent itemsets on a distributed memory parallel computer.
Our method achieves speedup of 6 on 10 processors.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the recent decade companies started collecting of large amount of data.
Without a proper analyse, the data are usually useless. The field of analysing
the data is called data mining. Unfortunately, the amount of data is quite
large: the data do not fit into main memory and the processing time can become
quite huge. Therefore, we need parallel data mining algorithms. One of the
popular and important data mining algorithm is the algorithm for generation of
so called frequent itemsets. The problem of mining of frequent itemsets can be
explained on the following example: customers goes in a store put into theirs
baskets some goods; the owner of the store collects the baskets and wants to
know the set of goods that are bought together in at least p% of the baskets.
Currently, the sequential algorithms for mining of frequent itemsets are quite
good in the means of performance. However, the parallel algorithms for mining
of frequent itemsets still do not achieve good speedup. In this thesis, we
develop a parallel method for mining of frequent itemsets that can be used for
an arbitrary depth first search sequential algorithms on a distributed memory
parallel computer. Our method achieves speedup of ~ 6 on 10 processors. The
method is based on an approximate estimation of processor load from a database
sample - however it always computes the set of frequent itemsets from the whole
database. In this thesis, we show a theory underlying our method and show the
performance of the estimation process.
Related papers
- Fast Redescription Mining Using Locality-Sensitive Hashing [1.126524823245055]
We present new algorithms that perform the matching and extension orders of magnitude faster than the existing approaches.
Our algorithms are based on locality-sensitive hashing with a tailored approach to handle the discretisation of numerical attributes.
arXiv Detail & Related papers (2024-06-06T15:13:48Z) - Mining Weighted Sequential Patterns in Incremental Uncertain Databases [2.668038211242538]
We have developed an algorithm to mine frequent sequences in an uncertain database.
We have proposed two new techniques for mining when the database is incremental.
arXiv Detail & Related papers (2024-03-31T17:32:08Z) - Regularization-Based Methods for Ordinal Quantification [49.606912965922504]
We study the ordinal case, i.e., the case in which a total order is defined on the set of n>2 classes.
We propose a novel class of regularized OQ algorithms, which outperforms existing algorithms in our experiments.
arXiv Detail & Related papers (2023-10-13T16:04:06Z) - Quantum Algorithm for Researching the Nearest (QARN) [0.0]
Quantum computing acts as an attractive alternative to parallel computing with qubits, qudits and their distinctive properties.
The quantum algorithm proposed in this paper allows to search for the best (closest to a given element in a random data array) by storing all its initial elements in a superposition.
This allows to perform the search operations on all elements at the same time and due to the same to save the amount of RAM.
arXiv Detail & Related papers (2023-04-21T14:21:09Z) - PARTIME: Scalable and Parallel Processing Over Time with Deep Neural
Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time.
PARTIME starts processing each data sample at the time in which it becomes available from the stream.
Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning.
arXiv Detail & Related papers (2022-10-17T14:49:14Z) - A Generic Algorithm for Top-K On-Shelf Utility Mining [47.729883172648876]
On-shelf utility mining (OSUM) is an emerging research direction in data mining.
It aims to discover itemsets that have high relative utility in their selling time period.
It is hard to define a minimum threshold minutil for mining the right amount of on-shelf high utility itemsets.
We propose a generic algorithm named TOIT for mining Top-k On-shelf hIgh-utility paTterns.
arXiv Detail & Related papers (2022-08-27T03:08:00Z) - Memory Bounds for the Experts Problem [53.67419690563877]
Online learning with expert advice is a fundamental problem of sequential prediction.
The goal is to process predictions, and make a prediction with the minimum cost.
An algorithm is judged by how well it does compared to the best expert in the set.
arXiv Detail & Related papers (2022-04-21T01:22:18Z) - Triangle and Four Cycle Counting with Predictions in Graph Streams [59.05440236993604]
We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles.
We use a trained oracle that can predict certain properties of the stream elements to improve on prior "classical" algorithms.
Our methodology expands upon prior work on "classical" streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases.
arXiv Detail & Related papers (2022-03-17T19:26:00Z) - Efficient and Local Parallel Random Walks [21.29022677162416]
Random walks are a fundamental primitive used in many machine learning algorithms.
We present a new algorithm that overcomes this limitation by building random walk efficiently and locally.
We show that our technique is both memory and round efficient, and in particular yields an efficient parallel local clustering algorithm.
arXiv Detail & Related papers (2021-12-01T17:06:11Z) - Correlation Clustering in Constant Many Parallel Rounds [42.10280805559555]
Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining.
In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work.
Our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds.
arXiv Detail & Related papers (2021-06-15T21:45:45Z) - Providing Meaningful Data Summarizations Using Examplar-based Clustering
in Industry 4.0 [67.80123919697971]
We show, that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.