Related papers: Parallel algorithms for mining of frequent itemsets

Parallel algorithms for mining of frequent itemsets

URL: http://arxiv.org/abs/2108.05038v1
Date: Wed, 11 Aug 2021 05:22:52 GMT
Title: Parallel algorithms for mining of frequent itemsets
Authors: Robert Kessl
Abstract summary: We develop a parallel method for mining of frequent itemsets on a distributed memory parallel computer. Our method achieves speedup of 6 on 10 processors.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In the recent decade companies started collecting of large amount of data. Without a proper analyse, the data are usually useless. The field of analysing the data is called data mining. Unfortunately, the amount of data is quite large: the data do not fit into main memory and the processing time can become quite huge. Therefore, we need parallel data mining algorithms. One of the popular and important data mining algorithm is the algorithm for generation of so called frequent itemsets. The problem of mining of frequent itemsets can be explained on the following example: customers goes in a store put into theirs baskets some goods; the owner of the store collects the baskets and wants to know the set of goods that are bought together in at least p% of the baskets. Currently, the sequential algorithms for mining of frequent itemsets are quite good in the means of performance. However, the parallel algorithms for mining of frequent itemsets still do not achieve good speedup. In this thesis, we develop a parallel method for mining of frequent itemsets that can be used for an arbitrary depth first search sequential algorithms on a distributed memory parallel computer. Our method achieves speedup of ~ 6 on 10 processors. The method is based on an approximate estimation of processor load from a database sample - however it always computes the set of frequent itemsets from the whole database. In this thesis, we show a theory underlying our method and show the performance of the estimation process.

Related papers

Scalable Private Partition Selection via Adaptive Weighting [66.09199304818928]
In a private set union, users hold subsets of items from an unbounded universe. The goal is to output as many items as possible from the union of the users' sets while maintaining user-level differential privacy. We propose an algorithm for this problem, MaximumDegree (MAD), which adaptively reroutes weight from items with weight far above the threshold needed for privacy to items with smaller weight.
arXiv Detail & Related papers (2025-02-13T01:27:11Z)
Fast Redescription Mining Using Locality-Sensitive Hashing [1.126524823245055]
We present new algorithms that perform the matching and extension orders of magnitude faster than the existing approaches. Our algorithms are based on locality-sensitive hashing with a tailored approach to handle the discretisation of numerical attributes.
arXiv Detail & Related papers (2024-06-06T15:13:48Z)
Mining Weighted Sequential Patterns in Incremental Uncertain Databases [2.668038211242538]
We have developed an algorithm to mine frequent sequences in an uncertain database. We have proposed two new techniques for mining when the database is incremental.
arXiv Detail & Related papers (2024-03-31T17:32:08Z)
Regularization-Based Methods for Ordinal Quantification [49.606912965922504]
We study the ordinal case, i.e., the case in which a total order is defined on the set of n>2 classes. We propose a novel class of regularized OQ algorithms, which outperforms existing algorithms in our experiments.
arXiv Detail & Related papers (2023-10-13T16:04:06Z)
Quantum Algorithm for Researching the Nearest (QARN) [0.0]
Quantum computing acts as an attractive alternative to parallel computing with qubits, qudits and their distinctive properties. The quantum algorithm proposed in this paper allows to search for the best (closest to a given element in a random data array) by storing all its initial elements in a superposition. This allows to perform the search operations on all elements at the same time and due to the same to save the amount of RAM.
arXiv Detail & Related papers (2023-04-21T14:21:09Z)
PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time. PARTIME starts processing each data sample at the time in which it becomes available from the stream. Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning.
arXiv Detail & Related papers (2022-10-17T14:49:14Z)
A Generic Algorithm for Top-K On-Shelf Utility Mining [47.729883172648876]
On-shelf utility mining (OSUM) is an emerging research direction in data mining. It aims to discover itemsets that have high relative utility in their selling time period. It is hard to define a minimum threshold minutil for mining the right amount of on-shelf high utility itemsets. We propose a generic algorithm named TOIT for mining Top-k On-shelf hIgh-utility paTterns.
arXiv Detail & Related papers (2022-08-27T03:08:00Z)
Itemset Utility Maximization with Correlation Measure [8.581840054840335]
High utility itemset mining (HUIM) is used to find out interesting but hidden information (e.g., profit and risk) In this paper, we propose a novel algorithm called the Itemset Utility Maximization with Correlation Measure (CoIUM) Two upper bounds and four pruning strategies are utilized to effectively prune the search space. And a concise array-based structure named utility-bin is used to calculate and store the adopted upper bounds in linear time and space.
arXiv Detail & Related papers (2022-08-26T10:06:24Z)
Memory Bounds for the Experts Problem [53.67419690563877]
Online learning with expert advice is a fundamental problem of sequential prediction. The goal is to process predictions, and make a prediction with the minimum cost. An algorithm is judged by how well it does compared to the best expert in the set.
arXiv Detail & Related papers (2022-04-21T01:22:18Z)
Triangle and Four Cycle Counting with Predictions in Graph Streams [59.05440236993604]
We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles. We use a trained oracle that can predict certain properties of the stream elements to improve on prior "classical" algorithms. Our methodology expands upon prior work on "classical" streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases.
arXiv Detail & Related papers (2022-03-17T19:26:00Z)
Efficient and Local Parallel Random Walks [21.29022677162416]
Random walks are a fundamental primitive used in many machine learning algorithms. We present a new algorithm that overcomes this limitation by building random walk efficiently and locally. We show that our technique is both memory and round efficient, and in particular yields an efficient parallel local clustering algorithm.
arXiv Detail & Related papers (2021-12-01T17:06:11Z)
Correlation Clustering in Constant Many Parallel Rounds [42.10280805559555]
Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. Our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds.
arXiv Detail & Related papers (2021-06-15T21:45:45Z)
Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show, that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms. We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.