HUSP-SP: Faster Utility Mining on Sequence Data
- URL: http://arxiv.org/abs/2212.14255v1
- Date: Thu, 29 Dec 2022 10:56:17 GMT
- Title: HUSP-SP: Faster Utility Mining on Sequence Data
- Authors: Chunkai Zhang, Yuting Yang, Zilin Du, Wensheng Gan, and Philip S. Yu
- Abstract summary: High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity.
We design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP)
Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.
- Score: 48.0426095077918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-utility sequential pattern mining (HUSPM) has emerged as an important
topic due to its wide application and considerable popularity. However, due to
the combinatorial explosion of the search space when the HUSPM problem
encounters a low utility threshold or large-scale data, it may be
time-consuming and memory-costly to address the HUSPM problem. Several
algorithms have been proposed for addressing this problem, but they still cost
a lot in terms of running time and memory usage. In this paper, to further
solve this problem efficiently, we design a compact structure called sequence
projection (seqPro) and propose an efficient algorithm, namely discovering
high-utility sequential patterns with the seqPro structure (HUSP-SP). HUSP-SP
utilizes the compact seq-array to store the necessary information in a sequence
database. The seqPro structure is designed to efficiently calculate candidate
patterns' utilities and upper bound values. Furthermore, a new upper bound on
utility, namely tighter reduced sequence utility (TRSU) and two pruning
strategies in search space, are utilized to improve the mining performance of
HUSP-SP. Experimental results on both synthetic and real-life datasets show
that HUSP-SP can significantly outperform the state-of-the-art algorithms in
terms of running time, memory usage, search space pruning efficiency, and
scalability.
Related papers
- AcceleratedLiNGAM: Learning Causal DAGs at the speed of GPUs [57.12929098407975]
We show that by efficiently parallelizing existing causal discovery methods, we can scale them to thousands of dimensions.
Specifically, we focus on the causal ordering subprocedure in DirectLiNGAM and implement GPU kernels to accelerate it.
This allows us to apply DirectLiNGAM to causal inference on large-scale gene expression data with genetic interventions yielding competitive results.
arXiv Detail & Related papers (2024-03-06T15:06:11Z) - A Projected Upper Bound for Mining High Utility Patterns from
Interval-Based Event Sequences [0.0]
We propose a projected upper bound on the utility of the patterns discovered from sequences of interval-based events.
Experimental results show that the new upper bound improves HUIPMiner performance in terms of both execution time and memory usage.
arXiv Detail & Related papers (2022-12-21T21:06:07Z) - Towards Sequence Utility Maximization under Utility Occupancy Measure [53.234101208024335]
In the database, although utility is a flexible criterion for each pattern, it is a more absolute criterion due to neglect of utility sharing.
We first define utility occupancy on sequence data and raise the problem of High Utility-Occupancy Sequential Pattern Mining.
An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed.
arXiv Detail & Related papers (2022-12-20T17:28:53Z) - Towards Correlated Sequential Rules [4.743965372344134]
High-utility sequential rule mining (HUSRM) is designed to explore the confidence or probability of predicting the occurrence of consequence sequential patterns.
The existing algorithm, known as HUSRM, is limited to extracting all eligible rules while neglecting the correlation between the generated sequential rules.
We propose a novel algorithm called correlated high-utility sequential rule miner (CoUSR) to integrate the concept of correlation into HUSRM.
arXiv Detail & Related papers (2022-10-27T17:27:23Z) - Totally-ordered Sequential Rules for Utility Maximization [49.57003933142011]
We propose two novel algorithms, called TotalSR and TotalSR+, which aim to identify all high utility totally-ordered sequential rules (HTSRs)
TotalSR creates a utility table that can efficiently calculate antecedent support and a utility prefix sum list that can compute the remaining utility in O(1) time for a sequence.
There are numerous experimental results on both real and synthetic datasets demonstrating that TotalSR is significantly more efficient than algorithms with fewer pruning strategies.
arXiv Detail & Related papers (2022-09-27T16:17:58Z) - A Generic Algorithm for Top-K On-Shelf Utility Mining [47.729883172648876]
On-shelf utility mining (OSUM) is an emerging research direction in data mining.
It aims to discover itemsets that have high relative utility in their selling time period.
It is hard to define a minimum threshold minutil for mining the right amount of on-shelf high utility itemsets.
We propose a generic algorithm named TOIT for mining Top-k On-shelf hIgh-utility paTterns.
arXiv Detail & Related papers (2022-08-27T03:08:00Z) - Itemset Utility Maximization with Correlation Measure [8.581840054840335]
High utility itemset mining (HUIM) is used to find out interesting but hidden information (e.g., profit and risk)
In this paper, we propose a novel algorithm called the Itemset Utility Maximization with Correlation Measure (CoIUM)
Two upper bounds and four pruning strategies are utilized to effectively prune the search space. And a concise array-based structure named utility-bin is used to calculate and store the adopted upper bounds in linear time and space.
arXiv Detail & Related papers (2022-08-26T10:06:24Z) - TaSPM: Targeted Sequential Pattern Mining [53.234101208024335]
We propose a generic framework namely TaSPM, based on the fast CM-SPAM algorithm.
We also propose several pruning strategies to reduce meaningless operations in mining processes.
Experiments show that the novel targeted mining algorithm TaSPM can achieve faster running time and less memory consumption.
arXiv Detail & Related papers (2022-02-26T17:49:47Z) - Improving a State-of-the-Art Heuristic for the Minimum Latency Problem
with Data Mining [69.00394670035747]
Hybrid metaheuristics have become a trend in operations research.
A successful example combines the Greedy Randomized Adaptive Search Procedures (GRASP) and data mining techniques.
arXiv Detail & Related papers (2019-08-28T13:12:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.