TaSPM: Targeted Sequential Pattern Mining
- URL: http://arxiv.org/abs/2202.13202v1
- Date: Sat, 26 Feb 2022 17:49:47 GMT
- Title: TaSPM: Targeted Sequential Pattern Mining
- Authors: Gengsen Huang, Wensheng Gan, and Philip S. Yu
- Abstract summary: We propose a generic framework, namely TaSPM, based on the fast CM-SPAM algorithm.
We also propose several pruning strategies to reduce meaningless operations in the mining process.
Experiments show that the novel targeted mining algorithm TaSPM achieves faster running times and lower memory consumption.
- Score: 53.234101208024335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential pattern mining (SPM) is an important pattern mining technique
with many real-world applications. Although many efficient sequential
pattern mining algorithms have been proposed, few studies focus
on target sequences. Targeted querying of sequential patterns can not only reduce
the number of sequences generated by SPM, but also improve the efficiency of
users in performing pattern analysis. The current algorithms available for
targeted sequence querying are based on specific scenarios and cannot be
generalized to other applications. In this paper, we formulate the problem of
targeted sequential pattern mining and propose a generic framework, namely
TaSPM, based on the fast CM-SPAM algorithm. Furthermore, to improve the
efficiency of TaSPM on large-scale datasets and multiple-items-based sequence
datasets, we propose several pruning strategies to reduce meaningless
operations in the mining process. In total, four pruning strategies are designed in
TaSPM, so it can terminate unnecessary pattern extensions quickly and
achieve better performance. Finally, we conduct extensive experiments on
different datasets to compare the existing SPM algorithms with TaSPM.
Experiments show that the novel targeted mining algorithm TaSPM achieves
faster running times and lower memory consumption.
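To make the problem statement in the abstract concrete, the following is a minimal sketch of targeted sequential pattern mining. It is not the TaSPM algorithm and does not use CM-SPAM or the paper's four pruning strategies. It assumes a simplified setting in which every sequence element is a single item (no multi-item itemsets), and the names `is_subsequence`, `support`, `mine_targeted`, and `max_length` are hypothetical. The only targeted shortcut shown is a pre-filter: a sequence that does not contain the target cannot support any pattern that contains the target, so it is dropped before mining.

```python
from typing import Dict, List, Tuple

Sequence = Tuple[str, ...]


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must appear in order.
    return all(item in it for item in pattern)


def support(pattern: Sequence, database: List[Sequence]) -> int:
    """Count the sequences in `database` that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in database)


def mine_targeted(
    database: List[Sequence],
    target: Sequence,
    min_support: int,
    max_length: int = 5,  # hypothetical cap to keep the naive search small
) -> Dict[Sequence, int]:
    """Return frequent patterns that contain `target`, with their supports."""
    # Illustrative pre-filter: only sequences containing the target can
    # support a pattern that contains the target.
    relevant = [s for s in database if is_subsequence(target, s)]
    items = sorted({item for s in relevant for item in s})
    results: Dict[Sequence, int] = {}

    def grow(prefix: Sequence) -> None:
        for item in items:
            candidate = prefix + (item,)
            sup = support(candidate, relevant)
            if sup < min_support:
                continue  # support cannot increase when a pattern is extended
            if is_subsequence(target, candidate):
                results[candidate] = sup
            if len(candidate) < max_length:
                grow(candidate)

    grow(())
    return results


# Toy database, made up for illustration only.
database = [
    ("a", "b", "c", "d"),
    ("a", "c", "d"),
    ("b", "a", "c"),
    ("a", "b", "d"),
]
patterns = mine_targeted(database, target=("a", "c"), min_support=2)
print(patterns)
```

On this toy database the query returns only the patterns that contain the target `("a", "c")`, rather than every frequent pattern, which is the reduction in output that the abstract describes.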
Related papers
- tSPM+; a high-performance algorithm for mining transitive sequential patterns from clinical data [5.340674706271038]
We present the tSPM+ algorithm, a high-performance implementation of the tSPM algorithm, which adds a new dimension by adding the duration to the temporal patterns.
We show that the tSPM+ algorithm provides a speedup of up to a factor of 980 and an up to 48-fold improvement in memory consumption.
arXiv Detail & Related papers (2023-09-08T17:47:31Z)
- HUSP-SP: Faster Utility Mining on Sequence Data [48.0426095077918]
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity.
We design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP).
Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.
arXiv Detail & Related papers (2022-12-29T10:56:17Z)
- Towards Correlated Sequential Rules [4.743965372344134]
High-utility sequential rule mining (HUSRM) is designed to explore the confidence or probability of predicting the occurrence of consequence sequential patterns.
The existing algorithm, known as HUSRM, is limited to extracting all eligible rules while neglecting the correlation between the generated sequential rules.
We propose a novel algorithm called correlated high-utility sequential rule miner (CoUSR) to integrate the concept of correlation into HUSRM.
arXiv Detail & Related papers (2022-10-27T17:27:23Z)
- Towards Target Sequential Rules [52.4562332499155]
We propose an efficient algorithm, called targeted sequential rule mining (TaSRM).
It is shown that the novel algorithm TaSRM and its variants can achieve better experimental performance compared to the existing baseline algorithm.
arXiv Detail & Related papers (2022-06-09T18:59:54Z)
- Towards Target High-Utility Itemsets [2.824395407508717]
In applied intelligence, utility-driven pattern discovery algorithms can identify insightful and useful patterns in databases.
Targeted high-utility itemset mining has emerged as a key research topic.
We propose THUIM (Targeted High-Utility Itemset Mining), which can quickly match high-utility itemsets during the mining process to select the targeted patterns.
arXiv Detail & Related papers (2022-06-09T18:42:58Z)
- US-Rule: Discovering Utility-driven Sequential Rules [52.68017415747925]
We propose a faster algorithm, called US-Rule, to efficiently mine high-utility sequential rules.
Four tighter upper bounds (LEEU, REEU, LERSU, RERSU) and their corresponding pruning strategies are proposed.
US-Rule can achieve better performance in terms of execution time, memory consumption and scalability.
arXiv Detail & Related papers (2021-11-29T23:38:28Z)
- MLE-guided parameter search for task loss minimization in neural sequence modeling [83.83249536279239]
Neural autoregressive sequence models are used to generate sequences in a variety of natural language processing (NLP) tasks.
We propose maximum likelihood guided parameter search (MGS), which samples from a distribution over update directions that is a mixture of random search around the current parameters and around the maximum likelihood gradient.
Our experiments show that MGS is capable of optimizing sequence-level losses, with substantial reductions in repetition and non-termination in sequence completion, and similar improvements to those of minimum risk training in machine translation.
arXiv Detail & Related papers (2020-06-04T22:21:22Z)
- Improving a State-of-the-Art Heuristic for the Minimum Latency Problem with Data Mining [69.00394670035747]
Hybrid metaheuristics have become a trend in operations research.
A successful example combines the Greedy Randomized Adaptive Search Procedures (GRASP) and data mining techniques.
arXiv Detail & Related papers (2019-08-28T13:12:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.