A Generic Algorithm for Top-K On-Shelf Utility Mining
- URL: http://arxiv.org/abs/2208.14230v1
- Date: Sat, 27 Aug 2022 03:08:00 GMT
- Title: A Generic Algorithm for Top-K On-Shelf Utility Mining
- Authors: Jiahui Chen, Xu Guo, Wensheng Gan, Shichen Wan, and Philip S. Yu
- Abstract summary: On-shelf utility mining (OSUM) is an emerging research direction in data mining.
It aims to discover itemsets that have high relative utility in their selling time period.
It is hard to define a minimum utility threshold (minutil) that yields the right number of on-shelf high-utility itemsets.
We propose a generic algorithm named TOIT for mining Top-k On-shelf hIgh-utility paTterns.
- Score: 47.729883172648876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-shelf utility mining (OSUM) is an emerging research direction in data
mining. It aims to discover itemsets that have high relative utility in their
selling time period. Compared with traditional utility mining, OSUM can find
more practical and meaningful patterns in real-life applications. However,
traditional OSUM has a major drawback: it is hard for ordinary users to
define a minimum utility threshold (minutil) that yields the right number of
on-shelf high-utility itemsets. If the threshold is set too high, too few
patterns are found; if it is set too low, too many patterns are discovered,
wasting time and memory. To address this issue, the user usually specifies a
parameter k directly, so that only the top-k high relative-utility itemsets
are considered. Therefore, in this paper, we propose a generic algorithm
named TOIT for mining Top-k On-shelf hIgh-utility paTterns to solve this
problem. TOIT applies a novel strategy to raise the minutil based on the
on-shelf datasets. In addition, two novel upper-bound strategies, named subtree
utility and local utility, are applied to prune the search space. By adopting
the strategies mentioned above, the TOIT algorithm can narrow the search space
as early as possible, improve the mining efficiency, and reduce the memory
consumption, so it can achieve better performance than other algorithms. A
series of experiments has been conducted on real datasets with different
characteristics to compare its performance with the state-of-the-art KOSHU
algorithm. The experimental results show that TOIT outperforms KOSHU in both
running time
and memory consumption.
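
To make the top-k idea concrete, the following is a minimal, illustrative Python sketch (not the authors' TOIT implementation): it enumerates itemsets over a tiny hypothetical on-shelf database, keeps the k patterns with the highest relative utility in a min-heap, and raises an internal minutil whenever the heap is full. TOIT itself avoids brute-force enumeration by pruning with the subtree-utility and local-utility upper bounds; here a simple threshold check stands in for that pruning.

```python
import heapq
from itertools import combinations

# Hypothetical toy on-shelf database: (time_period, {item: utility}).
DB = [
    (1, {"a": 5, "b": 2, "c": 1}),
    (1, {"a": 4, "c": 3}),
    (2, {"b": 6, "c": 2, "d": 8}),
    (2, {"a": 1, "d": 4}),
]

def relative_utility(itemset, db):
    """Utility of the itemset divided by the total utility of the time
    periods in which the itemset appears (a simplified on-shelf notion)."""
    periods = {p for p, t in db if set(itemset) <= t.keys()}
    if not periods:
        return 0.0
    util = sum(sum(t[i] for i in itemset)
               for p, t in db if set(itemset) <= t.keys())
    period_total = sum(sum(t.values()) for p, t in db if p in periods)
    return util / period_total

def topk_oshu(db, k):
    items = sorted({i for _, t in db for i in t})
    heap = []        # min-heap of (relative utility, itemset)
    minutil = 0.0    # internal threshold, raised as better patterns are found
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            ru = relative_utility(itemset, db)
            # TOIT would prune here with subtree/local utility upper bounds;
            # this check only skips patterns that cannot enter the top-k.
            if len(heap) == k and ru <= minutil:
                continue
            heapq.heappush(heap, (ru, itemset))
            if len(heap) > k:
                heapq.heappop(heap)
            if len(heap) == k:
                minutil = heap[0][0]   # threshold-raising step
    return sorted(heap, reverse=True)

print(topk_oshu(DB, 3))
```

The essential point is that minutil starts at zero and only ever increases, so later candidates face a stricter bar; the pruning strategies decide how much of the search space can be discarded once that bar has risen.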
Related papers
- HUSP-SP: Faster Utility Mining on Sequence Data [48.0426095077918]
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity.
We design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP).
Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.
arXiv Detail & Related papers (2022-12-29T10:56:17Z)
- Towards Sequence Utility Maximization under Utility Occupancy Measure [53.234101208024335]
Although utility is a flexible criterion for each pattern in a database, it is a rather absolute one because it neglects how utility is shared within each transaction.
We first define utility occupancy on sequence data and formulate the problem of High Utility-Occupancy Sequential Pattern Mining.
An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed.
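
As an illustration of the utility-occupancy idea, here is a small sketch for plain itemset data (the paper defines the measure on sequence data, so this simpler form is an assumption used only for exposition): occupancy is the share of a supporting transaction's total utility that the pattern accounts for, averaged over all supporting transactions.

```python
def utility_occupancy(pattern, db):
    """db: list of {item: utility} transactions; pattern: set of items.
    Returns the average fraction of transaction utility occupied by pattern."""
    ratios = []
    for t in db:
        if pattern <= t.keys():
            ratios.append(sum(t[i] for i in pattern) / sum(t.values()))
    return sum(ratios) / len(ratios) if ratios else 0.0

# Example: {"a"} occupies 5/8 of the first transaction's utility and 4/7 of
# the second, so its occupancy is the mean of those two ratios.
db = [{"a": 5, "b": 2, "c": 1}, {"a": 4, "c": 3}]
print(utility_occupancy({"a"}, db))  # (5/8 + 4/7) / 2
```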
arXiv Detail & Related papers (2022-12-20T17:28:53Z)
- Totally-ordered Sequential Rules for Utility Maximization [49.57003933142011]
We propose two novel algorithms, called TotalSR and TotalSR+, which aim to identify all high utility totally-ordered sequential rules (HTSRs).
TotalSR creates a utility table that can efficiently calculate antecedent support and a utility prefix sum list that can compute the remaining utility in O(1) time for a sequence.
Numerous experimental results on both real and synthetic datasets demonstrate that TotalSR is significantly more efficient than algorithms with fewer pruning strategies.
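
The "remaining utility in O(1)" claim above can be pictured with an ordinary prefix-sum array (a generic sketch, not the TotalSR utility-prefix-sum list itself): cumulative utilities are precomputed once, and the utility left after any position is a single subtraction.

```python
from itertools import accumulate

# utilities[i] is the utility of the i-th element of a sequence.
utilities = [3, 1, 4, 1, 5]
prefix = [0] + list(accumulate(utilities))  # prefix[i] = sum of first i elements
total = prefix[-1]

def remaining_utility(i):
    """Utility of all elements strictly after position i, in O(1)."""
    return total - prefix[i + 1]

print(remaining_utility(1))  # 4 + 1 + 5 = 10
```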
arXiv Detail & Related papers (2022-09-27T16:17:58Z)
- Itemset Utility Maximization with Correlation Measure [8.581840054840335]
High-utility itemset mining (HUIM) is used to discover interesting but hidden information (e.g., profit and risk).
In this paper, we propose a novel algorithm called Itemset Utility Maximization with Correlation Measure (CoIUM).
Two upper bounds and four pruning strategies are utilized to effectively prune the search space, and a concise array-based structure named utility-bin is used to calculate and store the adopted upper bounds in linear time and space.
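
A rough sketch of the utility-bin idea, with transaction-weighted utility (TWU) standing in for the bounds actually used by CoIUM (an assumption for illustration): one accumulator cell per item, filled in a single database scan, so the upper bounds are obtained in linear time and space.

```python
from collections import defaultdict

db = [{"a": 5, "b": 2, "c": 1}, {"a": 4, "c": 3}, {"b": 6, "d": 8}]

utility_bin = defaultdict(int)
for t in db:
    tu = sum(t.values())          # total utility of the transaction
    for item in t:
        utility_bin[item] += tu   # every item in t is credited the whole tu

# Items whose bound falls below the threshold can never appear in a
# high-utility itemset and are pruned from the search space.
minutil = 16
promising = {i for i, bound in utility_bin.items() if bound >= minutil}
print(dict(utility_bin), promising)
```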
arXiv Detail & Related papers (2022-08-26T10:06:24Z)
- Temporal Fuzzy Utility Maximization with Remaining Measure [1.642022526257133]
We propose a novel one-phase temporal fuzzy utility itemset mining approach called TFUM.
TFUM revises temporal fuzzy-lists to keep only the essential information about potential high temporal fuzzy utility itemsets in memory.
It then discovers the complete set of interesting patterns in a short time.
arXiv Detail & Related papers (2022-08-26T05:09:56Z)
- Towards Target High-Utility Itemsets [2.824395407508717]
In applied intelligence, utility-driven pattern discovery algorithms can identify insightful and useful patterns in databases.
Targeted high-utility itemset mining has emerged as a key research topic.
We propose THUIM (Targeted High-Utility Itemset Mining), which can quickly match high-utility itemsets during the mining process to select the targeted patterns.
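
One simple way to picture targeted matching during the search (a generic sketch, not the THUIM procedure itself): a candidate branch stays alive only while the target items it still lacks can be supplied by the items left to extend with.

```python
def can_match_target(current, remaining, target):
    """current/remaining/target are sets of items; True if the branch rooted
    at `current` can still be extended into a superset of `target`."""
    missing = target - current
    return missing <= remaining

# The branch {a} with {b, c} left to explore can still reach target {a, b} ...
print(can_match_target({"a"}, {"b", "c"}, {"a", "b"}))  # True
# ... but {c} with only {d} left cannot, so it would be pruned.
print(can_match_target({"c"}, {"d"}, {"a", "b"}))       # False
```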
arXiv Detail & Related papers (2022-06-09T18:42:58Z)
- US-Rule: Discovering Utility-driven Sequential Rules [52.68017415747925]
We propose a faster algorithm, called US-Rule, to efficiently mine high-utility sequential rules.
Four tighter upper bounds (LEEU, REEU, LERSU, RERSU) and their corresponding pruning strategies are proposed.
US-Rule can achieve better performance in terms of execution time, memory consumption and scalability.
arXiv Detail & Related papers (2021-11-29T23:38:28Z)
- IRLI: Iterative Re-partitioning for Learning to Index [104.72641345738425]
Methods have to trade off between obtaining high accuracy and maintaining load balance and scalability in distributed settings.
We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data.
We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
arXiv Detail & Related papers (2021-03-17T23:13:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.