Itemset Utility Maximization with Correlation Measure
- URL: http://arxiv.org/abs/2208.12551v1
- Date: Fri, 26 Aug 2022 10:06:24 GMT
- Title: Itemset Utility Maximization with Correlation Measure
- Authors: Jiahui Chen, Yixin Xu, Shicheng Wan, Wensheng Gan, and Jerry Chun-Wei Lin
- Abstract summary: High utility itemset mining (HUIM) is used to discover interesting but hidden information (e.g., profit and risk).
In this paper, we propose a novel algorithm called the Itemset Utility Maximization with Correlation Measure (CoIUM).
Two upper bounds and four pruning strategies are used to prune the search space effectively, and a concise array-based structure named utility-bin is used to calculate and store the adopted upper bounds in linear time and space.
- Score: 8.581840054840335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an important data mining technology, high utility itemset mining (HUIM) is
used to discover interesting but hidden information (e.g., profit and risk).
HUIM has been widely applied in many application scenarios, such as market
analysis, medical detection, and web click stream analysis. However, most
previous HUIM approaches ignore the relationship between items in an
itemset. Therefore, many irrelevant combinations (e.g., {gold, apple} and
{notebook, book}) are discovered in HUIM. To address this limitation, many
algorithms have been proposed to mine correlated high utility itemsets
(CoHUIs). In this paper, we propose a novel algorithm called the Itemset
Utility Maximization with Correlation Measure (CoIUM), which considers both
strong correlation and the profit values of the items. In addition, the
algorithm adopts a database projection mechanism to reduce the cost of database
scanning. Moreover, two upper bounds and four pruning strategies are used
to prune the search space effectively, and a concise array-based structure
named utility-bin is used to calculate and store the adopted upper bounds in
linear time and space. Finally, extensive experimental results on dense and
sparse datasets demonstrate that CoIUM significantly outperforms the
state-of-the-art algorithms in terms of runtime and memory consumption.
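The abstract names CoIUM's ingredients without defining them. The sketch below illustrates three of them on a hypothetical toy database, under stated assumptions. Itemset utility uses the standard HUIM definition (quantity times unit profit). For the correlation measure we assume the Kulczynski (kulc) measure, which is common in CoHUI mining; the abstract does not say which measure CoIUM uses. The utility-bin pass fills two per-item upper bounds for every extension of a prefix with one pass per projected transaction; we assume EFIM-style local utility and subtree utility here, which may not be exactly CoIUM's two bounds. All names (`PROFIT`, `DB`, `ORDER`, `kulc`, `is_cohui`, `utility_bins`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper): unit profit per item, and
# transactions mapping item -> purchased quantity.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}
DB = [
    {"a": 1, "c": 2},
    {"a": 2, "b": 1, "d": 1},
    {"b": 3, "c": 1, "d": 2},
    {"a": 1, "b": 2, "c": 1, "d": 1},
]
ORDER = ["a", "b", "c", "d"]  # assumed total processing order on items


def contains(tx, itemset):
    return all(i in tx for i in itemset)


def utility_in(tx, itemset):
    """u(X, T): sum of quantity * unit profit over items of X (X must be in T)."""
    return sum(tx[i] * PROFIT[i] for i in itemset)


def utility(itemset, db=DB):
    """u(X): total utility of X over all transactions containing X."""
    return sum(utility_in(tx, itemset) for tx in db if contains(tx, itemset))


def support(itemset, db=DB):
    return sum(1 for tx in db if contains(tx, itemset))


def kulc(itemset, db=DB):
    """Assumed correlation measure (Kulczynski): mean of sup(X) / sup({i})."""
    sup_x = support(itemset, db)
    return sum(sup_x / support([i], db) for i in itemset) / len(itemset)


def is_cohui(itemset, minutil, minkulc, db=DB):
    """Correlated high-utility itemset: profitable AND strongly correlated."""
    return utility(itemset, db) >= minutil and kulc(itemset, db) >= minkulc


def utility_bins(prefix, db=DB):
    """Fill EFIM-style local-utility (lu) and subtree-utility (su) bins for
    every extension item z of `prefix` (items after prefix's last item in
    ORDER), walking each projected transaction once from right to left."""
    start = max(ORDER.index(i) for i in prefix) + 1 if prefix else 0
    lu, su = defaultdict(int), defaultdict(int)
    for tx in db:
        if not contains(tx, prefix):
            continue
        u_prefix = utility_in(tx, prefix)
        ext = [i for i in ORDER[start:] if i in tx]  # projected transaction
        utils = [tx[i] * PROFIT[i] for i in ext]
        remaining = sum(utils)  # remaining utility re(prefix, T)
        after_z = 0  # utility of the extension items that follow z
        for z, u_z in zip(reversed(ext), reversed(utils)):
            lu[z] += u_prefix + remaining
            su[z] += u_prefix + u_z + after_z
            after_z += u_z
    return dict(lu), dict(su)


if __name__ == "__main__":
    print(utility(["a", "b"]), kulc(["a", "b"]))
    print(is_cohui(["b", "d"], minutil=20, minkulc=0.8))
    print(utility_bins(["a"]))
```

On this toy data, {b, d} (utility 24, kulc 1.0) passes `minutil = 20, minkulc = 0.8`, whereas {a, b} (utility 21, kulc about 0.67) is profitable but too weakly correlated. For the prefix {a}, su(a, d) = 21 equals u({a, d}) because d is last in the order. In an EFIM-style search, an item whose su bin is below minutil is not used to extend the prefix, and one whose lu bin is below minutil can be dropped from the projected transactions.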
Related papers
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
Date: 2024-01-12T22:51:48Z
- Scalable Batch Acquisition for Deep Bayesian Active Learning [70.68403899432198]
In deep active learning, it is important to choose multiple examples to mark up at each step.
Existing solutions to this problem, such as BatchBALD, have significant limitations in selecting a large number of examples.
We present the Large BatchBALD algorithm, which aims to achieve comparable quality while being more computationally efficient.
Date: 2023-01-13T11:45:17Z
- HUSP-SP: Faster Utility Mining on Sequence Data [48.0426095077918]
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity.
We design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP).
Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.
Date: 2022-12-29T10:56:17Z
- Towards Sequence Utility Maximization under Utility Occupancy Measure [53.234101208024335]
Although utility is a flexible criterion for each pattern in a database, it is also a rather absolute one, because it neglects utility sharing.
We first define utility occupancy on sequence data and raise the problem of High Utility-Occupancy Sequential Pattern Mining.
An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed.
Date: 2022-12-20T17:28:53Z
- A Generic Algorithm for Top-K On-Shelf Utility Mining [47.729883172648876]
On-shelf utility mining (OSUM) is an emerging research direction in data mining.
It aims to discover itemsets that have high relative utility in their selling time period.
It is hard to define a minimum threshold minutil for mining the right number of on-shelf high utility itemsets.
We propose a generic algorithm named TOIT for mining Top-k On-shelf hIgh-utility paTterns.
Date: 2022-08-27T03:08:00Z
- Temporal Fuzzy Utility Maximization with Remaining Measure [1.642022526257133]
We propose a novel one-phase temporal fuzzy utility itemset mining approach called TFUM.
TFUM revises temporal fuzzy-lists to keep less, but essential, information about potential high temporal fuzzy utility itemsets in memory.
It then discovers the complete set of truly interesting patterns in a short time.
Date: 2022-08-26T05:09:56Z
- Efficient and Accurate Top-$K$ Recovery from Choice Data [1.14219428942199]
In some applications such as recommendation systems, the statistician is primarily interested in recovering the set of the top ranked items from a large pool of items.
We propose the choice-based Borda count algorithm as a fast and accurate ranking algorithm for top-$K$ recovery.
We show that the choice-based Borda count algorithm has optimal sample complexity for top-$K$ recovery under a broad class of random utility models.
Date: 2022-06-23T22:05:08Z
- TargetUM: Targeted High-Utility Itemset Querying [1.022709144903362]
This paper is the first to propose a target-based HUIM problem and to provide a clear formulation of the targeted utility mining task.
A tree-based algorithm called Target-based high-Utility iteMset querying (TargetUM) is proposed.
The algorithm uses a lexicographic querying tree and three effective pruning strategies to improve the mining efficiency.
Date: 2021-10-30T18:55:28Z
- IRLI: Iterative Re-partitioning for Learning to Index [104.72641345738425]
Methods have to trade off high accuracy against load balance and scalability in distributed settings.
We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data.
We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
Date: 2021-03-17T23:13:25Z
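The SUMU entry above contrasts raw utility with utility occupancy, which measures how much of a transaction's (or sequence's) utility a pattern accounts for. The sketch below shows only the simpler itemset form, under an assumed definition: the occupancy of X in a transaction T is u(X, T) / tu(T), averaged over the transactions that contain X. The sequence-level definition used by SUMU is not given here and may differ. The data and function names are hypothetical.

```python
# Hypothetical toy data: unit profit per item, transactions as item -> quantity.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}
DB = [
    {"a": 1, "c": 2},
    {"a": 2, "b": 1, "d": 1},
    {"b": 3, "c": 1, "d": 2},
    {"a": 1, "b": 2, "c": 1, "d": 1},
]


def transaction_utility(tx):
    """tu(T): total utility of every item in transaction T."""
    return sum(q * PROFIT[i] for i, q in tx.items())


def utility_occupancy(itemset, db=DB):
    """Assumed itemset form: average share u(X, T) / tu(T) over the
    transactions T that contain X (0.0 if no transaction contains X)."""
    supporting = [tx for tx in db if all(i in tx for i in itemset)]
    if not supporting:
        return 0.0
    shares = [
        sum(tx[i] * PROFIT[i] for i in itemset) / transaction_utility(tx)
        for tx in supporting
    ]
    return sum(shares) / len(shares)


if __name__ == "__main__":
    for x in (["a"], ["b", "d"], ["a", "b", "c", "d"]):
        print(x, round(utility_occupancy(x), 3))
```

Because each share lies in [0, 1], occupancy favors patterns that dominate the transactions they appear in, independently of their absolute profit; this is the "utility sharing" aspect that plain utility ignores.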