Itemset Utility Maximization with Correlation Measure
- URL: http://arxiv.org/abs/2208.12551v1
- Date: Fri, 26 Aug 2022 10:06:24 GMT
- Title: Itemset Utility Maximization with Correlation Measure
- Authors: Jiahui Chen, Yixin Xu, Shicheng Wan, Wensheng Gan, and Jerry Chun-Wei Lin
- Abstract summary: High utility itemset mining (HUIM) is used to discover interesting but hidden information (e.g., profit and risk).
In this paper, we propose a novel algorithm called Itemset Utility Maximization with Correlation Measure (CoIUM).
Two upper bounds and four pruning strategies are used to prune the search space effectively, and a concise array-based structure named utility-bin is used to calculate and store the adopted upper bounds in linear time and space.
- Score: 8.581840054840335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an important data mining technology, high utility itemset mining (HUIM) is
used to discover interesting but hidden information (e.g., profit and risk).
HUIM has been widely applied in many application scenarios, such as market
analysis, medical detection, and web click stream analysis. However, most
previous HUIM approaches ignore the relationship between items in an
itemset. As a result, many irrelevant combinations (e.g., {gold, apple} and
{notebook, book}) are discovered in HUIM. To address this limitation, many
algorithms have been proposed to mine correlated high utility itemsets
(CoHUIs). In this paper, we propose a novel algorithm called Itemset
Utility Maximization with Correlation Measure (CoIUM), which considers both
strong correlation and the profit values of the items. The
algorithm adopts a database projection mechanism to reduce the cost of database
scanning. Moreover, two upper bounds and four pruning strategies are used
to prune the search space effectively. In addition, a concise array-based structure
named utility-bin is used to calculate and store the adopted upper bounds in
linear time and space. Finally, extensive experimental results on dense and
sparse datasets demonstrate that CoIUM significantly outperforms the
state-of-the-art algorithms in terms of runtime and memory consumption.
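The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the CoIUM implementation. The abstract does not name its correlation measure or its two upper bounds, so the Kulczynski measure and the transaction-weighted utilization (TWU) are assumptions chosen here because they are common in this line of work. The toy database, the profit table, and all function names are hypothetical.

```python
# Toy data (hypothetical): each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "d": 4},
    {"a": 1, "b": 1, "d": 1},
]
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 3}


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if all(item in t for item in itemset))


def kulc(itemset, transactions):
    """Kulczynski measure (assumed here): mean of sup(X) / sup({i}) over items i in X.

    Expects a non-empty itemset.
    """
    sup_x = support(itemset, transactions)
    if sup_x == 0:
        return 0.0
    ratios = [sup_x / support([item], transactions) for item in itemset]
    return sum(ratios) / len(ratios)


def twu_bins(transactions, unit_profit, item_index):
    """Array-based accumulation of TWU (assumed upper bound), one bin per item.

    A single pass over the database fills an array of size len(item_index),
    which is the kind of linear time and space behaviour the abstract
    attributes to the utility-bin structure.
    """
    bins = [0] * len(item_index)
    for t in transactions:
        transaction_utility = sum(q * unit_profit[i] for i, q in t.items())
        for item in t:
            bins[item_index[item]] += transaction_utility
    return bins


item_index = {"a": 0, "b": 1, "c": 2, "d": 3}
print(itemset_utility(["a", "b"], transactions, unit_profit))
print(kulc(["a", "b"], transactions))
print(twu_bins(transactions, unit_profit, item_index))
```

Under these assumptions, an itemset would be reported as a correlated high utility itemset only if both its utility and its correlation value reach user-chosen thresholds, and an item whose bin value falls below the utility threshold can be discarded before the search starts.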
Related papers
- Scalable Private Partition Selection via Adaptive Weighting [66.09199304818928]
In a private set union, users hold subsets of items from an unbounded universe.
The goal is to output as many items as possible from the union of the users' sets while maintaining user-level differential privacy.
We propose an algorithm for this problem, MaximumDegree (MAD), which adaptively reroutes weight from items with weight far above the threshold needed for privacy to items with smaller weight.
arXiv Detail & Related papers (2025-02-13T01:27:11Z)
- Investigating the Scalability of Approximate Sparse Retrieval Algorithms to Massive Datasets [8.1990111961557]
We investigate the behavior of state-of-the-art retrieval algorithms on massive datasets.
We compare and contrast the recently-proposed Seismic and graph-based solutions adapted from dense retrieval.
We extensively evaluate Splade embeddings of 138M passages from MsMarco-v2 and report indexing time and other efficiency and effectiveness metrics.
arXiv Detail & Related papers (2025-01-20T17:59:21Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- Scalable Batch Acquisition for Deep Bayesian Active Learning [70.68403899432198]
In deep active learning, it is important to choose multiple examples to mark up at each step.
Existing solutions to this problem, such as BatchBALD, have significant limitations in selecting a large number of examples.
We present the Large BatchBALD algorithm, which aims to achieve comparable quality while being more computationally efficient.
arXiv Detail & Related papers (2023-01-13T11:45:17Z)
- HUSP-SP: Faster Utility Mining on Sequence Data [48.0426095077918]
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity.
We design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP).
Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.
arXiv Detail & Related papers (2022-12-29T10:56:17Z)
- Towards Sequence Utility Maximization under Utility Occupancy Measure [53.234101208024335]
In the database, although utility is a flexible criterion for each pattern, it is a more absolute criterion due to neglect of utility sharing.
We first define utility occupancy on sequence data and raise the problem of High Utility-Occupancy Sequential Pattern Mining.
An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed.
arXiv Detail & Related papers (2022-12-20T17:28:53Z)
- A Generic Algorithm for Top-K On-Shelf Utility Mining [47.729883172648876]
On-shelf utility mining (OSUM) is an emerging research direction in data mining.
It aims to discover itemsets that have high relative utility in their selling time period.
It is hard to define a minimum threshold minutil for mining the right amount of on-shelf high utility itemsets.
We propose a generic algorithm named TOIT for mining Top-k On-shelf hIgh-utility paTterns.
arXiv Detail & Related papers (2022-08-27T03:08:00Z)
- Temporal Fuzzy Utility Maximization with Remaining Measure [1.642022526257133]
We propose a novel one-phase temporal fuzzy utility itemset mining approach called TFUM.
TFUM revises temporal fuzzy-lists to maintain less but major information about potential high temporal fuzzy utility itemsets in memory.
It then discovers a complete set of real interesting patterns in a short time.
arXiv Detail & Related papers (2022-08-26T05:09:56Z)
- TargetUM: Targeted High-Utility Itemset Querying [1.022709144903362]
This paper is the first to propose a target-based HUIM problem and to provide a clear formulation of the targeted utility mining task.
A tree-based algorithm known as Target-based high-Utility iteMset querying (TargetUM) is proposed.
The algorithm uses a lexicographic querying tree and three effective pruning strategies to improve the mining efficiency.
arXiv Detail & Related papers (2021-10-30T18:55:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.