Related papers: Itemset Utility Maximization with Correlation Measure

URL: http://arxiv.org/abs/2208.12551v1
Date: Fri, 26 Aug 2022 10:06:24 GMT
Title: Itemset Utility Maximization with Correlation Measure
Authors: Jiahui Chen, Yixin Xu, Shicheng Wan, Wensheng Gan, and Jerry Chun-Wei Lin
Abstract summary: High utility itemset mining (HUIM) is used to find out interesting but hidden information (e.g., profit and risk). In this paper, we propose a novel algorithm called the Itemset Utility Maximization with Correlation Measure (CoIUM). Two upper bounds and four pruning strategies are utilized to effectively prune the search space. A concise array-based structure named utility-bin is used to calculate and store the adopted upper bounds in linear time and space.
Score: 8.581840054840335
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As an important data mining technology, high utility itemset mining (HUIM) is used to find out interesting but hidden information (e.g., profit and risk). HUIM has been widely applied in many application scenarios, such as market analysis, medical detection, and web click stream analysis. However, most previous HUIM approaches often ignore the relationship between items in an itemset. Therefore, many irrelevant combinations (e.g., {gold, apple} and {notebook, book}) are discovered in HUIM. To address this limitation, many algorithms have been proposed to mine correlated high utility itemsets (CoHUIs). In this paper, we propose a novel algorithm called the Itemset Utility Maximization with Correlation Measure (CoIUM), which considers both a strong correlation and the profitable values of the items. Besides, the novel algorithm adopts a database projection mechanism to reduce the cost of database scanning. Moreover, two upper bounds and four pruning strategies are utilized to effectively prune the search space. And a concise array-based structure named utility-bin is used to calculate and store the adopted upper bounds in linear time and space. Finally, extensive experimental results on dense and sparse datasets demonstrate that CoIUM significantly outperforms the state-of-the-art algorithms in terms of runtime and memory consumption.
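
Illustrative sketch (not the authors' implementation): the Python below shows, on hypothetical toy data, the three ingredients the abstract refers to: the utility of an itemset, an array-based accumulator for an upper bound, and a correlation score. The abstract names neither the two upper bounds nor the correlation measure, so this sketch assumes the standard transaction-weighted utility (TWU) as the bound and the Kulczynski (Kulc) measure as the correlation score. All function names, items, quantities, and profits are made up for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions: List[Dict[str, int]] = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "d": 4},
]
# Hypothetical unit profit of each item.
profit: Dict[str, int] = {"a": 5, "b": 2, "c": 1, "d": 3}


def transaction_utility(tx: Dict[str, int]) -> int:
    """Total utility (quantity * unit profit) of one transaction."""
    return sum(qty * profit[item] for item, qty in tx.items())


def itemset_utility(itemset: Set[str], txs: List[Dict[str, int]]) -> int:
    """Utility of an itemset, summed over the transactions containing all of it."""
    total = 0
    for tx in txs:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def twu_bins(txs: List[Dict[str, int]]) -> Dict[str, int]:
    """Accumulate TWU per item in a flat array with one slot ("bin") per item.

    Assumption: TWU stands in for the paper's unnamed upper bounds.
    A single pass over the database fills the array.
    """
    index = {item: k for k, item in enumerate(sorted(profit))}
    bins = [0] * len(index)
    for tx in txs:
        tu = transaction_utility(tx)
        for item in tx:
            bins[index[item]] += tu
    return {item: bins[k] for item, k in index.items()}


def kulc(itemset: Set[str], txs: List[Dict[str, int]]) -> float:
    """Kulczynski measure: mean of sup(X) / sup(i) over the items i in X.

    Assumption: Kulc is one common correlation measure; the abstract
    does not say which measure CoIUM uses.
    """
    sup_x = sum(1 for tx in txs if all(i in tx for i in itemset))
    if sup_x == 0:
        return 0.0
    ratios = [sup_x / sum(1 for tx in txs if i in tx) for i in itemset]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    print(itemset_utility({"a", "c"}, transactions))
    print(twu_bins(transactions))
    print(kulc({"a", "b"}, transactions))
```

In a miner of this kind, an item whose upper bound falls below the minimum utility threshold can be discarded before any of its supersets are explored, and an itemset would be reported only if both its utility and its correlation score meet their thresholds.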

Related papers

Scalable Private Partition Selection via Adaptive Weighting [66.09199304818928]
In a private set union, users hold subsets of items from an unbounded universe. The goal is to output as many items as possible from the union of the users' sets while maintaining user-level differential privacy. We propose an algorithm for this problem, MaximumDegree (MAD), which adaptively reroutes weight from items with weight far above the threshold needed for privacy to items with smaller weight.
arXiv Detail & Related papers (2025-02-13T01:27:11Z)
Investigating the Scalability of Approximate Sparse Retrieval Algorithms to Massive Datasets [8.1990111961557]
We investigate the behavior of state-of-the-art retrieval algorithms on massive datasets. We compare and contrast the recently-proposed Seismic and graph-based solutions adapted from dense retrieval. We extensively evaluate Splade embeddings of 138M passages from MsMarco-v2 and report indexing time and other efficiency and effectiveness metrics.
arXiv Detail & Related papers (2025-01-20T17:59:21Z)
Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs). Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMU). Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
Scalable Batch Acquisition for Deep Bayesian Active Learning [70.68403899432198]
In deep active learning, it is important to choose multiple examples to mark up at each step. Existing solutions to this problem, such as BatchBALD, have significant limitations in selecting a large number of examples. We present the Large BatchBALD algorithm, which aims to achieve comparable quality while being more computationally efficient.
arXiv Detail & Related papers (2023-01-13T11:45:17Z)
HUSP-SP: Faster Utility Mining on Sequence Data [48.0426095077918]
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity. We design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP). Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.
arXiv Detail & Related papers (2022-12-29T10:56:17Z)
Towards Sequence Utility Maximization under Utility Occupancy Measure [53.234101208024335]
In the database, although utility is a flexible criterion for each pattern, it is a more absolute criterion due to neglect of utility sharing. We first define utility occupancy on sequence data and raise the problem of High Utility-Occupancy Sequential Pattern Mining. An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed.
arXiv Detail & Related papers (2022-12-20T17:28:53Z)
A Generic Algorithm for Top-K On-Shelf Utility Mining [47.729883172648876]
On-shelf utility mining (OSUM) is an emerging research direction in data mining. It aims to discover itemsets that have high relative utility in their selling time period. It is hard to define a minimum threshold minutil for mining the right amount of on-shelf high utility itemsets. We propose a generic algorithm named TOIT for mining Top-k On-shelf hIgh-utility paTterns.
arXiv Detail & Related papers (2022-08-27T03:08:00Z)
Temporal Fuzzy Utility Maximization with Remaining Measure [1.642022526257133]
We propose a novel one-phase temporal fuzzy utility itemset mining approach called TFUM. TFUM revises temporal fuzzy-lists to maintain less but major information about potential high temporal fuzzy utility itemsets in memory. It then discovers a complete set of real interesting patterns in a short time.
arXiv Detail & Related papers (2022-08-26T05:09:56Z)
Efficient and Accurate Top-$K$ Recovery from Choice Data [1.14219428942199]
In some applications such as recommendation systems, the statistician is primarily interested in recovering the set of the top ranked items from a large pool of items. We propose the choice-based Borda count algorithm as a fast and accurate ranking algorithm for top-$K$ recovery. We show that the choice-based Borda count algorithm has optimal sample complexity for top-$K$ recovery under a broad class of random utility models.
arXiv Detail & Related papers (2022-06-23T22:05:08Z)
TargetUM: Targeted High-Utility Itemset Querying [1.022709144903362]
This paper is the first to propose a target-based HUIM problem and to provide a clear formulation of the targeted utility mining task. A tree-based algorithm known as Target-based high-Utility iteMset querying using (TargetUM) is proposed. The algorithm uses a lexicographic querying tree and three effective pruning strategies to improve the mining efficiency.
arXiv Detail & Related papers (2021-10-30T18:55:28Z)
IRLI: Iterative Re-partitioning for Learning to Index [104.72641345738425]
Methods have to trade off high accuracy against load balance and scalability in distributed settings. We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data. We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
arXiv Detail & Related papers (2021-03-17T23:13:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.