TargetUM: Targeted High-Utility Itemset Querying
- URL: http://arxiv.org/abs/2111.00309v1
- Date: Sat, 30 Oct 2021 18:55:28 GMT
- Title: TargetUM: Targeted High-Utility Itemset Querying
- Authors: Jinbao Miao, Shicheng Wan, Wensheng Gan, Jiayi Sun, and Jiahui Chen
- Abstract summary: This paper is the first to propose a target-based HUIM problem and to provide a clear formulation of the targeted utility mining task.
A tree-based algorithm known as Target-based high-Utility iteMset querying (TargetUM) is proposed.
The algorithm uses a lexicographic querying tree and three effective pruning strategies to improve the mining efficiency.
- Score: 1.022709144903362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional high-utility itemset mining (HUIM) aims to determine all
high-utility itemsets (HUIs) that satisfy the minimum utility threshold
(minUtil) in transaction databases. However, in most applications, not
all HUIs are interesting because only specific parts are required. Thus,
targeted mining based on user preferences is more important than traditional
mining tasks. This paper is the first to propose a target-based HUIM problem
and to provide a clear formulation of the targeted utility mining task in a
quantitative transaction database. A tree-based algorithm known as Target-based
high-Utility iteMset querying (TargetUM) is proposed. The algorithm uses
a lexicographic querying tree and three effective pruning strategies to improve
the mining efficiency. We conducted experiments on several real
and synthetic databases, and the results demonstrate that TargetUM performs
satisfactorily and returns complete and correct results. Finally, owing to the
lexicographic querying tree, the database no longer needs to be scanned
repeatedly for multiple queries.
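To make the task concrete, the sketch below assumes the usual HUIM definitions (utility of an item in a transaction = quantity × unit profit; utility of an itemset = that sum over every transaction containing the whole itemset) and assumes a target query asks for the HUIs that contain every item of a user-given target itemset. It mines HUIs by brute force on a hypothetical toy database, filters them naively, and then stores them in a simple lexicographic prefix tree that answers repeated target queries without rescanning the database. This illustrates the problem only; it is not the TargetUM algorithm or its three pruning strategies, which the abstract does not detail, and all names (`utility`, `mine_huis`, `targeted_query`, `build_query_tree`, `query_tree`) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy quantitative transaction database: item -> purchased quantity.
DB = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2, "d": 1},
    {"a": 1, "b": 1, "d": 2},
]
# Hypothetical external utility (unit profit) of each item.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}


def utility(itemset, db=DB, profit=PROFIT):
    """Sum of quantity * unit profit of the itemset's items over every
    transaction that contains the whole itemset."""
    total = 0
    for tx in db:
        if all(i in tx for i in itemset):
            total += sum(tx[i] * profit[i] for i in itemset)
    return total


def mine_huis(min_util, db=DB, profit=PROFIT):
    """Brute-force HUIM (no pruning): keep every itemset with utility >= min_util.
    Itemsets are sorted tuples because combinations() preserves input order."""
    items = sorted({i for tx in db for i in tx})
    huis = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            u = utility(combo, db, profit)
            if u >= min_util:
                huis[combo] = u
    return huis


def targeted_query(huis, target):
    """Naive target filter: HUIs that contain every item of the target itemset."""
    t = set(target)
    return {s: u for s, u in huis.items() if t <= set(s)}


class QueryNode:
    """Node of a hypothetical lexicographic prefix tree over mined HUIs."""

    def __init__(self):
        self.children = {}   # item -> QueryNode
        self.utility = None  # set when the root-to-node path spells an HUI


def build_query_tree(huis):
    root = QueryNode()
    for itemset, u in huis.items():
        node = root
        for item in sorted(itemset):
            node = node.children.setdefault(item, QueryNode())
        node.utility = u
    return root


def query_tree(root, target):
    """Answer a target query from the tree alone, without rescanning DB."""
    target = sorted(target)
    out = {}

    def dfs(node, path, j):  # j = number of target items matched so far
        if j == len(target) and node.utility is not None:
            out[tuple(path)] = node.utility
        for item in sorted(node.children):
            if j < len(target) and item > target[j]:
                # Paths are sorted, so target[j] cannot occur in this child,
                # its descendants, or any later (larger) sibling: prune.
                break
            nj = j + 1 if j < len(target) and item == target[j] else j
            dfs(node.children[item], path + [item], nj)

    dfs(root, [], 0)
    return out
```

For example, with `min_util = 12` the toy database yields five HUIs, and querying the tree for the target `{"b"}` returns `('a', 'b')`, `('b', 'd')`, and `('a', 'b', 'd')` without touching `DB` again. Building the tree once and reusing it is what lets several target queries share a single mining pass.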
Related papers
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
Date: 2024-06-29T05:14:04Z
- Autonomous Tree-search Ability of Large Language Models [58.68735916408101]
Large Language Models have shown remarkable reasoning capabilities with advanced prompting techniques.
Recent works propose to utilize external programs to define search logic, such that LLMs can perform passive tree search to solve more challenging reasoning tasks.
We propose a new concept called autonomous tree-search ability of LLM, which can automatically generate a response containing search trajectories for the correct answer.
Date: 2023-10-14T14:14:38Z
- Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval.
We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English.
For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
Date: 2023-05-10T16:40:52Z
- Towards Sequence Utility Maximization under Utility Occupancy Measure [53.234101208024335]
Although utility is a flexible criterion for evaluating each pattern in a database, it is a rather absolute criterion because it neglects utility sharing.
We first define utility occupancy on sequence data and raise the problem of High Utility-Occupancy Sequential Pattern Mining.
An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed.
Date: 2022-12-20T17:28:53Z
- A Generic Algorithm for Top-K On-Shelf Utility Mining [47.729883172648876]
On-shelf utility mining (OSUM) is an emerging research direction in data mining.
It aims to discover itemsets that have high relative utility in their selling time period.
It is hard to set a minimum utility threshold (minutil) that yields the right number of on-shelf high-utility itemsets.
We propose a generic algorithm named TOIT for mining Top-k On-shelf hIgh-utility paTterns.
Date: 2022-08-27T03:08:00Z
- Itemset Utility Maximization with Correlation Measure [8.581840054840335]
High-utility itemset mining (HUIM) is used to discover interesting but hidden information (e.g., profit and risk).
In this paper, we propose a novel algorithm called Itemset Utility Maximization with Correlation Measure (CoIUM).
Two upper bounds and four pruning strategies are used to prune the search space effectively, and a concise array-based structure named utility-bin is used to calculate and store the adopted upper bounds in linear time and space.
Date: 2022-08-26T10:06:24Z
- Towards Target High-Utility Itemsets [2.824395407508717]
In applied intelligence, utility-driven pattern discovery algorithms can identify insightful and useful patterns in databases.
Targeted high-utility itemset mining has emerged as a key research topic.
We propose THUIM (Targeted High-Utility Itemset Mining), which can quickly match high-utility itemsets during the mining process to select the targeted patterns.
Date: 2022-06-09T18:42:58Z
- Frequent Itemset-driven Search for Finding Minimum Node Separators in Complex Networks [61.2383572324176]
We propose a frequent itemset-driven search approach, which integrates the concept of frequent itemset mining in data mining into the well-known memetic search framework.
It iteratively employs the frequent itemset recombination operator to generate promising offspring solutions based on itemsets that frequently occur in high-quality solutions.
In particular, it discovers 29 new upper bounds and matches 18 previous best-known bounds.
Date: 2022-01-18T11:16:40Z
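Several of the utility-mining abstracts above rely on upper bounds for pruning; the CoIUM entry, for instance, stores its bounds in an array-based utility-bin computed in linear time. As an illustration only, the sketch below fills such an array with the classic transaction-weighted utilization (TWU) bound in one pass over a hypothetical toy database and drops items whose TWU falls below the threshold. TWU is a standard HUIM bound chosen here as an assumption; the summaries do not state which bounds CoIUM or TargetUM actually use, and the names `twu_utility_bin` and `promising_items` are hypothetical.

```python
# Hypothetical toy database: each transaction lists (item_id, quantity) pairs,
# with items encoded as integers 0..n_items-1 so they can index an array.
DB = [
    [(0, 1), (1, 2), (2, 1)],
    [(0, 2), (2, 3)],
    [(1, 1), (2, 2), (3, 1)],
    [(0, 1), (1, 1), (3, 2)],
]
PROFIT = [5, 2, 1, 3]  # hypothetical unit profit of items 0..3


def twu_utility_bin(db, profit):
    """Fill one array bin per item with its TWU: the summed utility of every
    transaction that contains the item. Time is linear in total transaction length."""
    bins = [0] * len(profit)
    for tx in db:
        tu = sum(q * profit[i] for i, q in tx)  # transaction utility
        for i, _ in tx:
            bins[i] += tu
    return bins


def promising_items(db, profit, min_util):
    """An item's TWU upper-bounds the utility of every itemset containing it,
    so items with TWU < min_util can be pruned without losing any HUI."""
    return [i for i, w in enumerate(twu_utility_bin(db, profit)) if w >= min_util]
```

With the toy data the bins are `[36, 30, 30, 20]`, so for `min_util = 25` item 3 is discarded before any itemset containing it is enumerated. A flat array keyed by item id keeps both the update and the lookup constant-time per item.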