Flexible Pattern Discovery and Analysis
- URL: http://arxiv.org/abs/2111.12218v1
- Date: Wed, 24 Nov 2021 01:25:15 GMT
- Title: Flexible Pattern Discovery and Analysis
- Authors: Chien-Ming Chen, Lili Chen, and Wensheng Gan
- Abstract summary: We introduce an algorithm for the mining of flexible high utility-occupancy patterns.
The proposed algorithm can effectively control the length of the derived patterns, for both real-world and synthetic datasets.
- Score: 2.075126998649103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Based on the analysis of the proportion of utility in the supporting
transactions used in the field of data mining, high utility-occupancy pattern
mining (HUOPM) has recently attracted widespread attention. Unlike high-utility
pattern mining (HUPM), which involves the enumeration of high-utility (e.g.,
profitable) patterns, HUOPM aims to find patterns representing a collection of
existing transactions. In practical applications, however, not all patterns are
used or valuable. For example, a pattern might contain too many items, that is,
the pattern might be too specific and therefore lack value for users in real
life. To achieve qualified patterns with a flexible length, we constrain the
minimum and maximum lengths during the mining process and introduce a novel
algorithm for the mining of flexible high utility-occupancy patterns. Our
algorithm is referred to as HUOPM+. To ensure the flexibility of the patterns
and tighten the upper bound of the utility-occupancy, a strategy called the
length upper-bound (LUB) is presented to prune the search space. In addition, a
utility-occupancy nested list (UO-nlist) and a frequency-utility-occupancy
table (FUO-table) are employed to avoid multiple scans of the database.
Evaluation results of the subsequent experiments confirm that the proposed
algorithm can effectively control the length of the derived patterns, for both
real-world and synthetic datasets. Moreover, it can decrease the execution time
and memory consumption.
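To make the two ideas in the abstract concrete (utility-occupancy and the minimum/maximum length constraint), here is a minimal brute-force sketch. It assumes the formulation used in the earlier HUOPM work as we understand it. The utility-occupancy of a pattern X in a supporting transaction T is u(X, T) / tu(T), where tu(T) is the total utility of T. The utility-occupancy of X is the average of that ratio over all transactions containing X. A pattern qualifies when its support reaches `minsup` (an absolute count here) and its utility-occupancy reaches `minuo`. The length bounds `minlen`/`maxlen` simply restrict which itemsets are enumerated. This is not HUOPM+: it has no LUB pruning, no UO-nlist and no FUO-table, and it enumerates every itemset of allowed length, so it only suits toy data. The function names and the toy database are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy representation: each transaction maps an item to its
# utility in that transaction (e.g. quantity x unit profit).
Transaction = Dict[str, float]


def utility_occupancy(pattern: FrozenSet[str], db: List[Transaction]) -> Tuple[int, float]:
    """Return (support count, average utility-occupancy) of `pattern`.

    Assumed definition: uo(X, T) = u(X, T) / tu(T) for each transaction T
    containing X, and uo(X) is the mean of these ratios.
    """
    ratios = []
    for t in db:
        if pattern.issubset(t):  # iterating a dict yields its keys (items)
            tu = sum(t.values())  # transaction utility tu(T)
            u = sum(t[i] for i in pattern)  # utility of X in T
            ratios.append(u / tu)
    if not ratios:
        return 0, 0.0
    return len(ratios), sum(ratios) / len(ratios)


def flexible_huops(db: List[Transaction], minsup: int, minuo: float,
                   minlen: int, maxlen: int) -> Dict[FrozenSet[str], float]:
    """Brute-force search for patterns whose length lies in [minlen, maxlen].

    Hypothetical helper: no upper-bound pruning of any kind is applied.
    """
    items = sorted({i for t in db for i in t})
    result: Dict[FrozenSet[str], float] = {}
    for k in range(minlen, maxlen + 1):  # the flexible length constraint
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            sup, uo = utility_occupancy(pattern, db)
            if sup >= minsup and uo >= minuo:
                result[pattern] = uo
    return result


if __name__ == "__main__":
    toy_db = [
        {"a": 4, "b": 2, "c": 2},
        {"a": 2, "c": 6},
        {"b": 3, "d": 1},
        {"a": 1, "b": 1, "c": 1, "d": 1},
    ]
    for p, uo in flexible_huops(toy_db, minsup=2, minuo=0.6, minlen=2, maxlen=3).items():
        print(sorted(p), round(uo, 3))
```

On the toy data, raising `maxlen` from 2 to 3 admits the longer pattern {a, b, c}, while `minlen=2` suppresses every single item. This is the kind of length control the abstract describes. HUOPM+ obtains the same filtering without exhaustive enumeration.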
Related papers
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
  We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
  Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs).
  Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
  (2024-01-12T22:51:48Z)
- Large Language Models as General Pattern Machines [64.75501424160748]
  We show that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences.
  Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary.
  In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics.
  (2023-07-10T17:32:13Z)
- Towards Sequence Utility Maximization under Utility Occupancy Measure [53.234101208024335]
  In the database, although utility is a flexible criterion for each pattern, it is a more absolute criterion due to neglect of utility sharing.
  We first define utility occupancy on sequence data and raise the problem of High Utility-Occupancy Sequential Pattern Mining.
  An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed.
  (2022-12-20T17:28:53Z)
- A Generic Algorithm for Top-K On-Shelf Utility Mining [47.729883172648876]
  On-shelf utility mining (OSUM) is an emerging research direction in data mining.
  It aims to discover itemsets that have high relative utility in their selling time period.
  It is hard to define a minimum threshold minutil for mining the right amount of on-shelf high utility itemsets.
  We propose a generic algorithm named TOIT for mining Top-k On-shelf hIgh-utility paTterns.
  (2022-08-27T03:08:00Z)
- Temporal Fuzzy Utility Maximization with Remaining Measure [1.642022526257133]
  We propose a novel one-phase temporal fuzzy utility itemset mining approach called TFUM.
  TFUM revises temporal fuzzy-lists to maintain less but major information about potential high temporal fuzzy utility itemsets in memory.
  It then discovers a complete set of real interesting patterns in a short time.
  (2022-08-26T05:09:56Z)
- Towards Target High-Utility Itemsets [2.824395407508717]
  In applied intelligence, utility-driven pattern discovery algorithms can identify insightful and useful patterns in databases.
  Targeted high-utility itemset mining has emerged as a key research topic.
  We propose THUIM (Targeted High-Utility Itemset Mining), which can quickly match high-utility itemsets during the mining process to select the targeted patterns.
  (2022-06-09T18:42:58Z)
- TaSPM: Targeted Sequential Pattern Mining [53.234101208024335]
  We propose a generic framework, namely TaSPM, based on the fast CM-SPAM algorithm.
  We also propose several pruning strategies to reduce meaningless operations in mining processes.
  Experiments show that the novel targeted mining algorithm TaSPM can achieve faster running time and less memory consumption.
  (2022-02-26T17:49:47Z)
- Capturing the temporal constraints of gradual patterns [0.0]
  Gradual pattern mining allows for extraction of attribute correlations through gradual rules such as: "the more X, the more Y".
  For instance, a researcher may apply gradual pattern mining to determine which attributes of a data set exhibit unfamiliar correlations in order to isolate them for deeper exploration or analysis.
  This work is motivated by the proliferation of IoT applications in almost every area of our society.
  (2021-06-28T06:45:48Z)
- MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining [22.88915237311897]
  We present MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA).
  MCRapper computes both statistically-significant functions (i.e., patterns) when the available data is seen as a sample from an unknown distribution, and approximations of collections of high-expectation functions (e.g., frequent patterns) when the available data is a small sample from a large dataset.
  (2020-06-16T11:42:56Z)
- Improving a State-of-the-Art Heuristic for the Minimum Latency Problem with Data Mining [69.00394670035747]
  Hybrid metaheuristics have become a trend in operations research.
  A successful example combines the Greedy Randomized Adaptive Search Procedures (GRASP) and data mining techniques.
  (2019-08-28T13:12:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.