Contrast Pattern Mining: A Survey
- URL: http://arxiv.org/abs/2209.13556v1
- Date: Tue, 27 Sep 2022 17:11:12 GMT
- Title: Contrast Pattern Mining: A Survey
- Authors: Yao Chen, Wensheng Gan, Yongdong Wu, and Philip S. Yu
- Abstract summary: It is difficult for new researchers in the field to understand the general situation of the field in a short period of time.
First, we present an in-depth understanding of CPM, including basic concepts, types, mining strategies, and metrics for assessing discriminative ability.
We classify CPM methods according to their characteristics into boundary-based algorithms, tree-based algorithms, evolutionary fuzzy system-based algorithms, decision tree-based algorithms, and other algorithms.
- Score: 54.06874773607785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrast pattern mining (CPM) is an important and popular subfield of data
mining. Traditional sequential patterns cannot describe the contrast
information between different classes of data, while contrast patterns
involving the concept of contrast can describe the significant differences
between datasets under different contrast conditions. Based on the number of
papers published in this field, we find that researchers' interest in CPM is
still active. Since CPM has many research questions and research methods, it is
difficult for new researchers in the field to understand the general situation
of the field in a short period of time. Therefore, the purpose of this article
is to provide an up-to-date, comprehensive, and structured overview of the
research direction of contrast pattern mining. First, we present an in-depth
understanding of CPM, including basic concepts, types, mining strategies, and
metrics for assessing discriminative ability. Then we classify CPM methods
according to their characteristics into boundary-based algorithms, tree-based
algorithms, evolutionary fuzzy system-based algorithms, decision tree-based
algorithms, and other algorithms. In addition, we list the classical algorithms
of these methods and discuss their advantages and disadvantages. Advanced
topics in CPM are presented. Finally, we conclude our survey with a discussion
of the challenges and opportunities in this field.
Related papers
- A Rapid Review of Clustering Algorithms [5.46715422237599]
Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data.
They play an important role in today's life, such as in marketing and e-commerce, healthcare, data organization and analysis, and social media.
We analyze existing clustering algorithms and classify mainstream algorithms across five different dimensions.
arXiv Detail & Related papers (2024-01-14T23:19:53Z)
- Regularization-Based Methods for Ordinal Quantification [49.606912965922504]
We study the ordinal case, i.e., the case in which a total order is defined on the set of n>2 classes.
We propose a novel class of regularized OQ algorithms, which outperforms existing algorithms in our experiments.
arXiv Detail & Related papers (2023-10-13T16:04:06Z)
- Multivariate Time Series Anomaly Detection: Fancy Algorithms and Flawed Evaluation Methodology [2.043517674271996]
We discuss how a normally good protocol may have weaknesses in the context of MVTS anomaly detection.
We propose a simple, yet challenging, baseline based on Principal Components Analysis (PCA) that surprisingly outperforms many recent Deep Learning (DL) based approaches on popular benchmark datasets.
arXiv Detail & Related papers (2023-08-24T20:24:12Z)
- A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? [84.46288849132634]
We propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques.
We define three variables to encompass diverse facets of the evolution of research topics within NLP.
We utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data.
arXiv Detail & Related papers (2023-05-22T11:08:00Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- Clustering with minimum spanning trees: How good can it be? [1.9999259391104391]
We quantify the extent to which minimum spanning trees are meaningful in low-dimensional partitional data clustering tasks.
We review, study, extend, and generalise a few existing, state-of-the-art MST-based partitioning schemes.
Overall, the Genie and the information-theoretic methods often outperform the non-MST algorithms.
arXiv Detail & Related papers (2023-03-10T03:18:03Z)
- A survey of Bayesian Network structure learning [8.411014222942168]
This paper provides a review of 61 algorithms proposed for learning BN structure from data.
The basic approach of each algorithm is described in consistent terms, and the similarities and differences between them highlighted.
Approaches for dealing with data noise in real-world datasets and incorporating expert knowledge into the learning process are also covered.
arXiv Detail & Related papers (2021-09-23T14:54:00Z)
- Explaining Algorithmic Fairness Through Fairness-Aware Causal Path Decomposition [37.823248189626014]
We propose to study the problem of identification of the source of model disparities.
Unlike existing interpretation methods which typically learn feature importance, we consider the causal relationships among feature variables.
Our framework is also model agnostic and applicable to a variety of quantitative disparity measures.
arXiv Detail & Related papers (2021-08-11T17:23:47Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- A Survey of Embedding Space Alignment Methods for Language and Knowledge Graphs [77.34726150561087]
We survey the current research landscape on word, sentence and knowledge graph embedding algorithms.
We provide a classification of the relevant alignment techniques and discuss benchmark datasets used in this field of research.
arXiv Detail & Related papers (2020-10-26T16:08:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.