Related papers: Integrating Statistical Significance and Discriminative Power in Pattern Discovery

Integrating Statistical Significance and Discriminative Power in Pattern Discovery

URL: http://arxiv.org/abs/2401.12000v1
Date: Mon, 22 Jan 2024 14:51:01 GMT
Title: Integrating Statistical Significance and Discriminative Power in Pattern Discovery
Authors: Leonardo Alexandre and Rafael S. Costa and Rui Henriques
Abstract summary: Proposed methodology integrates statistical significance and discriminative power criteria into state-of-the-art algorithms. Tests show the role of the proposed methodology in discovering patterns with pronounced improvements of discriminative power and statistical significance without quality deterioration.
Score: 2.1014808520898667
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Pattern discovery plays a central role in both descriptive and predictive tasks across multiple domains. Actionable patterns must meet rigorous statistical significance criteria and, in the presence of target variables, further uphold discriminative power. Our work addresses the underexplored area of guiding pattern discovery by integrating statistical significance and discriminative power criteria into state-of-the-art algorithms while preserving pattern quality. We also address how pattern quality thresholds, imposed by some algorithms, can be rectified to accommodate these additional criteria. To test the proposed methodology, we select the triclustering task as the guiding pattern discovery case and extend well-known greedy and multi-objective optimization triclustering algorithms, $\delta$-Trimax and TriGen, that use various pattern quality criteria, such as Mean Squared Residual (MSR), Least Squared Lines (LSL), and Multi Slope Measure (MSL). Results from three case studies show the role of the proposed methodology in discovering patterns with pronounced improvements of discriminative power and statistical significance without quality deterioration, highlighting its importance in supervisedly guiding the search. Although the proposed methodology is motivated over multivariate time series data, it can be straightforwardly extended to pattern discovery tasks involving multivariate, N-way (N>3), transactional, and sequential data structures. Availability: The code is freely available at https://github.com/JupitersMight/MOF_Triclustering under the MIT license.

Related papers

Random Normed k-Means: A Paradigm-Shift in Clustering within Probabilistic Metric Spaces [0.7864304771129751]
We introduce the first k-means variant in the literature that operates within a probabilistic metric space. By adopting a probabilistic perspective, our method not only introduces a fresh paradigm but also establishes a rigorous theoretical framework. Our proposed random normed k-means (RNKM) algorithm exhibits a remarkable ability to identify nonlinearly separable structures.
arXiv Detail & Related papers (2025-04-04T20:48:43Z)
PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series [0.01874930567916036]
Benchmarking anomaly detection approaches for multivariate time series is a challenging task due to a lack of high-quality datasets. We propose a solution: a diverse, extensive, and non-trivial dataset generated via state-of-the-art simulation tools. Our dataset represents a discrete-sequence problem, which remains unaddressed by previously-proposed solutions in literature.
arXiv Detail & Related papers (2024-11-21T09:03:12Z)
Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks. Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval. This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
Pattern based learning and optimisation through pricing for bin packing problem [50.83768979636913]
We argue that when problem conditions such as the distributions of random variables change, the patterns that performed well in previous circumstances may become less effective. We propose a novel scheme to efficiently identify patterns and dynamically quantify their values for each specific condition. Our method quantifies the value of patterns based on their ability to satisfy constraints and their effects on the objective value.
arXiv Detail & Related papers (2024-08-27T17:03:48Z)
A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset [44.94304541427113]
We propose a multitask deep learning model to perform multiple classification and regression tasks simultaneously on hyperspectral images. We validated our approach on a large hyperspectral dataset called TAIGA. A comprehensive qualitative and quantitative analysis of the results shows that the proposed method significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-23T11:14:54Z)
Efficient Discovery of Significant Patterns with Few-Shot Resampling [9.681286056736292]
In biomedicine, basket market analysis, and social networks, the goal is to discover patterns whose association with the target is defined with respect to an underlying population. A natural way to capture the association of a pattern with the target is to consider its statistical significance, assessing its deviation from the (null) hypothesis of independence between the pattern and the target. We present FSR, an efficient algorithm to identify statistically significant patterns with rigorous guarantees on the probability of false discoveries.
arXiv Detail & Related papers (2024-06-17T17:49:27Z)
Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions. We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z)
Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data. For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift ( TDS) and Task-Distribution Corruption (TDC)
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
A new algorithm for Subgroup Set Discovery based on Information Gain [58.720142291102135]
Information Gained Subgroup Discovery (IGSD) is a new SD algorithm for pattern discovery. We compare IGSD with two state-of-the-art SD algorithms: FSSD and SSD++. IGSD provides better OR values than FSSD and SSD++, stating a higher dependence between patterns and targets.
arXiv Detail & Related papers (2023-07-26T21:42:34Z)
Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
Method for unsupervised meta-learning, CACTUs, is a clustering-based approach with pseudo-labeling. This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data. We prove that the core reason for this is lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
Task Agnostic and Post-hoc Unseen Distribution Detection [27.69612483621752]
We propose a task-agnostic and post-hoc Unseen Distribution Detection (TAPUDD) method. It comprises of TAP-Mahalanobis, which clusters the training datasets' features and determines the minimum Mahalanobis distance of the test sample from all clusters. We show that our method can detect unseen samples effectively across diverse tasks and performs better or on-par with the existing baselines.
arXiv Detail & Related papers (2022-07-26T17:55:15Z)
Supervised Multivariate Learning with Simultaneous Feature Auto-grouping and Dimension Reduction [7.093830786026851]
This paper proposes a novel clustered reduced-rank learning framework. It imposes two joint matrix regularizations to automatically group the features in constructing predictive factors. It is more interpretable than low-rank modeling and relaxes the stringent sparsity assumption in variable selection.
arXiv Detail & Related papers (2021-12-17T20:11:20Z)
Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments. It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics. We show that (i) this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses.
arXiv Detail & Related papers (2021-05-20T01:57:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.