Integrating Statistical Significance and Discriminative Power in Pattern
Discovery
- URL: http://arxiv.org/abs/2401.12000v1
- Date: Mon, 22 Jan 2024 14:51:01 GMT
- Title: Integrating Statistical Significance and Discriminative Power in Pattern
Discovery
- Authors: Leonardo Alexandre and Rafael S. Costa and Rui Henriques
- Abstract summary: The proposed methodology integrates statistical significance and discriminative power criteria into state-of-the-art pattern discovery algorithms.
Three case studies show that it discovers patterns with pronounced improvements in discriminative power and statistical significance without deteriorating pattern quality.
- Score: 2.1014808520898667
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Pattern discovery plays a central role in both descriptive and predictive
tasks across multiple domains. Actionable patterns must meet rigorous
statistical significance criteria and, in the presence of target variables,
further uphold discriminative power. Our work addresses the underexplored area
of guiding pattern discovery by integrating statistical significance and
discriminative power criteria into state-of-the-art algorithms while preserving
pattern quality. We also address how pattern quality thresholds, imposed by
some algorithms, can be rectified to accommodate these additional criteria. To
test the proposed methodology, we select the triclustering task as the guiding
pattern discovery case and extend well-known greedy and multi-objective
optimization triclustering algorithms, $\delta$-Trimax and TriGen, that use
various pattern quality criteria, such as Mean Squared Residual (MSR), Least
Squared Lines (LSL), and Multi Slope Measure (MSL). Results from three case
studies show the role of the proposed methodology in discovering patterns with
pronounced improvements of discriminative power and statistical significance
without quality deterioration, highlighting its importance for supervised
guidance of the search. Although the proposed methodology is motivated by
multivariate time series data, it can be straightforwardly extended to pattern
discovery tasks involving multivariate, N-way (N>3), transactional, and
sequential data structures.
Availability: The code is freely available at
https://github.com/JupitersMight/MOF_Triclustering under the MIT license.
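
As an illustration of one of the pattern quality criteria named in the abstract, the sketch below computes a Mean Squared Residual (MSR) for a three-dimensional block of values. It assumes the commonly used three-way residue r_ijk = a_ijk - a_iJK - a_IjK - a_IJk + 2*a_IJK, where capital indices denote means over that axis; the exact definition used by the paper or by the linked repository may differ. The function name `mean_squared_residue` and the nested-list layout `cube[i][j][k]` are hypothetical choices made for this example only.

```python
from statistics import fmean


def mean_squared_residue(cube):
    """Return the MSR of a 3D block stored as nested lists, cube[i][j][k].

    Assumed residue (not taken from the paper's code):
        r_ijk = a_ijk - a_iJK - a_IjK - a_IJk + 2 * a_IJK
    """
    n_i, n_j, n_k = len(cube), len(cube[0]), len(cube[0][0])

    # Means over two axes at a time: a_iJK, a_IjK, a_IJk.
    mean_i = [fmean(cube[i][j][k] for j in range(n_j) for k in range(n_k))
              for i in range(n_i)]
    mean_j = [fmean(cube[i][j][k] for i in range(n_i) for k in range(n_k))
              for j in range(n_j)]
    mean_k = [fmean(cube[i][j][k] for i in range(n_i) for j in range(n_j))
              for k in range(n_k)]
    # Overall mean a_IJK.
    mean_all = fmean(v for plane in cube for row in plane for v in row)

    total = 0.0
    for i in range(n_i):
        for j in range(n_j):
            for k in range(n_k):
                r = cube[i][j][k] - mean_i[i] - mean_j[j] - mean_k[k] + 2 * mean_all
                total += r * r
    return total / (n_i * n_j * n_k)


# A purely additive block (value = x + y + z) is perfectly coherent under this
# residue, so its MSR should be numerically zero.
additive = [[[x + y + z for z in (0, 1, 2)] for y in (0, 10)] for x in (0, 100, 200)]

# Perturbing a single cell breaks the additive structure and raises the MSR.
perturbed = [[row[:] for row in plane] for plane in additive]
perturbed[0][0][0] += 50
```

Under this definition a lower MSR indicates a more coherent block, which is why threshold-based algorithms accept a candidate only while its MSR stays below a chosen bound.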
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- Pattern based learning and optimisation through pricing for bin packing problem [50.83768979636913]
We argue that when problem conditions such as the distributions of random variables change, the patterns that performed well in previous circumstances may become less effective.
We propose a novel scheme to efficiently identify patterns and dynamically quantify their values for each specific condition.
Our method quantifies the value of patterns based on their ability to satisfy constraints and their effects on the objective value.
arXiv Detail & Related papers (2024-08-27T17:03:48Z)
- A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset [44.94304541427113]
We propose a multitask deep learning model to perform multiple classification and regression tasks simultaneously on hyperspectral images.
We validated our approach on a large hyperspectral dataset called TAIGA.
A comprehensive qualitative and quantitative analysis of the results shows that the proposed method significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-23T11:14:54Z)
- Efficient Discovery of Significant Patterns with Few-Shot Resampling [9.681286056736292]
In biomedicine, market basket analysis, and social networks, the goal is to discover patterns whose association with the target is defined with respect to an underlying population.
A natural way to capture the association of a pattern with the target is to consider its statistical significance, assessing its deviation from the (null) hypothesis of independence between the pattern and the target.
We present FSR, an efficient algorithm to identify statistically significant patterns with rigorous guarantees on the probability of false discoveries.
arXiv Detail & Related papers (2024-06-17T17:49:27Z)
- Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- A new algorithm for Subgroup Set Discovery based on Information Gain [58.720142291102135]
Information Gained Subgroup Discovery (IGSD) is a new SD algorithm for pattern discovery.
We compare IGSD with two state-of-the-art SD algorithms: FSSD and SSD++.
IGSD provides better OR values than FSSD and SSD++, indicating a higher dependence between patterns and targets.
arXiv Detail & Related papers (2023-07-26T21:42:34Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised
Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- Task Agnostic and Post-hoc Unseen Distribution Detection [27.69612483621752]
We propose a task-agnostic and post-hoc Unseen Distribution Detection (TAPUDD) method.
It comprises TAP-Mahalanobis, which clusters the training datasets' features and determines the minimum Mahalanobis distance of the test sample from all clusters.
We show that our method can detect unseen samples effectively across diverse tasks and performs better than or on par with the existing baselines.
arXiv Detail & Related papers (2022-07-26T17:55:15Z)
- Supervised Multivariate Learning with Simultaneous Feature Auto-grouping
and Dimension Reduction [7.093830786026851]
This paper proposes a novel clustered reduced-rank learning framework.
It imposes two joint matrix regularizations to automatically group the features in constructing predictive factors.
It is more interpretable than low-rank modeling and relaxes the stringent sparsity assumption in variable selection.
arXiv Detail & Related papers (2021-12-17T20:11:20Z)
- Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via
Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments.
It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics.
We show that (i) this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses.
arXiv Detail & Related papers (2021-05-20T01:57:52Z)