Task Agnostic and Post-hoc Unseen Distribution Detection
- URL: http://arxiv.org/abs/2207.13083v1
- Date: Tue, 26 Jul 2022 17:55:15 GMT
- Title: Task Agnostic and Post-hoc Unseen Distribution Detection
- Authors: Radhika Dua, Seongjun Yang, Yixuan Li, Edward Choi
- Abstract summary: We propose a task-agnostic and post-hoc Unseen Distribution Detection (TAPUDD) method.
It comprises of TAP-Mahalanobis, which clusters the training datasets' features and determines the minimum Mahalanobis distance of the test sample from all clusters.
We show that our method can detect unseen samples effectively across diverse tasks and performs better or on-par with the existing baselines.
- Score: 27.69612483621752
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the recent advances in out-of-distribution(OOD) detection, anomaly
detection, and uncertainty estimation tasks, there do not exist a task-agnostic
and post-hoc approach. To address this limitation, we design a novel
clustering-based ensembling method, called Task Agnostic and Post-hoc Unseen
Distribution Detection (TAPUDD) that utilizes the features extracted from the
model trained on a specific task. Explicitly, it comprises of TAP-Mahalanobis,
which clusters the training datasets' features and determines the minimum
Mahalanobis distance of the test sample from all clusters. Further, we propose
the Ensembling module that aggregates the computation of iterative
TAP-Mahalanobis for a different number of clusters to provide reliable and
efficient cluster computation. Through extensive experiments on synthetic and
real-world datasets, we observe that our approach can detect unseen samples
effectively across diverse tasks and performs better or on-par with the
existing baselines. To this end, we eliminate the necessity of determining the
optimal value of the number of clusters and demonstrate that our method is more
viable for large-scale classification tasks.
Related papers
- Interpetable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z) - Deep Embedding Clustering Driven by Sample Stability [16.53706617383543]
We propose a deep embedding clustering algorithm driven by sample stability (DECS)
Specifically, we start by constructing the initial feature space with an autoencoder and then learn the cluster-oriented embedding feature constrained by sample stability.
The experimental results on five datasets illustrate that the proposed method achieves superior performance compared to state-of-the-art clustering approaches.
arXiv Detail & Related papers (2024-01-29T09:19:49Z) - Stable Cluster Discrimination for Deep Clustering [7.175082696240088]
Deep clustering can optimize representations of instances (i.e., representation learning) and explore the inherent data distribution.
The coupled objective implies a trivial solution that all instances collapse to the uniform features.
In this work, we first show that the prevalent discrimination task in supervised learning is unstable for one-stage clustering.
A novel stable cluster discrimination (SeCu) task is proposed and a new hardness-aware clustering criterion can be obtained accordingly.
arXiv Detail & Related papers (2023-11-24T06:43:26Z) - Rethinking Clustering-Based Pseudo-Labeling for Unsupervised
Meta-Learning [146.11600461034746]
Method for unsupervised meta-learning, CACTUs, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z) - A One-shot Framework for Distributed Clustered Learning in Heterogeneous
Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments.
One-shot approach, based on local computations at the users and a clustering based aggregation step at the server is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z) - Mitigating shortage of labeled data using clustering-based active
learning with diversity exploration [3.312798619476657]
We propose a clustering-based active learning framework, namely Active Learning using a Clustering-based Sampling.
A bi-cluster boundary-based sample query procedure is introduced to improve the learning performance for classifying highly overlapped classes.
arXiv Detail & Related papers (2022-07-06T20:53:28Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower boundally, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised
Person Re-Identification and Text Authorship Attribution [77.85461690214551]
Learning from fully-unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution.
Recent self-supervised learning methods have shown to be effective when dealing with fully-unlabeled data in cases where the underlying classes have significant semantic differences.
We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse.
arXiv Detail & Related papers (2022-02-07T13:08:11Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Deep Distribution-preserving Incomplete Clustering with Optimal
Transport [43.0056459311929]
We propose a novel deep incomplete clustering method, named Deep Distribution-preserving Incomplete Clustering with Optimal Transport (DDIC-OT)
The proposed network achieves superior and stable clustering performance improvement against existing state-of-the-art incomplete clustering methods over different missing ratios.
arXiv Detail & Related papers (2021-03-21T15:43:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.