Transferable Deep Metric Learning for Clustering
- URL: http://arxiv.org/abs/2302.06523v1
- Date: Mon, 13 Feb 2023 17:09:59 GMT
- Title: Transferable Deep Metric Learning for Clustering
- Authors: Simo Alami.C, Rim Kaddah, Jesse Read
- Abstract summary: Clustering in high-dimensional spaces is a difficult task; the usual distance metrics may no longer be appropriate under the curse of dimensionality.
We show that we can learn a metric on a labelled dataset, then apply it to cluster a different dataset.
We achieve results competitive with the state-of-the-art while using only a small number of labelled training datasets and shallow networks.
- Score: 1.2762298148425795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clustering in high-dimensional spaces is a difficult task; the usual distance
metrics may no longer be appropriate under the curse of dimensionality. Indeed,
the choice of the metric is crucial, and it is highly dependent on the dataset
characteristics. However, a single metric could be used to correctly perform
clustering on multiple datasets of different domains. We propose to do so,
providing a framework for learning a transferable metric. We show that we can
learn a metric on a labelled dataset, then apply it to cluster a different
dataset, using an embedding space that characterises a desired clustering in
the generic sense. We learn and test such metrics on several datasets of
variable complexity (synthetic, MNIST, SVHN, omniglot) and achieve results
competitive with the state-of-the-art while using only a small number of
labelled training datasets and shallow networks.
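The transfer idea in the abstract, learn a metric (embedding) on one labelled dataset, then cluster a different dataset in that embedding space, can be sketched with a deliberately simplified stand-in. Instead of the paper's shallow networks, the sketch below learns a closed-form Fisher-style linear metric on a labelled source dataset and reuses it to cluster a different target dataset; the datasets, dimensions, and all function names are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_blobs(centers, n_per, noise_dims, scale=0.3):
    """Gaussian blobs in a few informative dims, padded with pure-noise dims."""
    X, y = [], []
    for k, c in enumerate(centers):
        informative = np.asarray(c) + scale * rng.standard_normal((n_per, len(c)))
        noise = rng.standard_normal((n_per, noise_dims))
        X.append(np.hstack([informative, noise]))
        y += [k] * n_per
    return np.vstack(X), np.array(y)

def fisher_metric(X, y, out_dim=2, reg=1e-3):
    """Closed-form linear 'metric': directions maximising between-class over
    within-class scatter (a stand-in for the paper's learned networks)."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for k in np.unique(y):
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        Sw += (Xk - mk).T @ (Xk - mk)
        Sb += len(Xk) * np.outer(mk - mu, mk - mu)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(d), Sb))
    top = np.argsort(-evals.real)[:out_dim]
    return evecs.real[:, top].T            # rows are embedding directions

def kmeans(Z, k, iters=100):
    """Plain Lloyd's algorithm with farthest-point initialisation."""
    C = [Z[0]]
    for _ in range(k - 1):                 # pick spread-out initial centres
        d2 = ((Z[:, None] - np.array(C)[None]) ** 2).sum(-1).min(axis=1)
        C.append(Z[d2.argmax()])
    C = np.array(C)
    for _ in range(iters):
        lab = ((Z[:, None] - C[None]) ** 2).sum(-1).argmin(axis=1)
        C = np.array([Z[lab == c].mean(axis=0) if (lab == c).any() else C[c]
                      for c in range(k)])
    return lab

# Learn the metric on a labelled source dataset ...
Xs, ys = make_blobs([[0, 0], [4, 0], [0, 4]], n_per=50, noise_dims=10)
W = fisher_metric(Xs, ys)

# ... then transfer it: embed and cluster a *different* dataset whose
# informative structure lives in the same coordinates.
Xt, yt = make_blobs([[2, 2], [-2, 2], [2, -2]], n_per=50, noise_dims=10)
labels = kmeans(Xt @ W.T, k=3)
```

Raw Euclidean distance on this data is swamped by the noise dimensions; the learned embedding suppresses them, so clustering in the embedded space recovers the target classes even though the metric never saw target labels.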
Related papers
- MNIST-Nd: a set of naturalistic datasets to benchmark clustering across dimensions [46.67219141114834]
We propose MNIST-Nd, a set of synthetic datasets that share a key property of real-world datasets.
MNIST-Nd is obtained by training mixture variational autoencoders with 2 to 64 latent dimensions on MNIST.
Preliminary common clustering algorithm benchmarks on MNIST-Nd suggest that Leiden is the most robust for growing dimensions.
arXiv Detail & Related papers (2024-10-21T15:51:30Z) - Can an unsupervised clustering algorithm reproduce a categorization system? [1.0485739694839669]
We investigate whether unsupervised clustering can reproduce ground truth classes in a labeled dataset.
We show that success depends on feature selection and the chosen distance metric.
arXiv Detail & Related papers (2024-08-19T18:27:14Z) - Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z) - Mixed-type Distance Shrinkage and Selection for Clustering via Kernel Metric Learning [0.0]
We propose a metric called KDSUM that uses mixed kernels to measure dissimilarity.
We demonstrate that KDSUM is a shrinkage method from existing mixed-type metrics to a uniform dissimilarity metric.
arXiv Detail & Related papers (2023-06-02T19:51:48Z) - DMS: Differentiable Mean Shift for Dataset Agnostic Task Specific Clustering Using Side Information [0.0]
We present a novel approach, in which we learn to cluster data directly from side information.
We do not need to know the number of clusters, their centers or any kind of distance metric for similarity.
Our method is able to divide the same data points in various ways, dependent on the needs of a specific task.
arXiv Detail & Related papers (2023-05-29T13:45:49Z) - Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z) - Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised Person Re-Identification and Text Authorship Attribution [77.85461690214551]
Learning from fully-unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution.
Recent self-supervised learning methods have been shown to be effective with fully-unlabeled data when the underlying classes have significant semantic differences.
We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse.
arXiv Detail & Related papers (2022-02-07T13:08:11Z) - AutoGeoLabel: Automated Label Generation for Geospatial Machine Learning [69.47585818994959]
We evaluate a big data processing pipeline to auto-generate labels for remote sensing data.
We utilize the big geo-data platform IBM PAIRS to dynamically generate such labels in dense urban areas.
arXiv Detail & Related papers (2022-01-31T20:02:22Z) - Dominant Set-based Active Learning for Text Classification and its Application to Online Social Media [0.0]
We present a novel pool-based active learning method for training on large unlabeled corpora with minimal annotation cost.
Our proposed method does not have any parameters to be tuned, making it dataset-independent.
Our method achieves higher performance than state-of-the-art active learning strategies.
arXiv Detail & Related papers (2022-01-28T19:19:03Z) - Robust Trimmed k-means [70.88503833248159]
We propose Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points.
We show RTKM performs competitively with other methods on single membership data with outliers and multi-membership data without outliers.
arXiv Detail & Related papers (2021-08-16T15:49:40Z)
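The joint outlier-and-cluster idea behind RTKM can be illustrated with a minimal trimmed-Lloyd sketch: at each update, the fraction of points farthest from their assigned centre is excluded from the centre computation. This is not the authors' algorithm (which also handles multi-membership data); the toy data, hand-picked seeds, and `trim` value are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def trimmed_kmeans(X, k, init, trim=0.1, iters=50):
    """Lloyd's updates that ignore the `trim` fraction of points farthest from
    their assigned centre, so gross outliers cannot drag the centres around.
    `trim` should upper-bound the expected outlier fraction."""
    C = X[init].copy()
    for _ in range(iters):
        d2 = ((X[:, None] - C[None]) ** 2).sum(-1)
        lab = d2.argmin(axis=1)
        dist = d2[np.arange(len(X)), lab]
        keep = dist <= np.quantile(dist, 1 - trim)   # drop the farthest points
        C = np.array([X[keep & (lab == c)].mean(axis=0)
                      if (keep & (lab == c)).any() else C[c] for c in range(k)])
    lab[~keep] = -1                        # report trimmed points as outliers
    return lab, C

# Two tight clusters plus five gross outliers.
X = np.vstack([rng.normal(0, 0.2, (50, 2)),
               rng.normal(5, 0.2, (50, 2)),
               rng.uniform(20, 30, (5, 2))])

# Seeds hand-picked from each cluster for brevity; k-means++ in practice.
labels, centers = trimmed_kmeans(X, k=2, init=[0, 50], trim=0.05)
```

Ordinary k-means on this data would pull one centre toward the outliers; trimming removes them from the centre updates, so both centres land on the true clusters and the trimmed points are flagged with label -1.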
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.