Similarity Based Stratified Splitting: an approach to train better
classifiers
- URL: http://arxiv.org/abs/2010.06099v1
- Date: Tue, 13 Oct 2020 01:07:48 GMT
- Title: Similarity Based Stratified Splitting: an approach to train better
classifiers
- Authors: Felipe Farias, Teresa Ludermir, Carmelo Bastos-Filho
- Abstract summary: We propose a Similarity-Based Stratified Splitting technique, which uses both the output and input space information to split the data.
We evaluate our proposal in twenty-two benchmark datasets with classifiers such as Multi-Layer Perceptron, Support Vector Machine, Random Forest and K-Nearest Neighbors.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a Similarity-Based Stratified Splitting (SBSS) technique, which
uses both the output and input space information to split the data. The splits
are generated using similarity functions among samples to place similar samples
in different splits. This approach allows for a better representation of the
data in the training phase. This strategy leads to a more realistic performance
estimation when used in real-world applications. We evaluate our proposal in
twenty-two benchmark datasets with classifiers such as Multi-Layer Perceptron,
Support Vector Machine, Random Forest and K-Nearest Neighbors, and five
similarity functions: Cityblock, Chebyshev, Cosine, Correlation, and Euclidean.
According to the Wilcoxon signed-rank test, our approach consistently
outperformed ordinary stratified 10-fold cross-validation in 75% of the
assessed scenarios.
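The splitting idea described in the abstract can be sketched roughly as follows. This is one plausible reading of the abstract, not the authors' published algorithm: within each class, a pivot sample is grouped with its nearest same-class neighbours, and the members of that group are spread over distinct folds. The function name `similarity_stratified_folds`, its parameters, and the round-robin fold assignment are assumptions made for illustration. Only three of the five distance functions are shown, and the code uses the Python standard library only.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cityblock(a, b):
    """Cityblock (Manhattan) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Chebyshev distance (largest per-feature difference)."""
    return max(abs(x - y) for x, y in zip(a, b))


def similarity_stratified_folds(X, y, n_splits=10, distance=euclidean, seed=0):
    """Return n_splits lists of sample indices (hypothetical helper).

    Assumption: similar samples of the same class should land in
    different folds, while class proportions stay balanced per fold.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    offset = 0  # rotating start fold keeps fold sizes balanced
    for label in sorted(by_class):
        remaining = by_class[label][:]
        rng.shuffle(remaining)
        while remaining:
            pivot = remaining.pop()
            # Order the remaining same-class samples by distance to the pivot.
            remaining.sort(key=lambda j: distance(X[pivot], X[j]))
            group = [pivot] + remaining[: n_splits - 1]
            remaining = remaining[n_splits - 1:]
            # A group has at most n_splits members, so consecutive
            # fold positions are all distinct.
            for member in group:
                folds[offset % n_splits].append(member)
                offset += 1
    return folds


# Toy data: two tight clusters, one per class.
X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
y = [0, 0, 0, 1, 1, 1]
folds = similarity_stratified_folds(X, y, n_splits=3)
```

Each returned fold can serve as the held-out set of one cross-validation round, with the other folds used for training. Passing `distance=cityblock` or `distance=chebyshev` swaps the similarity function without changing the rest of the procedure.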
Related papers
- Measuring similarity between embedding spaces using induced neighborhood graphs [10.056989400384772]
We propose a metric to evaluate the similarity between paired item representations.
Our results show that accuracy in both analogy and zero-shot classification tasks correlates with the embedding similarity.
arXiv Detail & Related papers (2024-11-13T15:22:33Z)
- Cluster-Aware Similarity Diffusion for Instance Retrieval [64.40171728912702]
Diffusion-based re-ranking is a common method used for retrieving instances by performing similarity propagation in a nearest neighbor graph.
We propose a novel Cluster-Aware Similarity (CAS) diffusion for instance retrieval.
arXiv Detail & Related papers (2024-06-04T14:19:50Z)
- Convolutional autoencoder-based multimodal one-class classification [80.52334952912808]
One-class classification refers to approaches of learning using data from a single class only.
We propose a deep learning one-class classification method suitable for multimodal data.
arXiv Detail & Related papers (2023-09-25T12:31:18Z)
- Retrieval-Augmented Classification with Decoupled Representation [31.662843145399044]
We propose a $k$-nearest-neighbor (KNN)-based method for retrieval augmented classifications.
We find that shared representation for classification and retrieval hurts performance and leads to training instability.
We evaluate our method on a wide range of classification datasets.
arXiv Detail & Related papers (2023-03-23T06:33:06Z)
- Is it all a cluster game? -- Exploring Out-of-Distribution Detection based on Clustering in the Embedding Space [7.856998585396422]
It is essential for safety-critical applications of deep neural networks to determine when new inputs are significantly different from the training distribution.
We study the structure and separation of clusters in the embedding space and find that supervised contrastive learning leads to well-separated clusters.
In our analysis of different training methods, clustering strategies, distance metrics, and thresholding approaches, we observe that there is no clear winner.
arXiv Detail & Related papers (2022-03-16T11:22:23Z)
- Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z)
- Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z)
- Semi-supervised Contrastive Learning with Similarity Co-calibration [72.38187308270135]
We propose a novel training strategy, termed Semi-supervised Contrastive Learning (SsCL).
SsCL combines the well-known contrastive loss in self-supervised learning with the cross entropy loss in semi-supervised learning.
We show that SsCL produces more discriminative representation and is beneficial to few shot learning.
arXiv Detail & Related papers (2021-05-16T09:13:56Z)
- LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.