Deep Low-Density Separation for Semi-Supervised Classification
- URL: http://arxiv.org/abs/2205.11995v1
- Date: Sun, 22 May 2022 11:00:55 GMT
- Title: Deep Low-Density Separation for Semi-Supervised Classification
- Authors: Michael C. Burkhart and Kyle Shan
- Abstract summary: We introduce a novel hybrid method that applies low-density separation to the embedded features.
Our approach effectively classifies thousands of unlabeled users from a relatively small number of hand-classified examples.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a small set of labeled data and a large set of unlabeled data,
semi-supervised learning (SSL) attempts to leverage the location of the
unlabeled datapoints in order to create a better classifier than could be
obtained from supervised methods applied to the labeled training set alone.
Effective SSL imposes structural assumptions on the data, e.g. that neighbors
are more likely to share a classification or that the decision boundary lies in
an area of low density. For complex and high-dimensional data, neural networks
can learn feature embeddings to which traditional SSL methods can then be
applied in what we call hybrid methods.
Previously-developed hybrid methods iterate between refining a latent
representation and performing graph-based SSL on this representation. In this
paper, we introduce a novel hybrid method that instead applies low-density
separation to the embedded features. We describe it in detail and discuss why
low-density separation may be better suited for SSL on neural network-based
embeddings than graph-based algorithms. We validate our method using in-house
customer survey data and compare it to other state-of-the-art learning methods.
Our approach effectively classifies thousands of unlabeled users from a
relatively small number of hand-classified examples.
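The low-density-separation idea in the abstract can be illustrated with a small sketch. True transductive SVMs are not in scikit-learn, so this example uses self-training with an SVM as a rough stand-in: confident pseudo-labels iteratively push the decision margin through low-density regions. The dataset, hyperparameters, and the use of `SelfTrainingClassifier` are illustrative assumptions, not the authors' method.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Two-moons data: the class boundary passes through a low-density region,
# which is exactly the structural assumption low-density separation exploits.
X, y = make_moons(n_samples=300, noise=0.1, random_state=0)

# Mask most labels; -1 marks unlabeled points (scikit-learn's convention).
rng = np.random.default_rng(0)
y_semi = np.full_like(y, -1)
labeled_idx = rng.choice(len(y), size=10, replace=False)
y_semi[labeled_idx] = y[labeled_idx]

# Self-training with an RBF SVM: at each round, points classified with high
# confidence are pseudo-labeled and added to the training set, nudging the
# margin into low-density areas (a crude proxy for a transductive SVM).
base = SVC(kernel="rbf", gamma=2.0, probability=True, random_state=0)
clf = SelfTrainingClassifier(base, threshold=0.8).fit(X, y_semi)

acc = clf.score(X, y)
print(f"accuracy on all points from 10 labels: {acc:.2f}")
```

In the paper's hybrid setting, `X` would be the neural-network feature embedding of the raw data rather than the raw inputs themselves.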
Related papers
- Semi-Supervised Sparse Gaussian Classification: Provable Benefits of Unlabeled Data [6.812609988733991]
We study SSL for high dimensional Gaussian classification.
We analyze information theoretic lower bounds for accurate feature selection.
We present simulations that complement our theoretical analysis.
arXiv Detail & Related papers (2024-09-05T08:21:05Z) - A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels.
We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z) - ProtoCon: Pseudo-label Refinement via Online Clustering and Prototypical
Consistency for Efficient Semi-supervised Learning [60.57998388590556]
ProtoCon is a novel method for confidence-based pseudo-labeling.
Online nature of ProtoCon allows it to utilise the label history of the entire dataset in one training cycle.
It delivers significant gains and faster convergence over state-of-the-art methods on standard datasets.
arXiv Detail & Related papers (2023-03-22T23:51:54Z) - OpenLDN: Learning to Discover Novel Classes for Open-World
Semi-Supervised Learning [110.40285771431687]
Semi-supervised learning (SSL) is one of the dominant approaches to address the annotation bottleneck of supervised learning.
Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on a small set of labeled data.
This work introduces OpenLDN that utilizes a pairwise similarity loss to discover novel classes.
arXiv Detail & Related papers (2022-07-05T18:51:05Z) - Self-Supervised Learning of Graph Neural Networks: A Unified Review [50.71341657322391]
Self-supervised learning is emerging as a new paradigm for making use of large amounts of unlabeled samples.
We provide a unified review of different ways of training graph neural networks (GNNs) using SSL.
Our treatment of SSL methods for GNNs sheds light on the similarities and differences of various methods, setting the stage for developing new methods and algorithms.
arXiv Detail & Related papers (2021-02-22T03:43:45Z) - Matching Distributions via Optimal Transport for Semi-Supervised
Learning [31.533832244923843]
Semi-Supervised Learning (SSL) approaches have been an influential framework for the usage of unlabeled data.
We propose a new approach that adopts an Optimal Transport (OT) technique serving as a metric of similarity between discrete empirical probability measures.
We evaluate our proposed method against state-of-the-art SSL algorithms on standard datasets, demonstrating its effectiveness.
arXiv Detail & Related papers (2020-12-04T11:15:14Z) - OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax
Layer [77.90012156266324]
This paper aims to find a subspace of neural networks that can facilitate a large decision margin.
We propose the Orthogonal Softmax Layer (OSL), which makes the weight vectors in the classification layer remain orthogonal during both the training and test processes.
Experimental results demonstrate that the proposed OSL has better performance than the methods used for comparison on four small-sample benchmark datasets.
arXiv Detail & Related papers (2020-04-20T02:41:01Z) - Density-Aware Graph for Deep Semi-Supervised Visual Recognition [102.9484812869054]
Semi-supervised learning (SSL) has been extensively studied to improve the generalization ability of deep neural networks for visual recognition.
This paper proposes to solve the SSL problem by building a novel density-aware graph, based on which the neighborhood information can be easily leveraged.
arXiv Detail & Related papers (2020-03-30T02:52:40Z)
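Several of the papers above, like the abstract's discussion of previously-developed hybrid methods, rely on graph-based SSL. A minimal sketch of that alternative, using scikit-learn's `LabelSpreading` (the dataset and hyperparameters are illustrative assumptions, not taken from any of the listed papers):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

# Same setup as before: mostly unlabeled two-moons data (-1 = unlabeled).
X, y = make_moons(n_samples=300, noise=0.1, random_state=0)
rng = np.random.default_rng(0)
y_semi = np.full_like(y, -1)
labeled_idx = rng.choice(len(y), size=10, replace=False)
y_semi[labeled_idx] = y[labeled_idx]

# Graph-based SSL: build a k-NN similarity graph over all points and
# diffuse the few known labels along its edges until convergence.
model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_semi)

# transduction_ holds the inferred label for every point, labeled or not.
acc = (model.transduction_ == y).mean()
print(f"transductive accuracy from 10 labels: {acc:.2f}")
```

The paper argues that on neural-network embeddings, low-density separation can be better suited than this kind of graph diffusion.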
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.