Persistent Laplacian-enhanced Algorithm for Scarcely Labeled Data Classification
- URL: http://arxiv.org/abs/2305.16239v1
- Date: Thu, 25 May 2023 16:49:40 GMT
- Title: Persistent Laplacian-enhanced Algorithm for Scarcely Labeled Data Classification
- Authors: Gokul Bhusal, Ekaterina Merkurjev, Guo-Wei Wei
- Abstract summary: We propose a semi-supervised method called persistent Laplacian-enhanced graph MBO (PL-MBO).
PL-MBO integrates persistent spectral graph theory with the classical Merriman-Bence-Osher scheme.
We evaluate the performance of the proposed method on data classification.
- Score: 2.8360662552057323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of many machine learning (ML) methods depends crucially on having
large amounts of labeled data. However, obtaining enough labeled data can be
expensive, time-consuming, and subject to ethical constraints for many
applications. One approach that has shown tremendous value in addressing this
challenge is semi-supervised learning (SSL); this technique utilizes both
labeled and unlabeled data during training, often with much less labeled data
than unlabeled data, which is often relatively easy and inexpensive to obtain.
In fact, SSL methods are particularly useful in applications where the cost of
labeling data is especially expensive, such as medical analysis, natural
language processing (NLP), or speech recognition. A subset of SSL methods that
have achieved great success in various domains involves algorithms that
integrate graph-based techniques. These procedures are popular due to the vast
amount of information provided by the graphical framework and the versatility
of their applications. In this work, we propose an algebraic topology-based
semi-supervised method called persistent Laplacian-enhanced graph MBO (PL-MBO)
by integrating persistent spectral graph theory with the classical
Merriman-Bence-Osher (MBO) scheme. Specifically, we use a filtration procedure
to generate a sequence of chain complexes and associated families of simplicial
complexes, from which we construct a family of persistent Laplacians. Overall,
it is a very efficient procedure that requires much less labeled data to
perform well compared to many ML techniques, and it can be adapted for both
small and large datasets. We evaluate the performance of the proposed method on
data classification, and the results indicate that the proposed technique
outperforms other existing semi-supervised algorithms.
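The filtration step described in the abstract can be illustrated on a small point cloud. The sketch below is a toy under stated assumptions, not the PL-MBO implementation: it builds a Vietoris-Rips-style filtration by connecting points whose Euclidean distance is at most a scale eps, and at each scale forms the unweighted combinatorial Laplacian L = D - A. Assuming every point is present from the first filtration value, the 0-dimensional persistent Laplacian between two filtration values reduces (in the standard persistent spectral graph construction) to this ordinary Laplacian of the larger graph, and the dimension of its kernel equals the number of connected components (the 0th Betti number). The sketch counts components with union-find instead of an eigensolver. The function names are hypothetical, and higher-dimensional persistent Laplacians and any weighting PL-MBO may use are omitted.

```python
from itertools import combinations
from math import dist


def rips_edges(points, eps):
    """Edges (i, j) of the 1-skeleton of a Vietoris-Rips complex at scale eps."""
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if dist(points[i], points[j]) <= eps]


def graph_laplacian(n, edges):
    """Unweighted combinatorial Laplacian L = D - A as a dense list of lists."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    return L


def betti0(n, edges):
    """Number of connected components (equal to dim ker L), via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(n)})


def laplacian_filtration(points, scales):
    """(eps, Laplacian, Betti-0) for each filtration value in ascending scales."""
    n = len(points)
    result = []
    for eps in scales:
        edges = rips_edges(points, eps)
        result.append((eps, graph_laplacian(n, edges), betti0(n, edges)))
    return result


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
    for eps, L, b0 in laplacian_filtration(pts, [0.5, 1.0, 6.0]):
        print(eps, b0)  # prints "0.5 4", then "1.0 2", then "6.0 1"
```

With two well-separated pairs of points, the component count drops from 4 to 2 once eps reaches the within-pair distance of 1, and to 1 once eps reaches the between-pair distance of 5; this multiscale change in the Laplacians is the kind of information a filtration exposes.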
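The MBO side of the method alternates a short diffusion step with a thresholding step. As a hedged illustration, the sketch below implements a classical multiclass graph MBO iteration for semi-supervised labeling using the plain graph Laplacian L = D - A; PL-MBO instead builds on persistent Laplacians, which are not reproduced here. The diffusion step integrates du/dt = -Lu - mu * Lambda (u - u_hat) with explicit Euler, where Lambda marks the labeled nodes and u_hat holds their one-hot labels; thresholding then snaps each node's row to the one-hot vector of its largest entry. The function name graph_mbo and the parameters mu, dt, diffusion_steps, and iterations are hypothetical choices for this example. Graph MBO implementations in the literature commonly solve the diffusion step in a truncated eigenbasis of the Laplacian instead of with explicit steps.

```python
def graph_mbo(L, labels, n_classes, mu=1.0, dt=0.05, diffusion_steps=5, iterations=10):
    """Classical multiclass graph MBO sketch (not the PL-MBO implementation).

    L: dense graph Laplacian as a list of lists (n x n).
    labels: dict {node index: class index} for the labeled (fidelity) nodes.
    Returns a predicted class index for every node.
    """
    n = len(L)
    # One-hot fidelity targets (u_hat); only rows of labeled nodes are used.
    target = [[1.0 if labels.get(i) == k else 0.0 for k in range(n_classes)]
              for i in range(n)]
    # Labeled nodes start at their labels, unlabeled nodes at the uniform vector.
    u = [target[i][:] if i in labels else [1.0 / n_classes] * n_classes
         for i in range(n)]
    pred = [0] * n
    for _ in range(iterations):
        # 1) Diffusion with fidelity: explicit Euler steps of
        #    du/dt = -L u - mu * Lambda * (u - u_hat).
        for _ in range(diffusion_steps):
            Lu = [[sum(L[i][j] * u[j][k] for j in range(n)) for k in range(n_classes)]
                  for i in range(n)]
            u = [[u[i][k] - dt * (Lu[i][k]
                                  + (mu * (u[i][k] - target[i][k]) if i in labels else 0.0))
                  for k in range(n_classes)]
                 for i in range(n)]
        # 2) Thresholding: snap each row to the nearest simplex vertex,
        #    i.e. the one-hot vector of its argmax.
        pred = [max(range(n_classes), key=row.__getitem__) for row in u]
        u = [[1.0 if k == pred[i] else 0.0 for k in range(n_classes)] for i in range(n)]
    return pred


if __name__ == "__main__":
    # Path graph 0-1-2-3; node 0 is labeled class 0 and node 3 is labeled class 1.
    L_path = [[1, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 1]]
    print(graph_mbo(L_path, {0: 0, 3: 1}, n_classes=2))  # [0, 0, 1, 1]
```

Explicit Euler keeps the sketch short but is only stable when dt times (the largest Laplacian eigenvalue plus mu) stays below 2, so dt must shrink on graphs with high-degree nodes.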
Related papers
- Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data [54.934578742209716]
In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets.
LLKD is an adaptive sample selection method that incorporates signals from both the teacher and student.
Our comprehensive experiments show that LLKD achieves superior performance across various datasets with higher data efficiency.
arXiv Detail & Related papers (2024-11-12T18:57:59Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- Multiscale Laplacian Learning [3.24029503704305]
This paper presents two innovative multiscale Laplacian learning approaches for machine learning tasks.
The first approach, called multi-kernel manifold learning (MML), integrates manifold learning with multi-kernel information.
The second approach, called the multiscale MBO (MMBO) method, introduces multiscale Laplacians to a modification of the famous classical Merriman-Bence-Osher scheme.
arXiv Detail & Related papers (2021-09-08T15:25:32Z)
- Relieving the Plateau: Active Semi-Supervised Learning for a Better Landscape [2.3046646540823916]
Semi-supervised learning (SSL) leverages unlabeled data that are more accessible than their labeled counterparts.
Active learning (AL) selects unlabeled instances to be annotated by a human-in-the-loop in hopes of better performance with less labeled data.
We propose convergence rate control (CRC), an AL algorithm that selects unlabeled data to improve the problem conditioning upon inclusion in the labeled set.
arXiv Detail & Related papers (2021-04-08T06:03:59Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that, unlike consistency-regularization methods, does not rely on domain-specific data augmentations, but it performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework, which improves pseudo-labeling accuracy by drastically reducing the amount of noise encountered in the training process.
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
- PseudoSeg: Designing Pseudo Labels for Semantic Segmentation [78.35515004654553]
We present a re-design of pseudo-labeling to generate structured pseudo labels for training with unlabeled or weakly-labeled data.
We demonstrate the effectiveness of the proposed pseudo-labeling strategy in both low-data and high-data regimes.
arXiv Detail & Related papers (2020-10-19T17:59:30Z)
- Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
Meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z)
- Analysis of label noise in graph-based semi-supervised learning [2.4366811507669124]
In machine learning, one must acquire labels to help supervise a model that will be able to generalize to unseen data.
It is often the case that most of our data is unlabeled.
Semi-supervised learning (SSL) alleviates that by making strong assumptions about the relation between the labels and the input data distribution.
arXiv Detail & Related papers (2020-09-27T22:13:20Z)
- Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)
- Pseudo-Representation Labeling Semi-Supervised Learning [0.0]
In recent years, semi-supervised learning has shown tremendous success in leveraging unlabeled data to improve the performance of deep learning models.
This work proposes pseudo-representation labeling, a simple and flexible framework that uses pseudo-labeling techniques to iteratively label a small amount of unlabeled data and then uses the newly labeled data for training.
Compared with existing approaches, pseudo-representation labeling is more intuitive and can effectively solve practical real-world problems.
arXiv Detail & Related papers (2020-05-31T03:55:41Z)
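The uncertainty-aware selection summarized for UPS in the list above can be sketched as a filter over model predictions. This is a simplified illustration under assumptions, not the paper's code: uncertainty is taken as the standard deviation of the predicted class's probability across several stochastic forward passes (for example, MC dropout), confidence as the mean probability across those passes, and only positive pseudo-labels are selected. The function name ups_select and the threshold values tau_p and kappa_p are hypothetical choices for this example.

```python
from statistics import mean, pstdev


def ups_select(prob_samples, tau_p=0.7, kappa_p=0.05):
    """Simplified UPS-style positive pseudo-label selection (illustrative only).

    prob_samples[s][i][c] is the predicted probability of class c for unlabeled
    example i in stochastic forward pass s (e.g. MC dropout).
    Returns (example index, pseudo-label) pairs that pass both thresholds.
    """
    n_examples = len(prob_samples[0])
    n_classes = len(prob_samples[0][0])
    selected = []
    for i in range(n_examples):
        # Confidence: mean probability of each class across the stochastic passes.
        means = [mean(p[i][c] for p in prob_samples) for c in range(n_classes)]
        c = max(range(n_classes), key=means.__getitem__)
        # Uncertainty: spread of the predicted class's probability across passes.
        uncertainty = pstdev([p[i][c] for p in prob_samples])
        # Keep only predictions that are both confident and low-uncertainty.
        if means[c] >= tau_p and uncertainty <= kappa_p:
            selected.append((i, c))
    return selected


if __name__ == "__main__":
    samples = [  # 3 stochastic passes x 3 examples x 2 classes
        [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]],
        [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]],
        [[0.9, 0.1], [0.95, 0.05], [0.2, 0.8]],
    ]
    print(ups_select(samples))  # [(0, 0), (2, 1)]
```

Example 1 has a mean confidence of 0.75, above tau_p, but its probabilities swing between passes, so it is rejected; this is the case the summary targets, where a poorly calibrated model is confidently wrong.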
This list is automatically generated from the titles and abstracts of the papers in this site.