Persistent Laplacian-enhanced Algorithm for Scarcely Labeled Data
Classification
- URL: http://arxiv.org/abs/2305.16239v1
- Date: Thu, 25 May 2023 16:49:40 GMT
- Title: Persistent Laplacian-enhanced Algorithm for Scarcely Labeled Data
Classification
- Authors: Gokul Bhusal, Ekaterina Merkurjev, Guo-Wei Wei
- Abstract summary: We propose a semi-supervised method called persistent Laplacian-enhanced graph MBO (PL-MBO).
PL-MBO integrates persistent spectral graph theory with the classical Merriman-Bence-Osher (MBO) scheme.
We evaluate the performance of the proposed method on data classification.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of many machine learning (ML) methods depends crucially on having
large amounts of labeled data. However, obtaining enough labeled data can be
expensive, time-consuming, and subject to ethical constraints for many
applications. One approach that has shown tremendous value in addressing this
challenge is semi-supervised learning (SSL); this technique utilizes both
labeled and unlabeled data during training, typically with far less labeled data than unlabeled data, since the latter is relatively easy and inexpensive to obtain.
In fact, SSL methods are particularly useful in applications where labeling data is especially expensive, such as medical analysis, natural language processing (NLP), or speech recognition. A subset of SSL methods that
have achieved great success in various domains involves algorithms that
integrate graph-based techniques. These procedures are popular due to the vast
amount of information provided by the graphical framework and the versatility
of their applications. In this work, we propose an algebraic topology-based
semi-supervised method called persistent Laplacian-enhanced graph MBO (PL-MBO)
by integrating persistent spectral graph theory with the classical
Merriman-Bence-Osher (MBO) scheme. Specifically, we use a filtration procedure
to generate a sequence of chain complexes and associated families of simplicial
complexes, from which we construct a family of persistent Laplacians. Overall, the procedure is very efficient, requires far less labeled data than many ML techniques to perform well, and can be adapted for both
small and large datasets. We evaluate the performance of the proposed method on
data classification, and the results indicate that the proposed technique
outperforms other existing semi-supervised algorithms.
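Since the abstract walks through an algorithmic pipeline (similarity graph, Laplacian spectra, MBO diffusion and thresholding), a small sketch may help fix ideas. The Python sketch below shows only the classical graph MBO scheme that PL-MBO enhances: a short diffusion step driven by a graph Laplacian alternates with a thresholding step that snaps each node to its nearest class, while a fidelity term holds labeled nodes near their known classes. This is a minimal illustration under stated assumptions, not the authors' implementation: a single symmetric normalized Laplacian stands in for the family of persistent Laplacians built from the filtration, and all names (graph_mbo, n_eig, fidelity) are hypothetical.

# Hedged sketch of semi-supervised graph MBO (not the paper's PL-MBO code).
import numpy as np

def graph_mbo(W, labels, n_classes, dt=0.1, n_eig=30, n_iter=50, fidelity=30.0):
    """Alternate Laplacian diffusion with thresholding; `labels` holds a
    class index for labeled nodes and -1 for unlabeled ones."""
    n = W.shape[0]
    d = W.sum(axis=1)
    L = np.eye(n) - W / np.sqrt(np.outer(d, d))  # symmetric normalized Laplacian

    # Spectral truncation: diffusion is cheap in the leading (low-frequency) eigenbasis.
    vals, vecs = np.linalg.eigh(L)
    vals, vecs = vals[:n_eig], vecs[:, :n_eig]

    rng = np.random.default_rng(0)
    U = np.eye(n_classes)[rng.integers(n_classes, size=n)]  # random one-hot start
    mask = labels >= 0
    U[mask] = np.eye(n_classes)[labels[mask]]               # clamp labeled nodes

    for _ in range(n_iter):
        # Diffusion step: U <- exp(-dt * L) U in the truncated eigenbasis,
        # plus a fidelity force pulling labeled nodes toward their known classes.
        U = vecs @ (np.exp(-dt * vals)[:, None] * (vecs.T @ U))
        U[mask] += dt * fidelity * (np.eye(n_classes)[labels[mask]] - U[mask])
        # Thresholding step: snap each row to the nearest class indicator.
        U = np.eye(n_classes)[U.argmax(axis=1)]
    return U.argmax(axis=1)

# Usage sketch: a Gaussian-kernel graph on toy two-cluster features.
X = np.random.default_rng(1).normal(size=(60, 2))
X[30:] += 3.0
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-D2 / D2.mean())
labels = np.full(60, -1)
labels[0], labels[30] = 0, 1          # one label per class, as in scarce-label settings
pred = graph_mbo(W, labels, n_classes=2)

The spectral truncation to n_eig eigenpairs is the standard device that lets graph MBO scale from small to large datasets; in PL-MBO one would expect spectral information from the family of persistent Laplacians along the filtration to enter this step in place of the single spectrum used above.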
Related papers
- Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs)
Our proposed method first trains SOMs on unlabeled data and then assigns a minimal number of available labeled data points to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Multiscale Laplacian Learning [3.24029503704305]
This paper presents two innovative multiscale Laplacian learning approaches for machine learning tasks.
The first approach, called multikernel manifold learning (MML), integrates manifold learning with multikernel information.
The second approach, called the multiscale MBO (MMBO) method, introduces multiscale Laplacians into a modification of the classical Merriman-Bence-Osher scheme.
arXiv Detail & Related papers (2021-09-08T15:25:32Z) - Relieving the Plateau: Active Semi-Supervised Learning for a Better
Landscape [2.3046646540823916]
Semi-supervised learning (SSL) leverages unlabeled data that are more accessible than their labeled counterparts.
Active learning (AL) selects unlabeled instances to be annotated by a human-in-the-loop in hopes of better performance with less labeled data.
We propose convergence rate control (CRC), an AL algorithm that selects unlabeled data to improve the problem conditioning upon inclusion to the labeled set.
arXiv Detail & Related papers (2021-04-08T06:03:59Z) - Graph-based Semi-supervised Learning: A Comprehensive Review [51.26862262550445]
Semi-supervised learning (SSL) has tremendous value in practice due to its ability to utilize both labeled and unlabeled data.
An important class of SSL methods is to naturally represent data as graphs, which corresponds to graph-based semi-supervised learning (GSSL) methods.
GSSL methods have demonstrated advantages in various domains due to the uniqueness of their structure, the universality of their applications, and their scalability to large-scale data.
arXiv Detail & Related papers (2021-02-26T05:11:09Z) - In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label
Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not rely on domain-specific data augmentations, but it performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework, which improves pseudo-labeling accuracy by drastically reducing the amount of noise encountered in the training process.
arXiv Detail & Related papers (2021-01-15T23:29:57Z) - PseudoSeg: Designing Pseudo Labels for Semantic Segmentation [78.35515004654553]
We present a re-design of pseudo-labeling to generate structured pseudo labels for training with unlabeled or weakly-labeled data.
We demonstrate the effectiveness of the proposed pseudo-labeling strategy in both low-data and high-data regimes.
arXiv Detail & Related papers (2020-10-19T17:59:30Z) - Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
Meta-learning enables adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z) - Analysis of label noise in graph-based semi-supervised learning [2.4366811507669124]
In machine learning, one must acquire labels to help supervise a model that will be able to generalize to unseen data.
It is often the case that most of our data is unlabeled.
Semi-supervised learning (SSL) alleviates this by making strong assumptions about the relation between the labels and the input data distribution.
arXiv Detail & Related papers (2020-09-27T22:13:20Z) - Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z) - Pseudo-Representation Labeling Semi-Supervised Learning [0.0]
In recent years, semi-supervised learning has shown tremendous success in leveraging unlabeled data to improve the performance of deep learning models.
This work proposes pseudo-representation labeling, a simple and flexible framework that uses pseudo-labeling techniques to iteratively label small amounts of unlabeled data and add them to the training set.
Compared with existing approaches, pseudo-representation labeling is more intuitive and can effectively solve practical problems in the real world.
arXiv Detail & Related papers (2020-05-31T03:55:41Z)