CELDA: Leveraging Black-box Language Model as Enhanced Classifier without Labels
- URL: http://arxiv.org/abs/2306.02693v2
- Date: Fri, 9 Jun 2023 05:16:21 GMT
- Title: CELDA: Leveraging Black-box Language Model as Enhanced Classifier without Labels
- Authors: Hyunsoo Cho, Youna Kim, Sang-goo Lee
- Abstract summary: Clustering-enhanced Linear Discriminative Analysis (CELDA), a novel approach that improves text classification accuracy with a very weak supervision signal.
Our framework draws a precise decision boundary without accessing the weights or gradients of the LM or any data labels.
- Score: 14.285609493077965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Utilizing language models (LMs) without internal access is becoming an
attractive paradigm in NLP, as many cutting-edge LMs are released through APIs
and boast a massive scale. The de facto method in this black-box scenario is
prompting, which has shown progressive performance gains in situations where
data labels are scarce or unavailable. Despite its efficacy, prompting still
falls short of fully supervised counterparts and is generally brittle to slight
modifications. In this paper, we propose Clustering-enhanced Linear
Discriminative Analysis (CELDA), a novel approach that improves text
classification accuracy with a very weak supervision signal (i.e., the names of
the labels). Our framework draws a precise decision boundary without accessing
the weights or gradients of the LM or any data labels. The core ideas of CELDA
are twofold: (1) extracting a refined pseudo-labeled dataset from an unlabeled
dataset, and (2) training a lightweight and robust model on top of the LM that
learns an accurate decision boundary from the extracted noisy dataset. Through
in-depth investigations on various datasets, we demonstrate that CELDA reaches
a new state of the art in weakly supervised text classification and narrows the
gap with fully supervised models. Additionally, our proposed methodology can be
applied universally to any LM and has the potential to scale to larger models,
making it a more viable option for utilizing large LMs.
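The two core ideas above can be made concrete with a minimal sketch (not the authors' implementation): cluster black-box LM features, pseudo-label each cluster via the label-name embeddings, filter low-confidence points, and fit a linear discriminant classifier on top of the frozen features. The `embed` function, the centroid-distance filter, and the cluster count below are illustrative assumptions; the paper's actual refinement criterion differs.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics.pairwise import cosine_similarity

def celda_sketch(texts, label_names, embed, n_clusters=None):
    # `embed` is a hypothetical black-box API call returning fixed-size
    # feature vectors (no weights or gradients are ever accessed).
    X = np.asarray(embed(texts))        # features for the unlabeled texts
    L = np.asarray(embed(label_names))  # features for the label names (only supervision)

    # Idea (1): extract a refined pseudo-labeled subset. Cluster the unlabeled
    # features, name each cluster after its most similar label-name embedding,
    # and keep only points close to their centroid as a simple confidence filter.
    k = n_clusters or len(label_names)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    cluster_to_label = cosine_similarity(km.cluster_centers_, L).argmax(axis=1)
    pseudo_y = cluster_to_label[km.labels_]
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    keep = dist <= np.quantile(dist, 0.5)   # heuristic threshold, not the paper's

    # Idea (2): train a lightweight, robust classifier on the frozen LM features.
    clf = LinearDiscriminantAnalysis()
    clf.fit(X[keep], pseudo_y[keep])
    return clf                              # later: clf.predict(embed(new_texts))
```

The refinement step carries most of the weight in this sketch: the filter has to be strict enough that the lightweight classifier sees mostly correct pseudo-labels, which is the "noisy dataset" the abstract refers to.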
Related papers
- Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data [54.934578742209716]
In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets.
LLKD is an adaptive sample selection method that incorporates signals from both the teacher and student.
Our comprehensive experiments show that LLKD achieves superior performance across various datasets with higher data efficiency.
arXiv Detail & Related papers (2024-11-12T18:57:59Z)
- Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance [21.926934384262594]
Large language models (LLMs) offer new opportunities to enhance the annotation process.
We compare expert, crowd-sourced, and our LLM-based annotations in terms of agreement, label quality, and efficiency.
Our findings reveal a substantial number of label errors, which, when corrected, induce a significant upward shift in reported model performance.
arXiv Detail & Related papers (2024-10-24T16:27:03Z)
- Permissive Information-Flow Analysis for Large Language Models [21.563132267220073]
Large Language Models (LLMs) are rapidly becoming commodity components of larger software systems.
This poses natural security and privacy problems: poisoned data retrieved from one component can change the model's behavior and compromise the entire system.
We propose a novel, more permissive approach to propagate information flow labels through LLM queries.
arXiv Detail & Related papers (2024-10-04T00:25:43Z)
- Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels [75.77877889764073]
Large Language Models (LLMs) have demonstrated remarkable performance through supervised fine-tuning or in-context learning using gold labels.
This study explores whether solely utilizing unlabeled data can elicit strong model capabilities.
We propose a new paradigm termed zero-to-strong generalization.
arXiv Detail & Related papers (2024-09-19T02:59:44Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- Data Augmentation For Label Enhancement [45.3351754830424]
Label enhancement (LE) has emerged to recover Label Distributions (LDs) from logical labels.
We propose a novel supervised LE dimensionality reduction approach, which projects the original data into a lower dimensional feature space.
The results show that our method consistently outperforms the other five compared approaches.
arXiv Detail & Related papers (2023-03-21T09:36:58Z)
- Ground Truth Inference for Weakly Supervised Entity Matching [76.6732856489872]
We propose a simple but powerful labeling model for weak supervision tasks.
We then tailor the labeling model specifically to the task of entity matching.
We show that our labeling model results in a 9% higher F1 score on average than the best existing method.
arXiv Detail & Related papers (2022-11-13T17:57:07Z)
- L2B: Learning to Bootstrap Robust Models for Combating Label Noise [52.02335367411447]
This paper introduces a simple and effective method, named Learning to Bootstrap (L2B).
It enables models to bootstrap themselves using their own predictions without being adversely affected by erroneous pseudo-labels.
It achieves this by dynamically adjusting the importance weight between real observed and generated labels, as well as between different samples through meta-learning.
arXiv Detail & Related papers (2022-02-09T05:57:08Z)
- Training image classifiers using Semi-Weak Label Data [26.04162590798731]
In Multiple Instance Learning (MIL), weak labels are provided at the bag level, with only presence/absence information known.
This paper introduces a novel semi-weak label learning paradigm as a middle ground to mitigate the problem.
We propose a two-stage framework to address the problem of learning from semi-weak labels.
arXiv Detail & Related papers (2021-03-19T03:06:07Z)
- An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels [49.036212158261215]
Large-scale Multi-label Text Classification (LMTC) has a wide range of Natural Language Processing (NLP) applications.
Current state-of-the-art LMTC models employ Label-Wise Attention Networks (LWANs).
We show that hierarchical methods based on Probabilistic Label Trees (PLTs) outperform LWANs.
We propose a new state-of-the-art method which combines BERT with LWANs.
arXiv Detail & Related papers (2020-10-04T18:55:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.