Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels
- URL: http://arxiv.org/abs/2409.12425v1
- Date: Thu, 19 Sep 2024 02:59:44 GMT
- Title: Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels
- Authors: Chaoqun Liu, Qin Chao, Wenxuan Zhang, Xiaobao Wu, Boyang Li, Anh Tuan Luu, Lidong Bing,
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable performance through supervised fine-tuning or in-context learning using gold labels.
This study explores whether solely utilizing unlabeled data can elicit strong model capabilities.
We propose a new paradigm termed zero-to-strong generalization.
- Score: 75.77877889764073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable performance through supervised fine-tuning or in-context learning using gold labels. However, this paradigm is limited by the availability of gold labels, while in certain scenarios, LLMs may need to perform tasks that are too complex for humans to provide such labels. To tackle this challenge, this study explores whether solely utilizing unlabeled data can elicit strong model capabilities. We propose a new paradigm termed zero-to-strong generalization. We iteratively prompt LLMs to annotate unlabeled data and retain high-quality labels by filtering. Surprisingly, we obverse that this iterative process gradually unlocks LLMs' potential on downstream tasks. Our experiments on extensive classification and reasoning tasks confirm the effectiveness of our proposed framework. Our analysis indicates that this paradigm is effective for both in-context learning and fine-tuning, and for various model sizes.
Related papers
- Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance [21.926934384262594]
Large language models (LLMs) offer new opportunities to enhance the annotation process.
We compare expert, crowd-sourced, and our LLM-based annotations in terms of agreement, label quality, and efficiency.
Our findings reveal a substantial number of label errors, which, when corrected, induce a significant upward shift in reported model performance.
arXiv Detail & Related papers (2024-10-24T16:27:03Z) - On Unsupervised Prompt Learning for Classification with Black-box Language Models [71.60563181678323]
Large language models (LLMs) have achieved impressive success in text-formatted learning problems.
LLMs can label datasets with even better quality than skilled human annotators.
In this paper, we propose unsupervised prompt learning for classification with black-box LLMs.
arXiv Detail & Related papers (2024-10-04T03:39:28Z) - DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data.
Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
arXiv Detail & Related papers (2024-06-25T04:27:53Z) - LLMs' Classification Performance is Overclaimed [14.141803470808824]
In many classification tasks designed for AI or human to solve, gold labels are typically included within the label space by default.
This standard setup has traditionally highlighted the strong performance of advanced AI, particularly Large Language Models (LLMs)
This paper argues that the perceived performance of LLMs is overstated due to their inability to exhibit the expected comprehension of the task.
arXiv Detail & Related papers (2024-06-23T19:49:10Z) - TnT-LLM: Text Mining at Scale with Large Language Models [24.731544646232962]
Large Language Models (LLMs) automate the process of end-to-end label generation and assignment with minimal human effort.
We show that TnT-LLM generates more accurate and relevant label when compared against state-of-the-art baselines.
We also share our practical experiences and insights on the challenges and opportunities of using LLMs for large-scale text mining in real-world applications.
arXiv Detail & Related papers (2024-03-18T18:45:28Z) - Test-Time Self-Adaptive Small Language Models for Question Answering [63.91013329169796]
We show and investigate the capabilities of smaller self-adaptive LMs, only with unlabeled test data.
Our proposed self-adaption strategy demonstrates significant performance improvements on benchmark QA datasets.
arXiv Detail & Related papers (2023-10-20T06:49:32Z) - Label Supervised LLaMA Finetuning [13.939718306233617]
In this paper, we introduce a label-supervised adaptation for Large Language Models (LLMs)
We extract latent representations from the final LLaMA layer and project them into the label space to compute the cross-entropy loss.
Remarkably, without intricate prompt engineering or external knowledge, LS-LLaMA substantially outperforms LLMs ten times its size in scale.
arXiv Detail & Related papers (2023-10-02T13:53:03Z) - CELDA: Leveraging Black-box Language Model as Enhanced Classifier
without Labels [14.285609493077965]
Clustering-enhanced Linear Discriminative Analysis, a novel approach that improves the text classification accuracy with a very weak-supervision signal.
Our framework draws a precise decision boundary without accessing weights or gradients of the LM model or data labels.
arXiv Detail & Related papers (2023-06-05T08:35:31Z) - GPT-NER: Named Entity Recognition via Large Language Models [58.609582116612934]
GPT-NER transforms the sequence labeling task to a generation task that can be easily adapted by Language Models.
We find that GPT-NER exhibits a greater ability in the low-resource and few-shot setups, when the amount of training data is extremely scarce.
This demonstrates the capabilities of GPT-NER in real-world NER applications where the number of labeled examples is limited.
arXiv Detail & Related papers (2023-04-20T16:17:26Z) - Improved Adaptive Algorithm for Scalable Active Learning with Weak
Labeler [89.27610526884496]
Weak Labeler Active Cover (WL-AC) is able to robustly leverage the lower quality weak labelers to reduce the query complexity while retaining the desired level of accuracy.
We show its effectiveness on the corrupted-MNIST dataset by significantly reducing the number of labels while keeping the same accuracy as in passive learning.
arXiv Detail & Related papers (2022-11-04T02:52:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.