Evolving Knowledge Distillation with Large Language Models and Active
Learning
- URL: http://arxiv.org/abs/2403.06414v1
- Date: Mon, 11 Mar 2024 03:55:24 GMT
- Title: Evolving Knowledge Distillation with Large Language Models and Active
Learning
- Authors: Chengyuan Liu, Yangyang Kang, Fubang Zhao, Kun Kuang, Zhuoren Jiang,
Changlong Sun, Fei Wu
- Abstract summary: Large language models (LLMs) have demonstrated remarkable capabilities across various NLP tasks.
Previous research has attempted to distill the knowledge of LLMs into smaller models by generating annotated data.
We propose EvoKD: Evolving Knowledge Distillation, which leverages the concept of active learning to interactively enhance the process of data generation using large language models.
- Score: 46.85430680828938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across
various NLP tasks. However, their computational costs are prohibitively high.
To address this issue, previous research has attempted to distill the knowledge
of LLMs into smaller models by generating annotated data. Nonetheless, these
works have mainly focused on the direct use of LLMs for text generation and
labeling, without fully exploring their potential to comprehend the target task
and acquire valuable knowledge. In this paper, we propose EvoKD: Evolving
Knowledge Distillation, which leverages the concept of active learning to
interactively enhance the process of data generation using large language
models, simultaneously improving the task capabilities of the small domain
model (student model). Unlike previous work, we actively analyze the student
model's weaknesses and then synthesize labeled samples based on that analysis.
In addition, we provide iterative feedback to the LLMs regarding the student
model's performance to continuously construct diversified and challenging
samples. Experiments and analysis on different NLP tasks, namely text
classification and named entity recognition, show the effectiveness of EvoKD.
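Concretely, the abstract describes a closed loop: evaluate the student, let the LLM analyze where it fails, synthesize new labeled samples aimed at those failures, and retrain. Below is a minimal sketch of one such round; the prompt wording, the `query_llm` stub, and the student/parser interfaces are illustrative assumptions, since the paper's implementation is not reproduced here.

```python
# Minimal sketch of one EvoKD-style round (illustrative; the prompt, the LLM
# stub, and the student/parser interfaces are assumptions, not the authors' code).

def query_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    raise NotImplementedError("plug in an actual LLM client here")

def parse_labeled_samples(reply: str):
    """Loose parser for 'text: ... | label: ...' lines in the LLM reply."""
    samples = []
    for line in reply.splitlines():
        if "text:" in line and "| label:" in line:
            text, label = line.split("| label:")
            samples.append((text.replace("text:", "").strip(), label.strip()))
    return samples

def evokd_round(student, train_pool, eval_pool, n_new=8):
    # 1. Probe the student and collect the examples it currently gets wrong.
    hard = [(t, y) for t, y in eval_pool if student.predict(t) != y]

    # 2. Ask the LLM to analyze the weaknesses and write new, harder samples.
    prompt = (
        "A small student model failed on these labeled examples:\n"
        + "\n".join(f"text: {t} | label: {y}" for t, y in hard[:5])
        + f"\nAnalyze why it fails, then write {n_new} new, diverse, "
        "challenging examples in the same 'text: ... | label: ...' format."
    )
    new_samples = parse_labeled_samples(query_llm(prompt))

    # 3. Retrain the student on the original pool plus the synthesized data;
    #    repeating this loop gives the LLM iterative feedback on the student.
    student.fit(train_pool + new_samples)
    return student, new_samples
```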
Related papers
- Fusing Domain-Specific Content from Large Language Models into Knowledge Graphs for Enhanced Zero Shot Object State Classification [0.8232137862012223]
This study investigates the potential of Large Language Models (LLMs) in generating and providing domain-specific information.
To achieve this, an LLM is integrated into a pipeline that utilizes Knowledge Graphs and pre-trained semantic vectors.
Our findings reveal that the integration of LLM-based embeddings, in combination with general-purpose pre-trained embeddings, leads to substantial performance improvements.
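The summary only states that LLM-based embeddings are combined with general-purpose pre-trained embeddings; one generic way to fuse two embedding spaces is to concatenate the normalized vectors and score classes by similarity to prototypes, as sketched below. The fusion scheme and names are assumptions, not the paper's pipeline.

```python
# Generic embedding-fusion sketch (not the paper's exact pipeline): concatenate
# an LLM-derived embedding with a general-purpose one and classify a query by
# similarity to fused class prototypes.
import numpy as np

def fuse(llm_vec: np.ndarray, general_vec: np.ndarray) -> np.ndarray:
    v = np.concatenate([llm_vec / np.linalg.norm(llm_vec),
                        general_vec / np.linalg.norm(general_vec)])
    return v / np.linalg.norm(v)

def zero_shot_classify(query: np.ndarray, class_prototypes: dict) -> str:
    # class_prototypes maps each object-state label to its fused prototype vector
    return max(class_prototypes, key=lambda c: float(query @ class_prototypes[c]))
```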
arXiv Detail & Related papers (2024-03-18T18:08:44Z)
- Towards Modeling Learner Performance with Large Language Models [7.002923425715133]
This paper investigates whether the pattern recognition and sequence modeling capabilities of LLMs can be extended to the domain of knowledge tracing.
We compare two approaches to using LLMs for this task, zero-shot prompting and model fine-tuning, with existing, non-LLM approaches to knowledge tracing.
While LLM-based approaches do not achieve state-of-the-art performance, fine-tuned LLMs surpass the performance of naive baseline models and perform on par with standard Bayesian Knowledge Tracing approaches.
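For context, the Bayesian Knowledge Tracing baseline mentioned above maintains a per-skill mastery probability updated after every observed response; the textbook BKT update looks like this (parameter values are illustrative, not from the paper):

```python
# Standard Bayesian Knowledge Tracing update (textbook formulation; parameter
# values are illustrative, not from the paper).
def bkt_update(p_mastery, correct, p_guess=0.2, p_slip=0.1, p_transit=0.15):
    if correct:
        evidence = p_mastery * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_mastery) * p_guess)
    else:
        evidence = p_mastery * p_slip
        posterior = evidence / (evidence + (1 - p_mastery) * (1 - p_guess))
    # Learning transition: the student may acquire the skill after practicing.
    return posterior + (1 - posterior) * p_transit
```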
arXiv Detail & Related papers (2024-02-29T14:06:34Z)
- LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition [67.96794382040547]
$LLM-DA$ is a novel data augmentation technique based on large language models (LLMs) for the few-shot NER task.
Our approach involves employing 14 contextual rewriting strategies, designing entity replacements of the same type, and incorporating noise injection to enhance robustness.
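A minimal sketch of two of the ingredients named above, same-type entity replacement and noise injection, is given below; the contextual rewriting strategies themselves are LLM-prompted and not reproduced, and all names and pools are illustrative rather than LLM-DA's implementation.

```python
# Simplified few-shot NER augmentation: same-type entity replacement plus
# character-level noise injection (an illustration, not the LLM-DA code).
import random

ENTITY_POOL = {"PER": ["Alice", "Priya"], "ORG": ["Acme", "Zeta"]}

def replace_entities(tokens, tags):
    # Swap each entity token for a random token of the same type, keeping tags.
    out = []
    for tok, tag in zip(tokens, tags):
        ent_type = tag.split("-")[-1]
        if tag != "O" and ent_type in ENTITY_POOL:
            out.append(random.choice(ENTITY_POOL[ent_type]))
        else:
            out.append(tok)
    return out, tags

def inject_noise(tokens, p=0.1):
    # Randomly swap two adjacent characters inside a token to improve robustness.
    noisy = []
    for tok in tokens:
        if len(tok) > 3 and random.random() < p:
            i = random.randrange(len(tok) - 1)
            tok = tok[:i] + tok[i + 1] + tok[i] + tok[i + 2:]
        noisy.append(tok)
    return noisy
```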
arXiv Detail & Related papers (2024-02-22T14:19:56Z)
- ELAD: Explanation-Guided Large Language Models Active Distillation [16.243249111524403]
The deployment and application of Large Language Models (LLMs) is hindered by their memory inefficiency, computational demands, and the high costs of API inferences.
Traditional distillation methods, which transfer the capabilities of LLMs to smaller models, often fail to determine whether the knowledge has been sufficiently transferred.
We propose an Explanation-Guided LLMs Active Distillation (ELAD) framework that employs an active learning strategy to optimize the balance between annotation costs and model performance.
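The summary describes choosing which samples are worth sending to the LLM annotator; a common active-learning criterion is prediction entropy, sketched below. The selection rule and names are generic illustrations, not ELAD's exact procedure.

```python
# Generic entropy-based active selection for distillation (an illustration of
# the active-learning idea, not ELAD's exact procedure).
import math

def entropy(probs):
    return -sum(p * math.log(p + 1e-12) for p in probs)

def select_for_annotation(texts, student_probs, budget=16):
    """Pick the `budget` unlabeled texts the student is least certain about."""
    ranked = sorted(zip(texts, student_probs),
                    key=lambda x: entropy(x[1]), reverse=True)
    return [text for text, _ in ranked[:budget]]  # send these to the LLM annotator
```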
arXiv Detail & Related papers (2024-02-20T15:47:59Z)
- From Text to Source: Results in Detecting Large Language Model-Generated Content [17.306542392779445]
Large Language Models (LLMs) are celebrated for their ability to generate human-like text.
This paper investigates "Cross-Model Detection," by evaluating whether a classifier trained to distinguish between source LLM-generated and human-written text can also detect text from a target LLM without further training.
The research also explores Model Attribution, encompassing source model identification, model family, and model size classification, in addition to quantization and watermarking detection.
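A bare-bones version of the cross-model detection setup can be expressed with any standard text classifier: train on human vs. source-LLM text, then evaluate on a target LLM's text without retraining. The TF-IDF plus logistic-regression pipeline below is a generic stand-in, not the paper's classifier or data.

```python
# Minimal cross-model detection illustration: train a human-vs-LLM classifier
# on text from a *source* LLM, then test it on text from a *target* LLM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def cross_model_detection(human_texts, source_llm_texts, target_llm_texts):
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    X = human_texts + source_llm_texts
    y = [0] * len(human_texts) + [1] * len(source_llm_texts)
    clf.fit(X, y)
    # Fraction of target-LLM texts flagged as machine-generated, no retraining.
    preds = clf.predict(target_llm_texts)
    return sum(preds) / len(preds)
```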
arXiv Detail & Related papers (2023-09-23T09:51:37Z)
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
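The summary says prompted LLMs are used as weak learners inside a boosting loop; the sketch below shows the standard AdaBoost arithmetic with the LLM call stubbed out. The prompting scheme and stub are assumptions, not the paper's method.

```python
# AdaBoost-style loop with a (stubbed) prompted LLM as the weak learner.
# Only the boosting arithmetic is standard; the LLM stub is an assumption.
import math

def prompt_llm_weak_learner(examples, weights):
    """Placeholder: would prompt an LLM with the re-weighted examples and
    return a classifier h(x) -> +1/-1. Stubbed here with a constant rule."""
    return lambda x: 1

def boost(data, rounds=5):
    # data: list of (x, y) pairs with y in {-1, +1}
    w = [1.0 / len(data)] * len(data)
    ensemble = []
    for _ in range(rounds):
        h = prompt_llm_weak_learner(data, w)
        err = sum(wi for wi, (x, y) in zip(w, data) if h(x) != y)
        err = min(max(err, 1e-6), 1 - 1e-6)          # clamp away from 0 and 1
        alpha = 0.5 * math.log((1 - err) / err)
        w = [wi * math.exp(-alpha * y * h(x)) for wi, (x, y) in zip(w, data)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, h))
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```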
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
- Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-off between compute and performance when scaling vision-language models.
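A minimal sparse MoE layer with top-k gating, in PyTorch, illustrates the mechanism the summary refers to; the dimensions, k, and the dense dispatch loop below are illustrative choices rather than the paper's architecture.

```python
# Minimal top-k sparse mixture-of-experts layer (illustrative, not the paper's
# architecture): each token is routed to its k highest-scoring experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, dim)
        scores = self.gate(x)                  # (tokens, n_experts)
        topk_val, topk_idx = scores.topk(self.k, dim=-1)
        topk_w = F.softmax(topk_val, dim=-1)   # renormalize over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = topk_idx[:, slot] == e  # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += topk_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```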
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
- Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
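In the latent-variable reading of in-context learning, the model is viewed as implicitly inferring a task variable from the demonstrations and marginalizing over it when predicting. A standard way to write that view is below; the notation is a generic formulation, not necessarily the paper's exact one.

```latex
% Generic latent-variable view of in-context learning (notation illustrative):
% \theta is a latent task/concept, D = (x_1, y_1), \dots, (x_k, y_k) are the
% in-context demonstrations, and x_{k+1} is the query.
P\bigl(y_{k+1} \mid x_{k+1}, D\bigr)
  \;=\; \int P\bigl(y_{k+1} \mid x_{k+1}, \theta\bigr)\, P\bigl(\theta \mid D\bigr)\, d\theta
```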
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
- A Cohesive Distillation Architecture for Neural Language Models [0.0]
A recent trend in Natural Language Processing is the exponential growth in Language Model (LM) size.
This study investigates methods for Knowledge Distillation (KD) to provide efficient alternatives to large-scale models.
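For reference, the standard knowledge-distillation objective such studies build on combines a temperature-softened KL term against the teacher with the usual cross-entropy against the gold labels; a PyTorch version of that standard loss is below, with illustrative hyperparameters not taken from the paper.

```python
# Standard knowledge-distillation loss (Hinton-style): temperature-softened KL
# against the teacher plus cross-entropy on gold labels.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                       # rescale to keep gradient magnitude at temperature T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```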
arXiv Detail & Related papers (2023-01-12T08:01:53Z)
- What Makes Good Contrastive Learning on Small-Scale Wearable-based Tasks? [59.51457877578138]
We study contrastive learning on the wearable-based activity recognition task.
This paper presents an open-source PyTorch library, CL-HAR, which can serve as a practical tool for researchers.
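As background for the contrastive-learning setting studied there, the widely used NT-Xent (SimCLR-style) loss over two augmented views of a batch can be written as below; this is the textbook formulation, not necessarily the exact objective implemented in CL-HAR.

```python
# Generic NT-Xent (SimCLR-style) contrastive loss over two augmented views of a
# batch of sensor windows (textbook formulation, not necessarily CL-HAR's).
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    # z1, z2: (batch, dim) embeddings of two augmentations of the same samples
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)       # (2B, dim)
    sim = z @ z.t() / temperature                             # cosine similarities
    n = z.shape[0]
    sim.fill_diagonal_(float("-inf"))                         # exclude self-pairs
    # The positive for sample i is its other view at index (i + B) mod 2B.
    targets = (torch.arange(n, device=z.device) + n // 2) % n
    return F.cross_entropy(sim, targets)
```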
arXiv Detail & Related papers (2022-02-12T06:10:15Z)