Supervised Knowledge Makes Large Language Models Better In-context Learners
- URL: http://arxiv.org/abs/2312.15918v2
- Date: Thu, 11 Apr 2024 06:41:15 GMT
- Title: Supervised Knowledge Makes Large Language Models Better In-context Learners
- Authors: Linyi Yang, Shuibai Zhang, Zhuohao Yu, Guangsheng Bao, Yidong Wang, Jindong Wang, Ruochen Xu, Wei Ye, Xing Xie, Weizhu Chen, Yue Zhang,
- Abstract summary: Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
- Score: 94.89301696512776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The recent progress in large-scale generative models has further expanded their use in real-world language applications. However, the critical challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. While previous in-context learning research has focused on enhancing models to adhere to users' specific instructions and quality expectations, and to avoid undesired outputs, little to no work has explored the use of task-Specific fine-tuned Language Models (SLMs) to improve LLMs' in-context learning during the inference stage. Our primary contribution is the establishment of a simple yet effective framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks. Using our proposed plug-in method, enhanced versions of Llama 2 and ChatGPT surpass their original versions regarding generalizability and factuality. We offer a comprehensive suite of resources, including 16 curated datasets, prompts, model checkpoints, and LLM outputs across 9 distinct tasks. The code and data are released at: https://github.com/YangLinyi/Supervised-Knowledge-Makes-Large-Language-Models-Better-In-context-Lear ners. Our empirical analysis sheds light on the advantages of incorporating discriminative models into LLMs and highlights the potential of our methodology in fostering more reliable LLMs.
Related papers
- Enhancing Code Generation for Low-Resource Languages: No Silver Bullet [55.39571645315926]
Large Language Models (LLMs) rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages.
For low-resource languages, the limited availability of such data hampers the models' ability to generalize effectively.
We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages.
arXiv Detail & Related papers (2025-01-31T12:23:28Z) - Federated In-Context LLM Agent Learning [3.4757641432843487]
Large Language Models (LLMs) have revolutionized intelligent services by enabling logical reasoning, tool use, and interaction with external systems as agents.
In this paper, we propose a novel privacy-preserving Federated In-context LLM Agent Learning (FICAL) algorithm.
The results show that FICAL has competitive performance compared to other SOTA baselines with a significant communication cost decrease of $mathbf3.33times105$ times.
arXiv Detail & Related papers (2024-12-11T03:00:24Z) - Leveraging Open-Source Large Language Models for Native Language Identification [1.6267479602370543]
Native Language Identification (NLI) has applications in forensics, marketing, and second language acquisition.
This study explores the potential of using open-source generative large language models (LLMs) for NLI.
arXiv Detail & Related papers (2024-09-15T08:14:18Z) - Improving the Ability of Pre-trained Language Model by Imparting Large Language Model's Experience [4.814313782484443]
Large Language Models (LLMs) and pre-trained Language Models (LMs) have achieved impressive success on many software engineering tasks.
We use LLMs to generate domain-specific data, thereby improving the performance of pre-trained LMs on the target tasks.
arXiv Detail & Related papers (2024-08-16T06:37:59Z) - FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z) - Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z) - Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z) - Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs)
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z) - Give Us the Facts: Enhancing Large Language Models with Knowledge Graphs
for Fact-aware Language Modeling [34.59678835272862]
ChatGPT, a representative large language model (LLM), has gained considerable attention due to its powerful emergent abilities.
This paper proposes to enhance LLMs with knowledge graph-enhanced large language models (KGLLMs)
KGLLM provides a solution to enhance LLMs' factual reasoning ability, opening up new avenues for LLM research.
arXiv Detail & Related papers (2023-06-20T12:21:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.