The Web Can Be Your Oyster for Improving Large Language Models
- URL: http://arxiv.org/abs/2305.10998v2
- Date: Wed, 24 May 2023 09:35:39 GMT
- Title: The Web Can Be Your Oyster for Improving Large Language Models
- Authors: Junyi Li, Tianyi Tang, Wayne Xin Zhao, Jingyuan Wang, Jian-Yun Nie and
Ji-Rong Wen
- Abstract summary: Large language models (LLMs) encode a large amount of world knowledge.
We consider augmenting LLMs with the large-scale web using a search engine.
We present a web-augmented LLM UNIWEB, which is trained over 16 knowledge-intensive tasks in a unified text-to-text format.
- Score: 98.72358969495835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) encode a large amount of world knowledge.
However, as such knowledge is frozen at the time of model training, the models
become static and limited by the training data at that time. In order to
further improve the capacity of LLMs for knowledge-intensive tasks, we consider
augmenting LLMs with the large-scale web using a search engine. Unlike previous
augmentation sources (e.g., Wikipedia data dump), the web provides broader,
more comprehensive and constantly updated information. In this paper, we
present a web-augmented LLM UNIWEB, which is trained over 16
knowledge-intensive tasks in a unified text-to-text format. Instead of simply
using the contents retrieved from the web, our approach makes two major
improvements. First, we propose an adaptive search-engine-assisted learning
method that self-evaluates the confidence of the LLM's predictions and
adaptively determines when to consult the web for more data, which avoids
useless or noisy augmentation from the web. Second, we design a pretraining
task, continual knowledge learning, based on salient span prediction, to
reduce the discrepancy between the encoded and the retrieved knowledge. Experiments
on a wide range of knowledge-intensive tasks show that our model significantly
outperforms previous retrieval-augmented methods.
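The two improvements described in the abstract lend themselves to a short illustration. Below is a minimal sketch, assuming hypothetical names, thresholds, and a crude regex-based salient-span detector (none of which come from the UNIWEB release): a confidence gate that decides when to call the search engine, and a salient-span masker for the continual knowledge learning objective.

```python
# A minimal sketch of the two ideas above, not the authors' code: the
# confidence gate and the salient-span masker use illustrative heuristics
# (mean token log-probability, a crude regex for entities and dates).
import re
from typing import List, Tuple

MASK_TOKEN = "<extra_id_0>"  # assumed sentinel; the real model's masking scheme may differ


def should_query_web(token_logprobs: List[float], threshold: float = -1.0) -> bool:
    """Idea (1), adaptive search-engine-assisted learning: the model
    self-evaluates its draft prediction (here via mean token log-probability)
    and only consults the web when confidence falls below a tuned threshold,
    avoiding useless or noisy augmentation."""
    if not token_logprobs:
        return True  # no prediction yet, so retrieve
    confidence = sum(token_logprobs) / len(token_logprobs)
    return confidence < threshold


# Crude stand-in for salient-span detection: four-digit years and
# capitalized (possibly multi-word) phrases.
SALIENT_SPAN = re.compile(r"\b(\d{4}|[A-Z][a-z]+(?: [A-Z][a-z]+)*)\b")


def mask_salient_spans(text: str, max_masks: int = 3) -> Tuple[str, List[str]]:
    """Idea (2), continual knowledge learning: mask salient spans in
    retrieved web text and train the LLM to reconstruct them, narrowing the
    gap between encoded (parametric) and retrieved knowledge."""
    targets: List[str] = []

    def _replace(match: re.Match) -> str:
        if len(targets) >= max_masks:
            return match.group(0)
        targets.append(match.group(0))
        return MASK_TOKEN

    return SALIENT_SPAN.sub(_replace, text), targets


if __name__ == "__main__":
    print(should_query_web([-2.3, -1.7, -0.9]))   # low confidence -> True, query the web
    print(should_query_web([-0.1, -0.2, -0.05]))  # high confidence -> False, answer directly
    print(mask_salient_spans("Alan Turing was born in 1912 in London."))
```

In this sketch the gate and the masker are independent: the gate controls inference-time augmentation, while the masker produces training targets for the pretraining objective.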
Related papers
- Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Large Language Model (LLM) pretraining traditionally relies on autoregressive language modeling on randomly sampled data blocks from web-scale datasets.
We take inspiration from human learning techniques such as spaced repetition and hypothesize that random data sampling for LLMs leads to high training costs and low-quality models that tend to forget data.
In order to effectively commit web-scale information to long-term memory, we propose the LFR (Learn, Focus, and Review) pedagogy.
arXiv Detail & Related papers (2024-09-10T00:59:18Z)
- Exploiting the Semantic Knowledge of Pre-trained Text-Encoders for Continual Learning [70.64617500380287]
Continual learning allows models to learn from new data while retaining previously learned knowledge.
The semantic knowledge available in the label information of the images offers important information that can be related to previously acquired knowledge of semantic classes.
We propose integrating semantic guidance within and across tasks by capturing semantic similarity using text embeddings.
arXiv Detail & Related papers (2024-08-02T07:51:44Z)
- Leveraging Large Language Models for Web Scraping [0.0]
This research investigates a general-purpose, accurate data-scraping recipe for RAG models designed for language generation.
To capture knowledge in a more modular and interpretable way, we use pre-trained language models with a latent knowledge retriever.
arXiv Detail & Related papers (2024-06-12T14:15:15Z)
- GrowOVER: How Can LLMs Adapt to Growing Real-World Knowledge? [36.987716816134984]
We propose GrowOVER-QA and GrowOVER-Dialogue, dynamic open-domain QA and dialogue benchmarks that undergo a continuous cycle of updates.
Our research indicates that retrieval-augmented language models (RaLMs) struggle with knowledge that they have not been trained on or that has recently been updated.
We introduce a novel retrieval-interactive language model framework, where the language model evaluates and reflects on its answers for further re-retrieval.
arXiv Detail & Related papers (2024-06-09T01:16:04Z)
- Large Scale Knowledge Washing [24.533316191149677]
Large language models show impressive abilities in memorizing world knowledge.
We introduce the problem of Large Scale Knowledge Washing, focusing on unlearning an extensive amount of factual knowledge.
arXiv Detail & Related papers (2024-05-26T23:29:49Z)
- Online Adaptation of Language Models with a Memory of Amortized Contexts [82.02369596879817]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We show how MAC can be combined with, and improve the performance of, popular alternatives such as retrieval-augmented generation.
arXiv Detail & Related papers (2024-03-07T08:34:57Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models [53.52344131257681]
We propose a new paradigm for fine-tuning called F-Learning, which employs parametric arithmetic to facilitate the forgetting of old knowledge and learning of new knowledge.
Experimental results on two publicly available datasets demonstrate that our proposed F-Learning noticeably improves the knowledge-updating performance of both full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2023-11-14T09:12:40Z)
- Knowledge Efficient Deep Learning for Natural Language Processing [2.2701338128113124]
This thesis focuses on adapting classical methods to modern deep learning models and algorithms.
First, we propose a knowledge-rich deep learning model (KRDL) as a unifying learning framework for incorporating prior knowledge into deep models.
Second, we apply a KRDL model to assist machine reading models in finding the correct evidence sentences that support their decisions.
arXiv Detail & Related papers (2020-08-28T23:32:33Z)
- REALM: Retrieval-Augmented Language Model Pre-Training [37.3178586179607]
We augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia.
For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA).
arXiv Detail & Related papers (2020-02-10T18:40:59Z)
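The REALM entry above describes marginalizing over documents returned by a latent retriever. Below is a minimal, self-contained sketch of that marginalization, with toy word-overlap and answer-containment functions standing in for the dense retriever and the reader; these names and heuristics are illustrative assumptions, not the paper's released components.

```python
# A minimal numeric sketch of the retrieval-marginalized likelihood behind the
# REALM entry above: p(y | x) = sum_z p(z | x) * p(y | x, z). The word-overlap
# retriever and answer-containment reader are toy stand-ins.
import math
from typing import Callable, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_likelihood(
    query: str,
    answer: str,
    documents: Sequence[str],
    retriever_score: Callable[[str, str], float],
    reader_prob: Callable[[str, str, str], float],
) -> float:
    """p(answer | query), marginalized over retrieved documents: the retriever
    defines p(z | x) as a softmax over relevance scores, and the reader
    supplies p(y | x, z) for each candidate document z."""
    p_z = softmax([retriever_score(query, d) for d in documents])
    return sum(pz * reader_prob(query, answer, d) for pz, d in zip(p_z, documents))


if __name__ == "__main__":
    docs = [
        "Paris is the capital of France.",
        "The Nile is a river in northeastern Africa.",
    ]
    # Toy stand-ins: word-overlap retriever, answer-containment reader.
    overlap = lambda q, d: float(len(set(q.lower().split()) & set(d.lower().split())))
    contains = lambda q, y, d: 0.9 if y.lower() in d.lower() else 0.05
    print(marginal_likelihood("what is the capital of france", "Paris", docs, overlap, contains))
```

Training both components end-to-end against this marginal likelihood is what lets the retriever learn, without supervision, which documents help the reader predict the answer.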
This list is automatically generated from the titles and abstracts of the papers on this site.