Related papers: When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering

When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering

URL: http://arxiv.org/abs/2404.13028v1
Date: Fri, 19 Apr 2024 17:43:26 GMT
Title: When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering
Authors: Stephen Choi, William Gazeley,
Abstract summary: LLM-ADE is a methodology for continued pre-training of large language models. It addresses the challenges of catastrophic forgetting and double descent. It enhances model adaptability to new data while preserving previously acquired knowledge.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents the LLM-ADE framework, a novel methodology for continued pre-training of large language models (LLMs) that addresses the challenges of catastrophic forgetting and double descent. LLM-ADE employs dynamic architectural adjustments, including selective block freezing and expansion, tailored to specific datasets. This strategy enhances model adaptability to new data while preserving previously acquired knowledge. We demonstrate LLM-ADE's effectiveness on the TinyLlama model across various general knowledge benchmarks, showing significant performance improvements without the drawbacks of traditional continuous training methods. This approach promises a more versatile and robust way to keep LLMs current and efficient in real-world applications.

Related papers

NAACL2025 Tutorial: Adaptation of Large Language Models [55.247657239126646]
This tutorial on adaptation of LLMs is designed to address the growing demand for models that go beyond the static capabilities of generic LLMs.<n>We categorize adaptation techniques into two main families. The first is parametric knowledge adaptation, which focuses on updating the parametric knowledge within LLMs.<n>The second kind of adaptation is semi-parametric knowledge adaptation, where the goal is to update LLM parameters to better leverage external knowledge or tools.
arXiv Detail & Related papers (2025-04-04T20:57:41Z)
From Selection to Generation: A Survey of LLM-based Active Learning [153.8110509961261]
Large Language Models (LLMs) have been employed for generating entirely new data instances and providing more cost-effective annotations. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques.
arXiv Detail & Related papers (2025-02-17T12:58:17Z)
LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression. LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model. Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z)
LLMs are Also Effective Embedding Models: An In-depth Overview [40.53941563464671]
Large language models (LLMs) have revolutionized natural language processing by achieving state-of-the-art performance across various tasks. Recently, their effectiveness as embedding models has gained attention, marking a paradigm shift from traditional encoder-only models like ELMo and BERT to decoder-only, large-scale LLMs like GPT, LLaMA, and Mistral.
arXiv Detail & Related papers (2024-12-17T06:48:24Z)
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models [26.07431044262102]
This paper explores how model weights interact with unlearning processes in large language models (LLMs) We design the weight attribution-guided LLM unlearning method, WAGLE, which unveils the interconnections between 'influence' of weights and 'influence' of data to forget and retain.
arXiv Detail & Related papers (2024-10-23T02:22:07Z)
One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs) We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z)
Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs) We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale. This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z)
Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs) We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM. Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset. Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.