Configurable Foundation Models: Building LLMs from a Modular Perspective
- URL: http://arxiv.org/abs/2409.02877v1
- Date: Wed, 4 Sep 2024 17:01:02 GMT
- Title: Configurable Foundation Models: Building LLMs from a Modular Perspective
- Authors: Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun,
- Abstract summary: A growing tendency to decompose LLMs into numerous functional modules allows for inference with part of modules and dynamic assembly of modules to tackle complex tasks.
We coin the term brick to represent each functional module, designating the modularized structure as customizable foundation models.
We present four brick-oriented operations: retrieval and routing, merging, updating, and growing.
We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions.
- Score: 115.63847606634268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advancements in LLMs have recently unveiled challenges tied to computational efficiency and continual scalability due to their requirements of huge parameters, making the applications and evolution of these models on devices with limited computation resources and scenarios requiring various abilities increasingly cumbersome. Inspired by modularity within the human brain, there is a growing tendency to decompose LLMs into numerous functional modules, allowing for inference with part of modules and dynamic assembly of modules to tackle complex tasks, such as mixture-of-experts. To highlight the inherent efficiency and composability of the modular approach, we coin the term brick to represent each functional module, designating the modularized structure as configurable foundation models. In this paper, we offer a comprehensive overview and investigation of the construction, utilization, and limitation of configurable foundation models. We first formalize modules into emergent bricks - functional neuron partitions that emerge during the pre-training phase, and customized bricks - bricks constructed via additional post-training to improve the capabilities and knowledge of LLMs. Based on diverse functional bricks, we further present four brick-oriented operations: retrieval and routing, merging, updating, and growing. These operations allow for dynamic configuration of LLMs based on instructions to handle complex tasks. To verify our perspective, we conduct an empirical analysis on widely-used LLMs. We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions. Finally, we highlight several open issues and directions for future research. Overall, this paper aims to offer a fresh modular perspective on existing LLM research and inspire the future creation of more efficient and scalable foundational models.
Related papers
- The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities [51.594836904623534]
We investigate whether instruction-tuned models possess fundamentally different capabilities from base models that are prompted using in-context examples.
We show that the performance of instruction-tuned models is significantly correlated with the in-context performance of their base counterparts.
Specifically, we extend this understanding to instruction-tuned models, suggesting that their pretraining data similarly sets a limiting boundary on the tasks they can solve.
arXiv Detail & Related papers (2025-01-15T10:57:55Z) - Lifelong Learning of Large Language Model based Agents: A Roadmap [39.01532420650279]
Lifelong learning, also known as continual or incremental learning, is a crucial component for advancing Artificial General Intelligence (AGI)
This survey is the first to systematically summarize the potential techniques for incorporating lifelong learning into large language models (LLMs)
We highlight how these pillars collectively enable continuous adaptation, mitigate catastrophic forgetting, and improve long-term performance.
arXiv Detail & Related papers (2025-01-13T12:42:04Z) - A Survey on Large Language Models with some Insights on their Capabilities and Limitations [0.3222802562733786]
Large Language Models (LLMs) exhibit remarkable performance across various language-related tasks.
LLMs have demonstrated emergent abilities extending beyond their core functions.
This paper explores the foundational components, scaling mechanisms, and architectural strategies that drive these capabilities.
arXiv Detail & Related papers (2025-01-03T21:04:49Z) - APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents [8.479128275067742]
We present an advanced Large Language Model (LLM)-driven framework that enables autonomous agents to construct complex structures in Minecraft.
By employing chain-of-thought decomposition along with multimodal inputs, the framework generates detailed architectural layouts and blueprints.
Our agent incorporates both memory and reflection modules to facilitate lifelong learning, adaptive refinement, and error correction throughout the building process.
arXiv Detail & Related papers (2024-11-26T09:31:28Z) - Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation [59.37775534633868]
We present an extremely straightforward approach to transferring pre-trained, task-specific PEFT modules between same-family PLMs.
We also propose a method that allows the transfer of modules between incompatible PLMs without any change in the inference complexity.
arXiv Detail & Related papers (2024-03-27T17:50:00Z) - Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs)
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z) - ModuleFormer: Modularity Emerges from Mixture-of-Experts [60.6148988099284]
This paper proposes a new neural network architecture, ModuleFormer, to improve the efficiency and flexibility of large language models.
Unlike the previous SMoE-based modular language model, ModuleFormer can induce modularity from uncurated data.
arXiv Detail & Related papers (2023-06-07T17:59:57Z) - Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
arXiv Detail & Related papers (2023-02-22T18:11:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.