Related papers: Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models

Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models

URL: http://arxiv.org/abs/2504.20020v1
Date: Mon, 28 Apr 2025 17:42:02 GMT
Title: Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models
Authors: Xin Wang, Haoyang Li, Zeyang Zhang, Haibo Chen, Wenwu Zhu,
Abstract summary: We introduce a novel learning paradigm -- Modular Machine Learning (MML) -- as an essential approach toward new-generation large language models (LLMs)<n>MML decomposes the complex structure of LLMs into three interdependent components: modular representation, modular model, and modular reasoning.<n>We present a feasible implementation of MML-based LLMs via leveraging advanced techniques such as disentangled representation learning, neural architecture search and neuro-symbolic learning.
Score: 45.05285463251872
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have dramatically advanced machine learning research including natural language processing, computer vision, data mining, etc., yet they still exhibit critical limitations in reasoning, factual consistency, and interpretability. In this paper, we introduce a novel learning paradigm -- Modular Machine Learning (MML) -- as an essential approach toward new-generation LLMs. MML decomposes the complex structure of LLMs into three interdependent components: modular representation, modular model, and modular reasoning, aiming to enhance LLMs' capability of counterfactual reasoning, mitigating hallucinations, as well as promoting fairness, safety, and transparency. Specifically, the proposed MML paradigm can: i) clarify the internal working mechanism of LLMs through the disentanglement of semantic components; ii) allow for flexible and task-adaptive model design; iii) enable interpretable and logic-driven decision-making process. We present a feasible implementation of MML-based LLMs via leveraging advanced techniques such as disentangled representation learning, neural architecture search and neuro-symbolic learning. We critically identify key challenges, such as the integration of continuous neural and discrete symbolic processes, joint optimization, and computational scalability, present promising future research directions that deserve further exploration. Ultimately, the integration of the MML paradigm with LLMs has the potential to bridge the gap between statistical (deep) learning and formal (logical) reasoning, thereby paving the way for robust, adaptable, and trustworthy AI systems across a wide range of real-world applications.

Related papers

Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition [86.21199607040147]
Self-Improving cognition (SIcog) is a self-learning framework for constructing next-generation foundation language models.<n>We introduce Chain-of-Description, a step-by-step visual understanding method, and integrate structured chain-of-thought (CoT) reasoning to support in-depth multimodal reasoning.<n>Extensive experiments demonstrate that SIcog produces next-generation foundation MLLMs with substantially improved multimodal cognition.
arXiv Detail & Related papers (2025-03-16T00:25:13Z)
A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models [40.67240575271987]
Large Language Models (LLMs) have revolutionized natural language processing, yet their internal mechanisms remain largely opaque.<n> mechanistic interpretability has attracted significant attention from the research community as a means to understand the inner workings of LLMs.<n>Sparse Autoencoders (SAEs) have emerged as a promising method due to their ability to disentangle the complex, superimposed features within LLMs into more interpretable components.
arXiv Detail & Related papers (2025-03-07T17:38:00Z)
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models [70.19607283302712]
We propose a novel framework to transfer knowledge from l-MLLM to s-MLLM. Specifically, we introduce Multimodal Distillation (MDist) to minimize the divergence between the visual-textual output distributions of l-MLLM and s-MLLM. We also propose a three-stage training scheme to fully exploit the potential of s-MLLM.
arXiv Detail & Related papers (2024-10-21T17:41:28Z)
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks. Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs. In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
Configurable Foundation Models: Building LLMs from a Modular Perspective [115.63847606634268]
A growing tendency to decompose LLMs into numerous functional modules allows for inference with part of modules and dynamic assembly of modules to tackle complex tasks. We coin the term brick to represent each functional module, designating the modularized structure as customizable foundation models. We present four brick-oriented operations: retrieval and routing, merging, updating, and growing. We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions.
arXiv Detail & Related papers (2024-09-04T17:01:02Z)
Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making [51.737762570776006]
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making. Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations. Our experiments on novel Design for Manufacturing tasks show both improved task performance as well as improved grounded decision-making capability.
arXiv Detail & Related papers (2024-08-17T11:49:53Z)
CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives. Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance. In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z)
Large Language Models are Interpretable Learners [53.56735770834617]
In this paper, we show a combination of Large Language Models (LLMs) and symbolic programs can bridge the gap between expressiveness and interpretability. The pretrained LLM with natural language prompts provides a massive set of interpretable modules that can transform raw input into natural language concepts. As the knowledge learned by LSP is a combination of natural language descriptions and symbolic rules, it is easily transferable to humans (interpretable) and other LLMs.
arXiv Detail & Related papers (2024-06-25T02:18:15Z)
Towards Modeling Learner Performance with Large Language Models [7.002923425715133]
This paper investigates whether the pattern recognition and sequence modeling capabilities of LLMs can be extended to the domain of knowledge tracing. We compare two approaches to using LLMs for this task, zero-shot prompting and model fine-tuning, with existing, non-LLM approaches to knowledge tracing. While LLM-based approaches do not achieve state-of-the-art performance, fine-tuned LLMs surpass the performance of naive baseline models and perform on par with standard Bayesian Knowledge Tracing approaches.
arXiv Detail & Related papers (2024-02-29T14:06:34Z)
Large Language Model-Based Interpretable Machine Learning Control in Building Energy Systems [3.0309252269809264]
This paper investigates and explores Interpretable Machine Learning (IML), a branch of Machine Learning (ML) that enhances transparency and understanding of models and their inferences. We develop an innovative framework that combines the principles of Shapley values and the in-context learning feature of Large Language Models (LLMs) The paper presents a case study to demonstrate the feasibility of the developed IML framework for model predictive control-based precooling under demand response events in a virtual testbed.
arXiv Detail & Related papers (2024-02-14T21:19:33Z)
Solution-oriented Agent-based Models Generation with Verifier-assisted Iterative In-context Learning [10.67134969207797]
Agent-based models (ABMs) stand as an essential paradigm for proposing and validating hypothetical solutions or policies. Large language models (LLMs) encapsulating cross-domain knowledge and programming proficiency could potentially alleviate the difficulty of this process. We present SAGE, a general solution-oriented ABM generation framework designed for automatic modeling and generating solutions for targeted problems.
arXiv Detail & Related papers (2024-02-04T07:59:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.