Modular Deep Learning
- URL: http://arxiv.org/abs/2302.11529v2
- Date: Sat, 27 Jan 2024 12:01:57 GMT
- Title: Modular Deep Learning
- Authors: Jonas Pfeiffer, Sebastian Ruder, Ivan Vulić, Edoardo Maria Ponti
- Abstract summary: Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
- Score: 120.36599591042908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning has recently become the dominant paradigm of machine
learning. Pre-trained models fine-tuned for downstream tasks achieve better
performance with fewer labelled examples. Nonetheless, it remains unclear how
to develop models that specialise towards multiple tasks without incurring
negative interference and that generalise systematically to non-identically
distributed tasks. Modular deep learning has emerged as a promising solution to
these challenges. In this framework, units of computation are often implemented
as autonomous parameter-efficient modules. Information is conditionally routed
to a subset of modules and subsequently aggregated. These properties enable
positive transfer and systematic generalisation by separating computation from
routing and updating modules locally. We offer a survey of modular
architectures, providing a unified view over several threads of research that
evolved independently in the scientific literature. Moreover, we explore
various additional purposes of modularity, including scaling language models,
causal inference, programme induction, and planning in reinforcement learning.
Finally, we report various concrete applications where modularity has been
successfully deployed such as cross-lingual and cross-modal knowledge transfer.
Talks and projects related to this survey are available at
https://www.modulardeeplearning.com/.
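
The routing and aggregation steps described in the abstract can be illustrated with a small sketch. This is not the survey's implementation: the linear router, the top-k selection, the softmax-weighted sum, and every name below (`make_module`, `route_top_k`, `modular_layer`) are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random


def make_module(dim, rng):
    """Hypothetical module: a residual linear map x + W x with small random W."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def module(x):
        return [xi + sum(w[i][j] * x[j] for j in range(dim))
                for i, xi in enumerate(x)]

    return module


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(x, router_weights, k):
    """Score each module with a linear router (an assumption) and keep the k best."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in router_weights]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    probs = softmax([scores[i] for i in top])
    return list(zip(top, probs))


def modular_layer(x, modules, router_weights, k=2):
    """Route x to a subset of modules, then aggregate outputs by weighted sum."""
    selected = route_top_k(x, router_weights, k)
    out = [0.0] * len(x)
    for idx, p in selected:
        y = modules[idx](x)  # only the selected modules are evaluated
        out = [o + p * yi for o, yi in zip(out, y)]
    return out, selected


rng = random.Random(0)
dim, num_modules = 4, 3
modules = [make_module(dim, rng) for _ in range(num_modules)]
router_weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_modules)]

x = [1.0, -0.5, 0.25, 2.0]
output, selected = modular_layer(x, modules, router_weights, k=2)
print("selected (module index, weight):", selected)
print("aggregated output:", output)
```

Because the router is kept separate from the modules, a trained version of this layer would update only the modules that were actually selected for an input, which is one way to read the abstract's "updating modules locally".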
Related papers
- Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation [59.37775534633868]
We present an extremely straightforward approach to transferring pre-trained, task-specific PEFT modules between same-family PLMs.
We also propose a method that allows the transfer of modules between incompatible PLMs without any change in the inference complexity.
arXiv Detail & Related papers (2024-03-27T17:50:00Z)
- Can Large Language Models Learn Independent Causal Mechanisms? [9.950033005734165]
Large Language Models (LLMs) fall short on the same tasks in uncommon settings or with distribution shifts.
We develop a new LLM architecture composed of multiple sparsely interacting language modelling modules.
We show that such causal constraints can improve out-of-distribution performance on abstract and causal reasoning tasks.
arXiv Detail & Related papers (2024-02-04T23:04:02Z)
- Module-wise Adaptive Distillation for Multimodality Foundation Models [125.42414892566843]
Multimodal foundation models have demonstrated remarkable generalizability but pose challenges for deployment due to their large sizes.
One effective approach to reducing their sizes is layerwise distillation, wherein small student models are trained to match the hidden representations of large teacher models at each layer.
Motivated by our observation that certain architecture components, referred to as modules, contribute more significantly to the student's performance than others, we propose to track the contributions of individual modules by recording the loss decrement after distilling each module, and to distill modules with a greater contribution more frequently.
arXiv Detail & Related papers (2023-10-06T19:24:00Z)
- Modularity in Deep Learning: A Survey [0.0]
We review the notion of modularity in deep learning around three axes: data, task, and model.
Data modularity refers to the observation or creation of data groups for various purposes.
Task modularity refers to the decomposition of tasks into sub-tasks.
Model modularity means that the architecture of a neural network system can be decomposed into identifiable modules.
arXiv Detail & Related papers (2023-10-02T12:41:34Z)
- ModuleFormer: Modularity Emerges from Mixture-of-Experts [60.6148988099284]
This paper proposes a new neural network architecture, ModuleFormer, to improve the efficiency and flexibility of large language models.
Unlike the previous SMoE-based modular language model, ModuleFormer can induce modularity from uncurated data.
arXiv Detail & Related papers (2023-06-07T17:59:57Z)
- Is a Modular Architecture Enough? [80.32451720642209]
We provide a thorough assessment of common modular architectures, through the lens of simple and known modular data distributions.
We highlight the benefits of modularity and sparsity and reveal insights on the challenges faced while optimizing modular systems.
arXiv Detail & Related papers (2022-06-06T16:12:06Z)
- S2RMs: Spatially Structured Recurrent Modules [105.0377129434636]
We take a step towards dynamic models that are capable of simultaneously exploiting both modular and temporal structures.
We find our models to be robust to the number of available views and better capable of generalization to novel tasks without additional training.
arXiv Detail & Related papers (2020-07-13T17:44:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.