Discovering modular solutions that generalize compositionally
- URL: http://arxiv.org/abs/2312.15001v2
- Date: Mon, 25 Mar 2024 17:01:08 GMT
- Title: Discovering modular solutions that generalize compositionally
- Authors: Simon Schug, Seijin Kobayashi, Yassir Akram, Maciej Wołczyk, Alexandra Proca, Johannes von Oswald, Razvan Pascanu, João Sacramento, Angelika Steger
- Abstract summary: We show that identification up to linear transformation purely from demonstrations is possible without having to learn an exponential number of module combinations.
We further demonstrate empirically that meta-learning from finite data can discover modular policies that generalize compositionally in a number of complex environments.
- Score: 55.46688816816882
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many complex tasks can be decomposed into simpler, independent parts. Discovering such underlying compositional structure has the potential to enable compositional generalization. Despite progress, our most powerful systems struggle to compose flexibly. It therefore seems natural to make models more modular to help capture the compositional nature of many tasks. However, it is unclear under which circumstances modular systems can discover hidden compositional structure. To shed light on this question, we study a teacher-student setting with a modular teacher where we have full control over the composition of ground truth modules. This allows us to relate the problem of compositional generalization to that of identification of the underlying modules. In particular, we study modularity in hypernetworks representing a general class of multiplicative interactions. We show theoretically that identification up to linear transformation purely from demonstrations is possible without having to learn an exponential number of module combinations. We further demonstrate empirically that under the theoretically identified conditions, meta-learning from finite data can discover modular policies that generalize compositionally in a number of complex environments.
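To make the setting concrete, here is a minimal sketch of a linear hypernetwork of the kind the abstract studies; all names, dimensions, and the PyTorch framing are illustrative assumptions rather than details from the paper.

```python
import torch

# Hypothetical sizes, chosen purely for illustration.
num_modules, in_dim, out_dim = 4, 8, 8

# Bank of ground-truth teacher modules: one weight matrix per module.
module_bank = torch.randn(num_modules, out_dim, in_dim)

def hypernet_weights(task_code: torch.Tensor) -> torch.Tensor:
    """Linearly combine the module bank under a task code.

    The product between the code and the bank is the multiplicative
    interaction the abstract refers to; summing over modules yields
    the weights of one task-specific network.
    """
    return torch.einsum("m,moi->oi", task_code, module_bank)

# A task composing modules 0 and 2; generalizing to *unseen* codes
# over the same modules is the compositional generalization problem.
code = torch.tensor([0.5, 0.0, 0.5, 0.0])
W = hypernet_weights(code)
demonstration = W @ torch.randn(in_dim)  # one teacher demonstration
```

In this picture, identification up to linear transformation means recovering `module_bank` from demonstrations alone, up to an invertible linear reparameterization of the codes and modules.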
Related papers
- Configurable Foundation Models: Building LLMs from a Modular Perspective [115.63847606634268]
A growing tendency to decompose LLMs into numerous functional modules allows inference with only a subset of modules and dynamic assembly of modules to tackle complex tasks (a minimal routing sketch in this spirit appears after this list).
We coin the term brick to represent each functional module, designating the modularized structure as configurable foundation models.
We present four brick-oriented operations: retrieval and routing, merging, updating, and growing.
We find that FFN layers follow modular patterns, with functional specialization and functional partitioning of neurons.
arXiv Detail & Related papers (2024-09-04T17:01:02Z)
- Once and for all: how to compose modules -- The composition calculus [1.4372498385359374]
In a technical framework, interaction requires composition of modules.
We suggest a minimal set of postulates to characterize systems in the digital world that consist of interacting modules.
These postulates are supported by a rich body of theorems, properties, special classes of modules, and case studies.
arXiv Detail & Related papers (2024-08-27T13:01:04Z)
- Grokking Modular Polynomials [5.358878931933351]
We extend the class of analytical solutions to include modular multiplication as well as modular addition with many terms.
We show that real networks trained on these datasets learn similar solutions upon generalization (grokking).
We hypothesize a classification of modular polynomials into those learnable and those not learnable via neural network training.
arXiv Detail & Related papers (2024-06-05T17:59:35Z)
- Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation [59.37775534633868]
We present an extremely straightforward approach to transferring pre-trained, task-specific PEFT modules between same-family PLMs.
We also propose a method that allows the transfer of modules between incompatible PLMs without any change in the inference complexity.
arXiv Detail & Related papers (2024-03-27T17:50:00Z)
- Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials [29.09237503747052]
Transformers grokked on modular addition are known to implement a Fourier representation whose computation circuits rest on trigonometric identities (a worked sketch of this solution appears after this list).
We show that transferability among models grokked on different operations is limited to specific combinations.
Some multi-task mixtures may lead to co-grokking, where grokking happens simultaneously for all tasks.
arXiv Detail & Related papers (2024-02-26T16:48:12Z)
- Modularity in Deep Learning: A Survey [0.0]
We review the notion of modularity in deep learning around three axes: data, task, and model.
Data modularity refers to the observation or creation of data groups for various purposes.
Task modularity refers to the decomposition of tasks into sub-tasks.
Model modularity means that the architecture of a neural network system can be decomposed into identifiable modules.
arXiv Detail & Related papers (2023-10-02T12:41:34Z)
- Dynamic MOdularized Reasoning for Compositional Structured Explanation Generation [29.16040150962427]
We propose a dynamic modularized reasoning model, MORSE, to improve compositional generalization of neural models.
MORSE factorizes the inference process into a combination of modules, where each module represents a functional unit.
We test MORSE's compositional generalization abilities on two benchmarks, using reasoning trees of increasing length and varying shape.
arXiv Detail & Related papers (2023-09-14T11:40:30Z)
- Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
arXiv Detail & Related papers (2023-02-22T18:11:25Z)
- Disentangling Reasoning Capabilities from Language Models with Compositional Reasoning Transformers [72.04044221898059]
ReasonFormer is a unified reasoning framework for mirroring the modular and compositional reasoning process of humans.
The representation module (automatic thinking) and reasoning modules (controlled thinking) are disentangled to capture different levels of cognition.
The unified reasoning framework solves multiple tasks with a single model and is trained and run in an end-to-end manner.
arXiv Detail & Related papers (2022-10-20T13:39:55Z)
- Is a Modular Architecture Enough? [80.32451720642209]
We provide a thorough assessment of common modular architectures, through the lens of simple and known modular data distributions.
We highlight the benefits of modularity and sparsity and reveal insights on the challenges faced while optimizing modular systems.
arXiv Detail & Related papers (2022-06-06T16:12:06Z)
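As referenced above, here is a minimal sketch of the retrieval-and-routing operation described in Configurable Foundation Models: score all bricks, retrieve the top-k, and run only those. The brick shapes, the linear router, and every name here are assumptions for illustration, not the paper's implementation.

```python
import torch

# Hypothetical brick setup: each brick is a small linear specialist.
num_bricks, d = 8, 16
bricks = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(num_bricks))
router = torch.nn.Linear(d, num_bricks)

def route(x: torch.Tensor, k: int = 2) -> torch.Tensor:
    """Inference with only part of the modules: retrieve the k most
    relevant bricks for the input and mix their outputs by score."""
    scores = router(x).softmax(-1)   # relevance of each brick
    top = scores.topk(k)             # retrieval step
    outs = torch.stack([bricks[i](x) for i in top.indices.tolist()])
    return (top.values.unsqueeze(-1) * outs).sum(0)  # dynamic assembly

y = route(torch.randn(d))  # only 2 of the 8 bricks are executed
```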
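And, as referenced in the grokking entries, a worked sketch of the Fourier solution to modular addition: summing cosines over frequencies yields logits that peak at the correct residue. The modulus and the NumPy framing are illustrative assumptions; grokked networks are reported to realize the same computation through products of sines and cosines (the trigonometric identities mentioned above).

```python
import numpy as np

p = 97  # modulus; an arbitrary illustrative choice
ks = np.arange(1, (p - 1) // 2 + 1)  # one representative per frequency pair

def mod_add_fourier(a: int, b: int) -> int:
    """Recover (a + b) % p from trigonometric features alone.

    Summed over frequencies k, cos(2*pi*k*(a + b - c)/p) equals
    (p - 1)/2 at the true residue c and -1/2 elsewhere, so the
    argmax over candidate residues is exact.
    """
    c = np.arange(p)
    logits = np.cos(2 * np.pi * ks[:, None] * (a + b - c[None, :]) / p).sum(0)
    return int(np.argmax(logits))

assert mod_add_fourier(45, 67) == (45 + 67) % p  # 15
```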