Can Large Language Models Learn Independent Causal Mechanisms?
- URL: http://arxiv.org/abs/2402.02636v1
- Date: Sun, 4 Feb 2024 23:04:02 GMT
- Title: Can Large Language Models Learn Independent Causal Mechanisms?
- Authors: Gaël Gendron, Bao Trung Nguyen, Alex Yuxuan Peng, Michael Witbrock,
Gillian Dobbie
- Abstract summary: Large Language Models (LLMs) fall short on the same tasks in uncommon settings or with distribution shifts.
We develop a new LLM architecture composed of multiple sparsely interacting language modelling modules.
We show that such causal constraints can improve out-of-distribution performance on abstract and causal reasoning tasks.
- Score: 9.950033005734165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite impressive performance on language modelling and complex reasoning
tasks, Large Language Models (LLMs) fall short on the same tasks in uncommon
settings or with distribution shifts, exhibiting some lack of generalisation
ability. This issue has usually been alleviated by feeding more training data
into the LLM. However, this method is brittle, as the scope of tasks may not be
readily predictable or may evolve, and updating the model with new data
generally requires extensive additional training. By contrast, systems, such as
causal models, that learn abstract variables and causal relationships can
demonstrate increased robustness against changes in the distribution. One
reason for this success is the existence and use of Independent Causal
Mechanisms (ICMs) representing high-level concepts that only sparsely interact.
In this work, we apply two concepts from causality to learn ICMs within LLMs.
We develop a new LLM architecture composed of multiple sparsely interacting
language modelling modules. We introduce a routing scheme to induce
specialisation of the network into domain-specific modules. We also present a
Mutual Information minimisation objective that trains a separate module to
learn abstraction and domain-invariant mechanisms. We show that such causal
constraints can improve out-of-distribution performance on abstract and causal
reasoning tasks.
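The abstract describes two mechanisms that lend themselves to a short sketch: a routing scheme that sends inputs to sparsely interacting, domain-specific modules, and a Mutual Information minimisation objective that trains a separate module to be domain-invariant. The snippet below is a minimal illustration of both ideas, not the paper's implementation; the top-1 router, the toy dimensions, and the discrete plug-in MI estimator are all assumptions made for the sake of a runnable example.

```python
import numpy as np

def route(hidden, router_w):
    """Top-1 routing: each example is sent to the single module with the
    highest router score, inducing sparse specialisation across modules."""
    logits = hidden @ router_w            # (batch, n_modules)
    return np.argmax(logits, axis=-1)     # hard module assignment per example

def mutual_information(assignments, domains):
    """Plug-in estimate of I(module; domain) from co-occurrence counts.
    Driving an estimate like this towards zero is one way to encourage a
    module whose behaviour does not depend on the domain (a stand-in for
    the MI minimisation objective described in the abstract)."""
    joint = np.zeros((assignments.max() + 1, domains.max() + 1))
    for a, d in zip(assignments, domains):
        joint[a, d] += 1
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)  # marginal over modules
    pd = joint.sum(axis=0, keepdims=True)  # marginal over domains
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pd)[nz])).sum())

rng = np.random.default_rng(0)
hidden = rng.normal(size=(8, 4))          # toy hidden states
router_w = rng.normal(size=(4, 3))        # router scores for 3 modules
domains = np.array([0, 0, 0, 0, 1, 1, 1, 1])
modules = route(hidden, router_w)
print("assignments:", modules, "I(module; domain) =", mutual_information(modules, domains))
```

In a real training loop the hard assignment would be relaxed (e.g. a softmax over router logits) so the router is differentiable, and the MI term would be estimated neurally rather than from discrete counts.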
Related papers
- Identifiable Causal Representation Learning: Unsupervised, Multi-View, and Multi-Environment [10.814585613336778]
Causal representation learning (CRL) aims to combine the core strengths of machine learning and causality.
This thesis investigates what is possible for CRL without direct supervision, and thus contributes to its theoretical foundations.
arXiv Detail & Related papers (2024-06-19T09:14:40Z)
- Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation [59.37775534633868]
We present an extremely straightforward approach to transferring pre-trained, task-specific PEFT modules between same-family PLMs.
We also propose a method that allows the transfer of modules between incompatible PLMs without any change in the inference complexity.
arXiv Detail & Related papers (2024-03-27T17:50:00Z)
- Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? [60.50127555651554]
Large Language Models (LLMs) show impressive results in numerous practical applications, but they lack essential safety features.
This makes them vulnerable to manipulations such as indirect prompt injections and generally unsuitable for safety-critical tasks.
We introduce a formal measure for instruction-data separation and an empirical variant that is calculable from a model's outputs.
arXiv Detail & Related papers (2024-03-11T15:48:56Z)
- The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models [20.177263185773153]
Multi-modal large language models (MLLMs) integrate verbal and visual information.
Despite the transformative potential of MLLMs, our understanding of their reasoning abilities is limited.
arXiv Detail & Related papers (2024-01-22T16:57:05Z)
- LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z)
- Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we describe how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
- Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-off between compute cost and performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
- Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
arXiv Detail & Related papers (2023-02-22T18:11:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.