A Unified Causal View of Instruction Tuning
- URL: http://arxiv.org/abs/2402.06220v1
- Date: Fri, 9 Feb 2024 07:12:56 GMT
- Title: A Unified Causal View of Instruction Tuning
- Authors: Lu Chen, Wei Huang, Ruqing Zhang, Wei Chen, Jiafeng Guo, Xueqi Cheng
- Abstract summary: We develop a meta Structural Causal Model (meta-SCM) to integrate different NLP tasks under a single causal structure of the data.
The key idea is to learn task-required causal factors and use only those to make predictions for a given task.
- Score: 76.1000380429553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction tuning on a mixture of tasks has improved zero-shot capabilities
in natural language processing (NLP). Nevertheless, existing methods often
learn features that exhibit correlations between instruction-formatted samples
and target labels, rather than causal relationships. Termed ``spurious correlation'' in statistics, such a correlation may change drastically in a new task, making the effect of the learned features misleading. To this
end, we develop a meta Structural Causal Model (meta-SCM) to integrate
different NLP tasks under a single causal structure of the data. Specifically,
the meta-SCM introduces multiple latent factors that represent properties of
source context, only some of which causally influence the target labels for a
specific task. The key idea is to learn task-required causal factors and only
use those to make predictions for a given task. Theoretically, we prove that the causal factors can be identified without mixing in information from the others. Guided by this identifiability result, we propose a Structural Instruction Tuning (SIT) method
to learn the task-required causal representations that can mimic the causal
factors for each task. The utility of our approach is verified by improvements in zero-shot ability on a range of unseen datasets and tasks.
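The selection idea sketched in the abstract, where a shared encoding yields several latent factors and each task predicts from only its required subset, can be illustrated with a short, hypothetical PyTorch snippet. The class and argument names below (FactorizedPredictor, task_factor_mask, num_factors) are illustrative assumptions, not the authors' released SIT implementation.

```python
# Toy stand-in for the SIT idea: predict from task-required latent factors only.
# All names and shapes here are hypothetical, not the paper's code.
import torch
import torch.nn as nn


class FactorizedPredictor(nn.Module):
    def __init__(self, hidden_dim: int, num_factors: int, num_labels: int):
        super().__init__()
        # One projection head per latent factor, applied to a shared context encoding.
        self.factor_heads = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(num_factors)]
        )
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, context_encoding, task_factor_mask):
        # context_encoding: (batch, hidden_dim), e.g. pooled output of any pretrained encoder.
        # task_factor_mask: (num_factors,) binary mask marking the factors this task requires.
        factors = torch.stack(
            [head(context_encoding) for head in self.factor_heads], dim=1
        )  # (batch, num_factors, hidden_dim)
        # Drop factors that are not causally required for this task, then average the rest.
        selected = factors * task_factor_mask.view(1, -1, 1)
        pooled = selected.sum(dim=1) / task_factor_mask.sum().clamp(min=1.0)
        return self.classifier(pooled)


# Hypothetical usage: a task assumed to need factors 0 and 2 out of 4.
model = FactorizedPredictor(hidden_dim=768, num_factors=4, num_labels=2)
encoding = torch.randn(8, 768)               # stand-in for encoder output
mask = torch.tensor([1.0, 0.0, 1.0, 0.0])    # task-required factor selection
logits = model(encoding, mask)               # shape: (8, 2)
```

In SIT itself, which factors a task requires and how the causal representations are learned follow from the identifiability analysis; the fixed binary mask above is only a stand-in for that selection step.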
Related papers
- Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective [13.56923651751788]
We propose Causality-Guided Semantic Decoupling and Classification to mitigate the interference of task-irrelevant knowledge.
We employ the Dempster-Shafer evidence theory to evaluate the uncertainty of each prediction generated by diverse semantics.
arXiv Detail & Related papers (2024-10-01T09:33:45Z)
- Exploring Correlations of Self-Supervised Tasks for Graphs [6.977921096191354]
This paper aims to provide a fresh understanding of graph self-supervised learning based on task correlations.
We evaluate the performance of the representations trained by one specific task on other tasks and define correlation values to quantify task correlations.
We propose Graph Task Correlation Modeling (GraphTCM) to illustrate the task correlations and utilize it to enhance graph self-supervised training.
arXiv Detail & Related papers (2024-05-07T12:02:23Z)
- Hacking Task Confounder in Meta-Learning [18.179340061914708]
We propose a plug-and-play Meta-learning Causal Representation (MetaCRL) to eliminate task confounders.
Our work achieves state-of-the-art (SOTA) performance on benchmark datasets.
arXiv Detail & Related papers (2023-12-10T05:33:40Z)
- Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning [14.569770617709073]
We present a detailed analysis of which design choices cause instabilities and inconsistencies in task predictions.
We show that spurious correlations between input distributions and labels pose only a minor problem for prompted models.
We statistically analyse the results to show which factors are the most influential, interactive or stable.
arXiv Detail & Related papers (2023-10-20T13:25:24Z)
- Leveraging sparse and shared feature activations for disentangled representation learning [112.22699167017471]
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation.
We validate our approach on six real-world distribution shift benchmarks and on different data modalities.
arXiv Detail & Related papers (2023-04-17T01:33:24Z)
- Multi-task Bias-Variance Trade-off Through Functional Constraints [102.64082402388192]
Multi-task learning aims to acquire a set of functions that perform well for diverse tasks.
In this paper we draw intuition from the two extreme learning scenarios -- a single function for all tasks, and a task-specific function that ignores the other tasks.
We introduce a constrained learning formulation that enforces domain-specific solutions to remain close to a central function.
arXiv Detail & Related papers (2022-10-27T16:06:47Z)
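As a rough sketch of what such a constrained formulation can look like (the paper's exact objective is not given in this summary, so the quadratic proximity term and its weight below are assumptions): each task keeps its own model but is penalized for drifting away from a central function.

```python
# Hypothetical sketch of a constrained multi-task objective: task-specific models are
# trained on their own losses while being kept close to a shared central model.
# The soft quadratic penalty stands in for the paper's constraint and is an assumption.
import torch


def constrained_mtl_loss(task_models, central_model, batches, loss_fns, strength=0.1):
    total = torch.tensor(0.0)
    for model, (x, y), loss_fn in zip(task_models, batches, loss_fns):
        total = total + loss_fn(model(x), y)                 # per-task empirical loss
        for p, c in zip(model.parameters(), central_model.parameters()):
            total = total + strength * (p - c).pow(2).sum()  # stay near the central function
    return total
```

The summary describes this as a constrained formulation; the soft penalty above is a simplification for brevity.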
- Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language Understanding [51.31622274823167]
We propose a hierarchical framework with a coarse-to-fine paradigm, with the bottom level shared to all the tasks, the mid-level divided to different groups, and the top-level assigned to each of the tasks.
This allows our model to learn basic language properties from all tasks, boost performance on relevant tasks, and reduce the negative impact from irrelevant tasks.
arXiv Detail & Related papers (2022-08-19T02:46:20Z)
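A minimal sketch of that coarse-to-fine layout may help: a bottom module shared by all tasks, a mid-level module per task group, and a top head per task. The layer types, dimensions, task-to-group assignment, and names below are assumptions for illustration, not the paper's architecture.

```python
# Illustrative coarse-to-fine hierarchy: shared bottom, group-level middle, task-level top.
import torch
import torch.nn as nn


class HierarchicalMTL(nn.Module):
    def __init__(self, in_dim, hid_dim, task_groups, task_out_dims):
        super().__init__()
        # Bottom level: shared by all tasks.
        self.bottom = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        # Mid level: one block per group of related tasks.
        self.mid = nn.ModuleDict(
            {g: nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.ReLU())
             for g in set(task_groups.values())}
        )
        # Top level: one head per task.
        self.top = nn.ModuleDict({t: nn.Linear(hid_dim, d) for t, d in task_out_dims.items()})
        self.task_groups = task_groups

    def forward(self, x, task):
        h = self.bottom(x)                        # basic properties learned from all tasks
        h = self.mid[self.task_groups[task]](h)   # shared with related tasks in the same group
        return self.top[task](h)                  # task-specific prediction


# Hypothetical usage with two task groups.
model = HierarchicalMTL(
    in_dim=128, hid_dim=64,
    task_groups={"nli": "classification", "sst2": "classification", "squad": "qa"},
    task_out_dims={"nli": 3, "sst2": 2, "squad": 2},
)
out = model(torch.randn(4, 128), task="sst2")     # shape: (4, 2)
```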
- Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study [75.42182503265056]
Multi-Task Learning has emerged as a methodology in which multiple tasks are jointly learned by a shared learning algorithm.
We deal with heterogeneous MTL, simultaneously addressing detection, classification & regression problems.
We build FaceBehaviorNet, the first framework for large-scale face analysis, by jointly learning all facial behavior tasks.
arXiv Detail & Related papers (2021-05-08T22:26:52Z)
- CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning [138.40338621974954]
CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment.
Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures.
arXiv Detail & Related papers (2020-10-08T23:01:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.