Fine-Tuning Pre-trained Language Models for Robust Causal Representation Learning
- URL: http://arxiv.org/abs/2410.14375v1
- Date: Fri, 18 Oct 2024 11:06:23 GMT
- Title: Fine-Tuning Pre-trained Language Models for Robust Causal Representation Learning
- Authors: Jialin Yu, Yuxiang Zhou, Yulan He, Nevin L. Zhang, Ricardo Silva,
- Abstract summary: Fine-tuning of pre-trained language models (PLMs) has been shown to be effective across various domains.
We show that a robust representation can be derived through a so-called causal front-door adjustment, based on a decomposition assumption.
Our work thus sheds light on the domain generalization problem by introducing links between fine-tuning and causal mechanisms into representation learning.
- Score: 26.29386609645171
- License:
- Abstract: The fine-tuning of pre-trained language models (PLMs) has been shown to be effective across various domains. By using domain-specific supervised data, the general-purpose representation derived from PLMs can be transformed into a domain-specific representation. However, these methods often fail to generalize to out-of-domain (OOD) data due to their reliance on non-causal representations, often described as spurious features. Existing methods either make use of adjustments with strong assumptions about lack of hidden common causes, or mitigate the effect of spurious features using multi-domain data. In this work, we investigate how fine-tuned pre-trained language models aid generalizability from single-domain scenarios under mild assumptions, targeting more general and practical real-world scenarios. We show that a robust representation can be derived through a so-called causal front-door adjustment, based on a decomposition assumption, using fine-tuned representations as a source of data augmentation. Comprehensive experiments in both synthetic and real-world settings demonstrate the superior generalizability of the proposed method compared to existing approaches. Our work thus sheds light on the domain generalization problem by introducing links between fine-tuning and causal mechanisms into representation learning.
Related papers
- Causal Representation-Based Domain Generalization on Gaze Estimation [10.283904882611463]
We propose the Causal Representation-Based Domain Generalization on Gaze Estimation framework.
We employ an adversarial training manner and an additional penalizing term to extract domain-invariant features.
By leveraging these modules, CauGE ensures that the neural networks learn from representations that meet the causal mechanisms' general principles.
arXiv Detail & Related papers (2024-08-30T01:45:22Z) - Causally Inspired Regularization Enables Domain General Representations [14.036422506623383]
Given a causal graph representing the data-generating process shared across different domains/distributions, enforcing sufficient graph-implied conditional independencies can identify domain-general (non-spurious) feature representations.
We propose a novel framework with regularizations, which we demonstrate are sufficient for identifying domain-general feature representations without a priori knowledge (or proxies) of the spurious features.
Our proposed method is effective for both (semi) synthetic and real-world data, outperforming other state-of-the-art methods in average and worst-domain transfer accuracy.
arXiv Detail & Related papers (2024-04-25T01:33:55Z) - DIGIC: Domain Generalizable Imitation Learning by Causal Discovery [69.13526582209165]
Causality has been combined with machine learning to produce robust representations for domain generalization.
We make a different attempt by leveraging the demonstration data distribution to discover causal features for a domain generalizable policy.
We design a novel framework, called DIGIC, to identify the causal features by finding the direct cause of the expert action from the demonstration data distribution.
arXiv Detail & Related papers (2024-02-29T07:09:01Z) - Normalization Perturbation: A Simple Domain Generalization Method for
Real-World Domain Shifts [133.99270341855728]
Real-world domain styles can vary substantially due to environment changes and sensor noises.
Deep models only know the training domain style.
We propose Normalization Perturbation to overcome this domain style overfitting problem.
arXiv Detail & Related papers (2022-11-08T17:36:49Z) - GCISG: Guided Causal Invariant Learning for Improved Syn-to-real
Generalization [1.2215956380648065]
Training a deep learning model with artificially generated data can be an alternative when training data are scarce.
In this paper, we characterize the domain gap by using a causal framework for data generation.
We propose causal invariance learning which encourages the model to learn a style-invariant representation that enhances the syn-to-real generalization.
arXiv Detail & Related papers (2022-08-22T02:39:05Z) - Self-balanced Learning For Domain Generalization [64.99791119112503]
Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics.
Most existing approaches have been developed under the assumption that the source data is well-balanced in terms of both domain and class.
We propose a self-balanced domain generalization framework that adaptively learns the weights of losses to alleviate the bias caused by different distributions of the multi-domain source data.
arXiv Detail & Related papers (2021-08-31T03:17:54Z) - Posterior Differential Regularization with f-divergence for Improving
Model Robustness [95.05725916287376]
We focus on methods that regularize the model posterior difference between clean and noisy inputs.
We generalize the posterior differential regularization to the family of $f$-divergences.
Our experiments show that regularizing the posterior differential with $f$-divergence can result in well-improved model robustness.
arXiv Detail & Related papers (2020-10-23T19:58:01Z) - Learning Invariant Representations and Risks for Semi-supervised Domain
Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA)
We introduce the LIRR algorithm for jointly textbfLearning textbfInvariant textbfRepresentations and textbfRisks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z) - Learning to Learn with Variational Information Bottleneck for Domain
Generalization [128.90691697063616]
Domain generalization models learn to generalize to previously unseen domains, but suffer from prediction uncertainty and domain shift.
We introduce a probabilistic meta-learning model for domain generalization, in which parameters shared across domains are modeled as distributions.
To deal with domain shift, we learn domain-invariant representations by the proposed principle of meta variational information bottleneck, we call MetaVIB.
arXiv Detail & Related papers (2020-07-15T12:05:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.