Advantages and limitations in the use of transfer learning for individual treatment effects in causal machine learning
- URL: http://arxiv.org/abs/2512.16489v1
- Date: Thu, 18 Dec 2025 12:57:06 GMT
- Title: Advantages and limitations in the use of transfer learning for individual treatment effects in causal machine learning
- Authors: Seyda Betul Aydin, Holger Brandt
- Abstract summary: Generalizing causal knowledge across diverse environments is challenging. Model-based estimators of individual treatment effects (ITE) from machine learning require large sample sizes. We show how estimation of ITEs can be improved by leveraging knowledge from source datasets and adapting it to new settings.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Generalizing causal knowledge across diverse environments is challenging, especially when estimates from large-scale datasets must be applied to smaller or systematically different contexts, where external validity is critical. Model-based estimators of individual treatment effects (ITE) from machine learning require large sample sizes, limiting their applicability in domains such as behavioral sciences with smaller datasets. We demonstrate how estimation of ITEs with Treatment Agnostic Representation Networks (TARNet; Shalit et al., 2017) can be improved by leveraging knowledge from source datasets and adapting it to new settings via transfer learning (TL-TARNet; Aloui et al., 2023). In simulations that vary source and sample sizes and consider both randomized and non-randomized intervention target settings, the transfer-learning extension TL-TARNet improves upon standard TARNet, reducing ITE error and attenuating bias when a large unbiased source is available and target samples are small. In an empirical application using the India Human Development Survey (IHDS-II), we estimate the effect of mothers' firewood collection time on children's weekly study time; transfer learning pulls the target mean ITEs toward the source ITE estimate, reducing bias in the estimates obtained without transfer. These results suggest that transfer learning for causal models can improve the estimation of ITE in small samples.
Related papers
- Automatic debiasing of neural networks via moment-constrained learning [0.0]
Naively learning the regression function and taking a sample mean of the target functional results in biased estimators. We propose moment-constrained learning as a new RR learning approach that addresses some shortcomings in automatic debiasing.
arXiv Detail & Related papers (2024-09-29T20:56:54Z)
- Improvement of Applicability in Student Performance Prediction Based on Transfer Learning [2.3290007848431955]
This study proposes a method to improve prediction accuracy by applying transfer learning techniques to datasets with varying distributions.
The model was trained and evaluated to enhance its generalization ability and prediction accuracy.
Experiments demonstrated that this approach excels in reducing Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
The results demonstrate that freezing more layers improves performance for complex and noisy data, whereas freezing fewer layers is more effective for simpler and larger datasets.
arXiv Detail & Related papers (2024-06-01T13:09:05Z)
- Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We also examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- Maximizing Model Generalization for Machine Condition Monitoring with Self-Supervised Learning and Federated Learning [4.214064911004321]
Deep Learning can diagnose faults and assess machine health from raw condition monitoring data without manually designed statistical features.
Traditional supervised learning may struggle to learn compact, discriminative representations that generalize to unseen target domains.
This study proposes focusing on maximizing the feature generality on the source domain and applying TL via weight transfer to copy the model to the target domain.
arXiv Detail & Related papers (2023-04-27T17:57:54Z)
- Do Deep Neural Networks Always Perform Better When Eating More Data? [82.6459747000664]
We design experiments under independent and identically distributed (IID) and out-of-distribution (OOD) conditions.
Under the IID condition, the amount of information determines the effectiveness of each sample, while the contribution of samples and the difference between classes determine the amount of class information.
Under the OOD condition, the cross-domain degree of the samples determines their contributions, and the bias-fitting caused by irrelevant elements is a significant factor in the cross-domain setting.
arXiv Detail & Related papers (2022-05-30T15:40:33Z)
- Algorithms and Theory for Supervised Gradual Domain Adaptation [19.42476993856205]
We study the problem of supervised gradual domain adaptation, where labeled data from shifting distributions are available to the learner along the trajectory.
Under this setting, we provide the first generalization upper bound on the learning error under mild assumptions.
Our results are algorithm agnostic for a range of loss functions, and only depend linearly on the averaged learning error across the trajectory.
arXiv Detail & Related papers (2022-04-25T13:26:11Z)
- Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer while fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves on standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- The Utility of Feature Reuse: Transfer Learning in Data-Starved Regimes [6.419457653976053]
We describe a transfer learning use case for a domain with a data-starved regime.
We evaluate the effectiveness of convolutional feature extraction and fine-tuning.
We conclude that transfer learning enhances the performance of CNN architectures in data-starved regimes.
arXiv Detail & Related papers (2020-02-29T18:48:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.