DeVLBert: Learning Deconfounded Visio-Linguistic Representations
- URL: http://arxiv.org/abs/2008.06884v2
- Date: Fri, 2 Oct 2020 12:00:56 GMT
- Title: DeVLBert: Learning Deconfounded Visio-Linguistic Representations
- Authors: Shengyu Zhang, Tan Jiang, Tan Wang, Kun Kuang, Zhou Zhao, Jianke Zhu,
Jin Yu, Hongxia Yang, Fei Wu
- Abstract summary: We investigate the problem of out-of-domain visio-linguistic pretraining.
Existing methods for this problem are purely likelihood-based.
We propose a Deconfounded Visio-Linguistic Bert framework, abbreviated as DeVLBert, to perform intervention-based learning.
- Score: 111.93480424791613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose to investigate the problem of out-of-domain
visio-linguistic pretraining, where the pretraining data distribution differs
from that of downstream data on which the pretrained model will be fine-tuned.
Existing methods for this problem are purely likelihood-based, leading to
spurious correlations that hurt the generalization ability when the pretrained
model is transferred to out-of-domain downstream tasks. By spurious correlation, we mean that the
conditional probability of one token (object or word) given another one can be
high (due to the dataset biases) without robust (causal) relationships between
them. To mitigate such dataset biases, we propose a Deconfounded
Visio-Linguistic Bert framework, abbreviated as DeVLBert, to perform
intervention-based learning. We borrow the idea of the backdoor adjustment from
the research field of causality and propose several neural-network based
architectures for Bert-style out-of-domain pretraining. The quantitative
results on three downstream tasks, Image Retrieval (IR), Zero-shot IR, and
Visual Question Answering, show the effectiveness of DeVLBert by boosting
generalization ability.
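Concretely, backdoor adjustment replaces the observational conditional P(y | x) with the interventional distribution P(y | do(x)) = sum_z P(y | x, z) P(z), where z ranges over confounders. The following is a minimal sketch of that marginalization, assuming a fixed dictionary of confounder embeddings with an empirical prior (e.g., object-class statistics from the pretraining corpus). It illustrates the idea only, is not DeVLBert's exact architecture, and the names BackdoorAdjustedClassifier, z_dict, and confounder_prior are hypothetical.

```python
import torch
import torch.nn as nn


class BackdoorAdjustedClassifier(nn.Module):
    """Minimal sketch of backdoor adjustment: approximates
    P(y | do(x)) = sum_z P(y | x, z) P(z) using a fixed confounder
    dictionary. Illustrative only, not DeVLBert's exact architecture."""

    def __init__(self, hidden_dim: int, n_confounders: int, n_classes: int,
                 confounder_prior: torch.Tensor):
        super().__init__()
        # z_dict[k] is an embedding for confounder k (e.g., an object-class
        # prototype estimated from the pretraining corpus); hypothetical.
        self.z_dict = nn.Parameter(torch.randn(n_confounders, hidden_dim),
                                   requires_grad=False)
        # P(z): empirical prior over confounders, shape (n_confounders,).
        self.register_buffer("prior", confounder_prior)
        self.classifier = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, hidden_dim) token/region features.
        b, k = x.size(0), self.z_dict.size(0)
        x_rep = x.unsqueeze(1).expand(b, k, -1)             # (b, k, d)
        z_rep = self.z_dict.unsqueeze(0).expand(b, k, -1)   # (b, k, d)
        # P(y | x, z) for every confounder z in the dictionary.
        logits_xz = self.classifier(torch.cat([x_rep, z_rep], dim=-1))
        probs_xz = logits_xz.softmax(dim=-1)                # (b, k, n_classes)
        # Backdoor adjustment: marginalize z under its prior P(z), rather
        # than under P(z | x) as a purely likelihood-based model would.
        return torch.einsum("bkc,k->bc", probs_xz, self.prior)
```

DeVLBert instantiates this kind of marginalization with several Bert-style architectural variants during pretraining; the sketch shows only the core adjustment that separates intervention-based from purely likelihood-based learning.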
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Incorporating Pre-training Data Matters in Unsupervised Domain Adaptation [13.509286043322442]
Unsupervised domain adaptation (UDA) and source-free UDA (SFUDA) methods formulate the problem as involving two domains: source and target.
We investigate the correlation among ImageNet, the source, and the target domain.
We present a novel framework, TriDA, which preserves the semantic structure of the pretraining dataset during fine-tuning.
arXiv Detail & Related papers (2023-08-06T12:23:40Z)
- Robust Outlier Rejection for 3D Registration with Variational Bayes [70.98659381852787]
We develop a novel variational non-local network-based outlier rejection framework for robust alignment.
We propose a voting-based inlier searching strategy to cluster the high-quality hypothetical inliers for transformation estimation.
arXiv Detail & Related papers (2023-04-04T03:48:56Z)
- Contrastive variational information bottleneck for aspect-based sentiment analysis [36.83876224466177]
We propose to reduce spurious correlations for aspect-based sentiment analysis (ABSA) via a novel Contrastive Variational Information Bottleneck framework (called CVIB).
The proposed CVIB framework is composed of an original network and a self-pruned network; the two networks are optimized simultaneously via contrastive learning (a minimal sketch of such a two-branch objective follows this entry).
Our approach outperforms strong competitors in terms of overall prediction performance, robustness, and generalization.
arXiv Detail & Related papers (2023-03-06T02:52:37Z)
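As a concrete, heavily simplified illustration of optimizing two networks against each other contrastively, the sketch below uses a symmetric InfoNCE loss between the representations of an original encoder and its self-pruned copy; positive pairs are the two views of the same input, and the name two_branch_contrastive_loss, its inputs, and the temperature are hypothetical, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F


def two_branch_contrastive_loss(h_full: torch.Tensor,
                                h_pruned: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Hypothetical InfoNCE-style loss between an original network's
    representations (h_full) and a self-pruned copy's (h_pruned).
    Positive pairs are the two views of the same input; the other items
    in the batch act as negatives. A generic sketch, not CVIB's exact loss."""
    z1 = F.normalize(h_full, dim=-1)
    z2 = F.normalize(h_pruned, dim=-1)
    logits = z1 @ z2.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Symmetric InfoNCE: each view must identify its counterpart.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```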
- Trading Information between Latents in Hierarchical Variational Autoencoders [8.122270502556374]
Variational Autoencoders (VAEs) were originally motivated as probabilistic generative models in which one performs approximate Bayesian inference.
The proposal of $\beta$-VAEs breaks this interpretation and generalizes VAEs to application domains beyond generative modeling.
We identify a general class of inference models for which one can split the rate into contributions from each layer, which can then be tuned independently (the objective and the per-layer split are sketched after this entry).
arXiv Detail & Related papers (2023-02-09T18:56:11Z)
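For reference, the $\beta$-VAE objective reweights the rate (KL) term of the standard ELBO, and in a hierarchical model the rate decomposes by the chain rule of KL divergence into per-layer contributions, which is the quantity the paper proposes to tune layer by layer. A standard statement in our own notation (a sketch, not copied from the paper):

```latex
% beta-VAE objective: reconstruction (distortion) minus beta-weighted rate.
\mathcal{L}_{\beta}(x)
  = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
    - \beta \, D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\big\|\,p(z)\big)

% For a hierarchical VAE with q_\phi(z \mid x) = \prod_\ell q_\phi(z_\ell \mid z_{<\ell}, x)
% and prior p(z) = \prod_\ell p_\theta(z_\ell \mid z_{<\ell}),
% the rate splits into one term per layer, each tunable independently:
D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\big\|\,p(z)\big)
  = \sum_{\ell=1}^{L}
    \mathbb{E}_{q_\phi(z_{<\ell} \mid x)}\Big[
      D_{\mathrm{KL}}\big(q_\phi(z_\ell \mid z_{<\ell}, x)\,\big\|\,
                          p_\theta(z_\ell \mid z_{<\ell})\big)\Big]
```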
- Fair Representation Learning using Interpolation Enabled Disentanglement [9.043741281011304]
We propose a novel method to address two key issues: (a) can we simultaneously learn fair disentangled representations while ensuring the utility of the learned representation for downstream tasks, and (b) can we provide theoretical insights into when the proposed approach will be both fair and accurate.
To address the former, we propose the method FRIED, Fair Representation learning using Interpolation Enabled Disentanglement.
arXiv Detail & Related papers (2021-07-31T17:32:12Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task (a sketch of such a gradient-alignment objective follows this entry).
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
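A hedged sketch of how such a counterfactual signal can supervise gradients: align the input-space gradient of the task loss with the direction from an example to its minimally different, differently labeled counterfactual. This is our own simplification of the idea, and gradient_supervision_loss and its arguments are hypothetical names, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def gradient_supervision_loss(model: torch.nn.Module,
                              x: torch.Tensor,
                              x_cf: torch.Tensor,
                              y: torch.Tensor) -> torch.Tensor:
    """Auxiliary objective (sketch): the gradient of the task loss w.r.t.
    the input should point toward the counterfactual x_cf, since the
    minimal edit x_cf - x is what flips the label."""
    x = x.clone().requires_grad_(True)
    task_loss = F.cross_entropy(model(x), y)
    # Input-space gradient, kept in the graph so the auxiliary loss
    # can itself be backpropagated through the model.
    (grad_x,) = torch.autograd.grad(task_loss, x, create_graph=True)
    direction = x_cf - x                       # minimal label-flipping edit
    cos = F.cosine_similarity(grad_x.flatten(1), direction.flatten(1), dim=1)
    return (1.0 - cos).mean()                  # 0 when perfectly aligned
```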