Related papers: Knockout: A simple way to handle missing inputs

URL: http://arxiv.org/abs/2405.20448v2
Date: Mon, 3 Jun 2024 14:40:28 GMT
Title: Knockout: A simple way to handle missing inputs
Authors: Minh Nguyen, Batuhan K. Karaman, Heejong Kim, Alan Q. Wang, Fengbei Liu, Mert R. Sabuncu
Abstract summary: Models that leverage rich inputs can be difficult to deploy widely because some inputs may be missing at inference. Current popular solutions to this problem include marginalization, imputation, and training multiple models. We propose an efficient way to learn both the conditional distribution using full inputs and the marginal distributions.
Score: 8.05324050767023
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep learning models can extract predictive and actionable information from complex inputs. The richer the inputs, the better these models usually perform. However, models that leverage rich inputs (e.g., multi-modality) can be difficult to deploy widely, because some inputs may be missing at inference. Current popular solutions to this problem include marginalization, imputation, and training multiple models. Marginalization can obtain calibrated predictions but it is computationally costly and therefore only feasible for low dimensional inputs. Imputation may result in inaccurate predictions because it employs point estimates for missing variables and does not work well for high dimensional inputs (e.g., images). Training multiple models whereby each model takes different subsets of inputs can work well but requires knowing missing input patterns in advance. Furthermore, training and retaining multiple models can be costly. We propose an efficient way to learn both the conditional distribution using full inputs and the marginal distributions. Our method, Knockout, randomly replaces input features with appropriate placeholder values during training. We provide a theoretical justification of Knockout and show that it can be viewed as an implicit marginalization strategy. We evaluate Knockout in a wide range of simulations and real-world datasets and show that it can offer strong empirical performance.
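
The training-time replacement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `knockout` and `fill_missing`, the independent per-feature masking, the knockout rate of 0.5, and the placeholder value of -1.0 are all assumptions made for illustration, and the abstract does not specify how placeholders should be chosen.

```python
import numpy as np


def knockout(x, rng, rate=0.5, placeholder=0.0):
    """Hypothetical helper: replace each feature with `placeholder`
    independently with probability `rate` (an assumed masking scheme)."""
    x = np.asarray(x, dtype=float)
    # True where a feature is knocked out for this training step.
    mask = rng.random(x.shape) < rate
    return np.where(mask, placeholder, x), mask


def fill_missing(x, placeholder=0.0):
    """Hypothetical helper: at inference, fill missing (NaN) features
    with the same placeholder that was used during training."""
    x = np.asarray(x, dtype=float)
    return np.where(np.isnan(x), placeholder, x)


rng = np.random.default_rng(0)
batch = np.arange(1.0, 13.0).reshape(3, 4)  # 3 samples, 4 features

# Training: a fresh mask would be drawn for every batch.
knocked, mask = knockout(batch, rng, rate=0.5, placeholder=-1.0)
print(knocked)

# Inference: a sample arrives with its second feature missing.
print(fill_missing(np.array([1.0, np.nan, 3.0, 4.0]), placeholder=-1.0))
```

Under these assumptions, a single model sees both complete and partially knocked-out inputs during training, which is the sense in which the abstract describes learning the full conditional distribution together with the marginals.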

Related papers

Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples. Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance. We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
Ask Your Distribution Shift if Pre-Training is Right for You [74.18516460467019]
In practice, fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others. We focus on two possible failure modes of models under distribution shift: poor extrapolation and biases in the training data. Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases.
arXiv Detail & Related papers (2024-02-29T23:46:28Z)
Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data. Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data. We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z)
Learn What Is Possible, Then Choose What Is Best: Disentangling One-To-Many Relations in Language Through Text-based Games [3.615981646205045]
We present an approach to train language models that can emulate the desirable behaviours, but not the undesirable ones. Using text-based games as a testbed, our approach, PASA, uses discrete latent variables to capture the range of different behaviours. Results show up to 49% empirical improvement over the previous state-of-the-art model.
arXiv Detail & Related papers (2023-04-14T17:11:26Z)
Explanation Shift: How Did the Distribution Shift Impact the Model? [23.403838118256907]
We study how explanation characteristics shift when affected by distribution shifts. We analyze different types of distribution shifts using synthetic examples and real-world data sets. We release our methods in an open-source Python package, as well as the code used to reproduce our experiments.
arXiv Detail & Related papers (2023-03-14T17:13:01Z)
Task-Specific Skill Localization in Fine-tuned Language Models [36.53572616441048]
This paper introduces the term skill localization for this problem. A simple optimization is used to identify a very small subset of parameters. Grafting the fine-tuned values for just this tiny subset onto the pre-trained model gives performance almost as good as that of the fine-tuned model.
arXiv Detail & Related papers (2023-02-13T18:55:52Z)
PAMI: partition input and aggregate outputs for model interpretation [69.42924964776766]
In this study, a simple yet effective visualization framework called PAMI is proposed based on the observation that deep learning models often aggregate features from local regions for model predictions. The basic idea is to mask the majority of the input and use the corresponding model output as the relative contribution of the preserved input part to the original model prediction. Extensive experiments on multiple tasks confirm that the proposed method performs better than existing visualization approaches in more precisely finding class-specific input regions.
arXiv Detail & Related papers (2023-02-07T08:48:34Z)
Learning Instance-Specific Augmentations by Capturing Local Invariances [62.70897571389785]
InstaAug is a method for automatically learning input-specific augmentations from data. We empirically demonstrate that InstaAug learns meaningful input-dependent augmentations for a wide range of transformation classes.
arXiv Detail & Related papers (2022-05-31T18:38:06Z)
Thought Flow Nets: From Single Predictions to Trains of Model Thought [39.619001911390804]
When humans solve complex problems, they rarely come up with a decision right away. Instead, they start with an intuitive decision, reflect upon it, spot mistakes, resolve contradictions, and jump between different hypotheses.
arXiv Detail & Related papers (2021-07-26T13:56:37Z)
What do we expect from Multiple-choice QA Systems? [70.86513724662302]
We consider a top-performing model on several Multiple Choice Question Answering (MCQA) datasets. We evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs.
arXiv Detail & Related papers (2020-11-20T21:27:10Z)
Estimating g-Leakage via Machine Learning [34.102705643128004]
This paper considers the problem of estimating the information leakage of a system in the black-box scenario. It is assumed that the system's internals are unknown to the learner, or anyway too complicated to analyze. We propose a novel approach to perform black-box estimation of the g-vulnerability using Machine Learning (ML) algorithms.
arXiv Detail & Related papers (2020-05-09T09:26:36Z)
How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking [70.92463223410225]
DiffMask learns to mask out subsets of the input while maintaining differentiability. The decision to include or disregard an input token is made with a simple model based on intermediate hidden layers. This lets us not only plot attribution heatmaps but also analyze how decisions are formed across network layers.
arXiv Detail & Related papers (2020-04-30T17:36:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.