Confidence-Based Model Selection: When to Take Shortcuts for Subpopulation Shifts
- URL: http://arxiv.org/abs/2306.11120v1
- Date: Mon, 19 Jun 2023 18:48:15 GMT
- Title: Confidence-Based Model Selection: When to Take Shortcuts for Subpopulation Shifts
- Authors: Annie S. Chen, Yoonho Lee, Amrith Setlur, Sergey Levine, Chelsea Finn
- Abstract summary: We propose COnfidence-baSed MOdel Selection (CosMoS), where model confidence can effectively guide model selection.
We evaluate CosMoS on four datasets with spurious correlations, each with multiple test sets with varying levels of data distribution shift.
- Score: 119.22672589020394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective machine learning models learn both robust features that directly
determine the outcome of interest (e.g., an object with wheels is more likely
to be a car), and shortcut features (e.g., an object on a road is more likely
to be a car). The latter can be a source of error under distributional shift,
when the correlations change at test-time. The prevailing sentiment in the
robustness literature is to avoid such correlative shortcut features and learn
robust predictors. However, while robust predictors perform better on
worst-case distributional shifts, they often sacrifice accuracy on majority
subpopulations. In this paper, we argue that shortcut features should not be
entirely discarded. Instead, if we can identify the subpopulation to which an
input belongs, we can adaptively choose among models with different strengths
to achieve high performance on both majority and minority subpopulations. We
propose COnfidence-baSed MOdel Selection (CosMoS), where we observe that model
confidence can effectively guide model selection. Notably, CosMoS does not
require any target labels or group annotations, either of which may be
difficult to obtain or unavailable. We evaluate CosMoS on four datasets with
spurious correlations, each with multiple test sets with varying levels of data
distribution shift. We find that CosMoS achieves 2-5% lower average regret
across all subpopulations, compared to using only robust predictors or other
model aggregation methods.
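The selection rule itself is simple to illustrate. Below is a minimal, hypothetical sketch of confidence-based routing between pretrained models (e.g., one shortcut-reliant ERM model and one group-robust model); the actual CosMoS procedure may calibrate or combine confidences differently, so treat this only as the core idea.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def confidence_based_select(models, x):
    """Route input x to whichever model is most confident.

    Sketch of confidence-based model selection in the spirit of CosMoS:
    `models` is a hypothetical list of predictors with different
    strengths, each mapping x to class logits. No target labels or
    group annotations are needed at selection time.
    """
    best_pred, best_conf = None, -np.inf
    for model in models:
        probs = softmax(model(x))
        conf = probs.max()               # top-1 softmax confidence
        if conf > best_conf:
            best_conf, best_pred = conf, int(probs.argmax())
    return best_pred, best_conf
```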
Related papers
- A Complete Decomposition of KL Error using Refined Information and Mode Interaction Selection [11.994525728378603]
We revisit the classical formulation of the log-linear model with a focus on higher-order mode interactions.
We find that our learned distributions are able to more efficiently use the finite amount of data which is available in practice.
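For context, a log-linear model scores a configuration by summing parameters over selected interaction sets (modes); higher-order interactions are simply larger sets. The snippet below is a generic illustration of that parameterization, not the paper's refined-information selection procedure; names and example values are invented.

```python
import numpy as np

def log_linear_unnormalized(x, thetas):
    """Unnormalized log-probability of a log-linear model.

    `thetas` maps an interaction (a tuple of variable indices, e.g.
    (0,) for a main effect, (0, 2) for a pairwise mode interaction)
    to its parameter; higher-order interactions are longer tuples.
    """
    return sum(theta * np.prod(x[list(idx)]) for idx, theta in thetas.items())

# Two main effects plus one second-order interaction (invented values).
x = np.array([1, 0, 1])
thetas = {(0,): 0.5, (2,): -0.3, (0, 2): 1.2}
print(log_linear_unnormalized(x, thetas))   # 0.5 - 0.3 + 1.2 = 1.4
```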
arXiv Detail & Related papers (2024-10-15T18:08:32Z)
- Information FOMO: The unhealthy fear of missing out on information. A method for removing misleading data for healthier models [0.0]
Misleading or unnecessary data can have out-sized impacts on the health or accuracy of Machine Learning (ML) models.
We present a sequential selection method that identifies critically important information within a dataset.
We find these instabilities are a result of the complexity of the underlying map and linked to extreme events and heavy tails.
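As a rough illustration of sequential data selection, here is a generic greedy sketch; the concrete criterion is a hypothetical stand-in, not the paper's information-based method.

```python
import numpy as np

def greedy_sequential_selection(candidates, score_fn, budget):
    """Greedy sketch of sequential data selection.

    At each step, keep the candidate whose inclusion most improves a
    user-supplied score (e.g., validation likelihood of a retrained
    model); misleading points are dropped simply by never being
    selected. `score_fn(selected)` is a hypothetical stand-in.
    """
    selected, remaining = [], list(candidates)
    for _ in range(min(budget, len(remaining))):
        gains = [score_fn(selected + [c]) for c in remaining]
        selected.append(remaining.pop(int(np.argmax(gains))))
    return selected
```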
arXiv Detail & Related papers (2022-08-27T19:43:53Z)
- Sharing pattern submodels for prediction with missing values [12.981974894538668]
Missing values are unavoidable in many applications of machine learning and present challenges both during training and at test time.
We propose an alternative approach, called sharing pattern submodels, which i) makes predictions robust to missing values at test time, ii) maintains or improves the predictive power of pattern submodels, and iii) has a short description, enabling improved interpretability.
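A bare-bones version of the pattern-submodel idea is easy to sketch: fit one predictor per missingness pattern on the observed features only, so no imputation is needed at test time. The parameter sharing across patterns that the paper adds is omitted here, and all names are invented.

```python
import numpy as np
from sklearn.linear_model import Ridge

class PatternSubmodels:
    """One ridge model per missingness pattern (simplified sketch;
    the paper's parameter sharing across patterns is omitted)."""

    def fit(self, X, y):
        self.models = {}
        patterns = np.isnan(X)
        for pat in np.unique(patterns, axis=0):
            rows = (patterns == pat).all(axis=1)   # examples with this pattern
            cols = ~pat                            # observed features only
            self.models[tuple(pat)] = Ridge().fit(X[rows][:, cols], y[rows])
        return self

    def predict_one(self, x):
        pat = np.isnan(x)
        model = self.models[tuple(pat)]            # assumes pattern seen in training
        return model.predict(x[~pat].reshape(1, -1))[0]
```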
arXiv Detail & Related papers (2022-06-22T15:09:40Z)
- Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
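One common recipe, used here purely as an illustration rather than the paper's own estimator, is ensemble disagreement: train several reward models and penalize their spread for risk-averse RL, or use the spread to pick informative comparisons for active learning.

```python
import numpy as np

def risk_averse_reward(reward_models, x, risk_coeff=1.0):
    """Ensemble sketch of uncertainty-aware reward estimation.

    `reward_models` is a hypothetical list of independently trained
    reward models; their disagreement (std) acts as an epistemic
    uncertainty estimate. Subtracting it yields a pessimistic reward
    for risk-averse RL; the std alone can rank candidates for active
    preference labeling.
    """
    rewards = np.array([m(x) for m in reward_models])
    return rewards.mean() - risk_coeff * rewards.std()
```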
arXiv Detail & Related papers (2022-03-14T20:13:21Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To combine the best of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
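The minimax structure can be sketched generically: two task heads on a shared extractor are pushed to disagree while the extractor is updated to make them agree. Everything below (sizes, the discrepancy measure, the update order) is an illustrative assumption, and the task-supervision losses are omitted; the paper's exact objective differs.

```python
import torch
import torch.nn as nn

# Generic minimax sketch: heads maximize their prediction discrepancy,
# the shared feature extractor minimizes it.
extractor = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
head_a, head_b = nn.Linear(64, 10), nn.Linear(64, 10)
opt_feat = torch.optim.Adam(extractor.parameters(), lr=1e-3)
opt_heads = torch.optim.Adam(
    list(head_a.parameters()) + list(head_b.parameters()), lr=1e-3)

def discrepancy(x):
    z = extractor(x)
    return (head_a(z).softmax(-1) - head_b(z).softmax(-1)).abs().mean()

x = torch.randn(16, 32)                                   # dummy batch
opt_heads.zero_grad(); (-discrepancy(x)).backward(); opt_heads.step()
opt_feat.zero_grad(); discrepancy(x).backward(); opt_feat.step()
```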
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when the only bias lies in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
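For reference, the (simplified) group DRO objective just minimizes the worst annotated group's loss, which is why it offers no protection against spurious correlations that the group labels do not capture; a sketch:

```python
import torch

def group_dro_loss(losses, group_ids, num_groups):
    """Simplified worst-group objective of group DRO.

    `losses` holds per-example losses for a batch and `group_ids`
    their group annotations; the mean loss of the worst-off group is
    returned, so gradients focus on it. The exponential-weights online
    variant used in practice is omitted.
    """
    group_losses = []
    for g in range(num_groups):
        mask = group_ids == g
        if mask.any():                        # skip groups absent from batch
            group_losses.append(losses[mask].mean())
    return torch.stack(group_losses).max()
```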
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking [70.92463223410225]
DiffMask learns to mask out subsets of the input while maintaining differentiability.
The decision to include or disregard an input token is made by a simple model based on intermediate hidden layers.
This lets us not only plot attribution heatmaps but also analyze how decisions are formed across network layers.
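A toy version of the masking mechanism, with a plain sigmoid gate in place of DiffMask's stretched hard-concrete gate (so this is a simplification, and all names are invented):

```python
import torch
import torch.nn as nn

class TokenMasker(nn.Module):
    """Toy differentiable token masking in the spirit of DiffMask.

    A small probe maps each token's hidden state to a gate in [0, 1];
    gated embeddings are fed back through the (frozen) model, and a
    sparsity penalty pushes gates toward 0 so only decision-relevant
    tokens survive.
    """

    def __init__(self, hidden_dim):
        super().__init__()
        self.probe = nn.Linear(hidden_dim, 1)

    def forward(self, embeddings, hidden_states):
        gates = torch.sigmoid(self.probe(hidden_states))  # (batch, seq, 1)
        masked = gates * embeddings                       # gate ~ keep/drop token
        sparsity = gates.mean()                           # penalty toward masking
        return masked, sparsity
```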
arXiv Detail & Related papers (2020-04-30T17:36:14Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
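The underlying transductive update is easy to sketch. Here the confidence is a fixed softmax over negative distances, whereas the paper meta-learns it; names and the temperature are invented.

```python
import torch

def refine_prototypes(prototypes, queries, temperature=1.0):
    """Confidence-weighted transductive prototype update (sketch).

    Each unlabeled query contributes to every class prototype in
    proportion to its (soft) confidence for that class.
    prototypes: (C, D), queries: (Q, D).
    """
    dists = torch.cdist(queries, prototypes)             # (Q, C)
    conf = torch.softmax(-dists / temperature, dim=1)    # per-class confidence
    weighted = conf.t() @ queries                        # (C, D)
    counts = conf.sum(dim=0, keepdim=True).t()           # (C, 1)
    return (prototypes + weighted) / (1 + counts)
```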
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
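The shared ingredient of all three variants is a diversity score derived from the model's average output distribution; a rough sketch follows (the exact score and normalization in the paper differ).

```python
import torch

def diversity_score(step_probs, avg_token_probs):
    """Sketch of an AvgOut-style diversity score.

    `avg_token_probs` is a running average of the model's output
    distributions over the vocabulary, so high values mark habitually
    frequent ("dull") tokens. The score penalizes placing probability
    mass on them; MinAvgOut maximizes it directly, LFT scales a control
    label by it, and RL treats it as a reward.
    """
    dullness = (step_probs * avg_token_probs).sum(-1)   # overlap with dull mass
    return 1.0 - dullness
```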
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.