Underspecification Presents Challenges for Credibility in Modern Machine
Learning
- URL: http://arxiv.org/abs/2011.03395v2
- Date: Tue, 24 Nov 2020 19:16:02 GMT
- Title: Underspecification Presents Challenges for Credibility in Modern Machine
Learning
- Authors: Alexander D'Amour, Katherine Heller, Dan Moldovan, Ben Adlam, Babak
Alipanahi, Alex Beutel, Christina Chen, Jonathan Deaton, Jacob Eisenstein,
Matthew D. Hoffman, Farhad Hormozdiari, Neil Houlsby, Shaobo Hou, Ghassen
Jerfel, Alan Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Diana
Mincu, Akinori Mitani, Andrea Montanari, Zachary Nado, Vivek Natarajan,
Christopher Nielson, Thomas F. Osborne, Rajiv Raman, Kim Ramasamy, Rory
Sayres, Jessica Schrouff, Martin Seneviratne, Shannon Sequeira, Harini
Suresh, Victor Veitch, Max Vladymyrov, Xuezhi Wang, Kellie Webster, Steve
Yadlowsky, Taedong Yun, Xiaohua Zhai, D. Sculley
- Abstract summary: Underspecification is common in modern ML pipelines, such as those based on deep learning.
We show here that such predictors can behave very differently in deployment domains.
This ambiguity can lead to instability and poor model behavior in practice.
- Score: 95.90009829265297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: ML models often exhibit unexpectedly poor behavior when they are deployed in
real-world domains. We identify underspecification as a key reason for these
failures. An ML pipeline is underspecified when it can return many predictors
with equivalently strong held-out performance in the training domain.
Underspecification is common in modern ML pipelines, such as those based on
deep learning. Predictors returned by underspecified pipelines are often
treated as equivalent based on their training domain performance, but we show
here that such predictors can behave very differently in deployment domains.
This ambiguity can lead to instability and poor model behavior in practice, and
is a distinct failure mode from previously identified issues arising from
structural mismatch between training and deployment domains. We show that this
problem appears in a wide variety of practical ML pipelines, using examples
from computer vision, medical imaging, natural language processing, clinical
risk prediction based on electronic health records, and medical genomics. Our
results show the need to explicitly account for underspecification in modeling
pipelines that are intended for real-world deployment in any domain.
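The abstract's core claim is easy to reproduce in miniature. Below is a minimal sketch, not the paper's actual experiments: several predictors produced by one pipeline, differing only in random seed, reach near-identical held-out accuracy yet diverge once a spurious correlation in the training data is broken. The synthetic data, the MLPClassifier choice, and all parameters are illustrative assumptions.

```python
# Minimal underspecification sketch: equally good in-domain, divergent under shift.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Two features: feature 0 is causal; feature 1 is spuriously
# correlated with the label in the training domain only.
n = 4000
y = rng.integers(0, 2, n)
x_causal = y + rng.normal(0, 1.0, n)
x_spurious = y + rng.normal(0, 0.5, n)   # tight correlation in-domain
X = np.column_stack([x_causal, x_spurious])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Deployment domain: the spurious correlation is broken.
y_shift = rng.integers(0, 2, 1000)
X_shift = np.column_stack([
    y_shift + rng.normal(0, 1.0, 1000),
    rng.normal(0, 0.5, 1000),            # feature 1 now uninformative
])

for seed in range(5):
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                        random_state=seed).fit(X_tr, y_tr)
    print(f"seed={seed}  held-out acc={clf.score(X_te, y_te):.3f}  "
          f"shifted acc={clf.score(X_shift, y_shift):.3f}")

# Typically, held-out accuracies cluster tightly while shifted accuracies
# spread, because the pipeline does not pin down how much each model
# leans on the spurious feature.
```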
Related papers
- Fine-Tuning Pre-trained Language Models for Robust Causal Representation Learning [26.29386609645171]
Fine-tuning of pre-trained language models (PLMs) has been shown to be effective across various domains.
We show that a robust representation can be derived through a so-called causal front-door adjustment, based on a decomposition assumption.
Our work thus sheds light on the domain generalization problem by introducing links between fine-tuning and causal mechanisms into representation learning.
arXiv Detail & Related papers (2024-10-18T11:06:23Z)
- Machine Learning vs Deep Learning: The Generalization Problem [0.0]
This study compares the extrapolation abilities of traditional machine learning (ML) models and deep learning (DL) algorithms.
We present an empirical analysis in which both ML and DL models are trained on an exponentially growing function and then tested on values outside the training domain (a minimal sketch of this setup appears after this list).
Our findings suggest that deep learning models possess inherent capabilities to generalize beyond the training scope.
arXiv Detail & Related papers (2024-03-03T21:42:55Z)
- DIGIC: Domain Generalizable Imitation Learning by Causal Discovery [69.13526582209165]
Causality has been combined with machine learning to produce robust representations for domain generalization.
We make a different attempt by leveraging the demonstration data distribution to discover causal features for a domain generalizable policy.
We design a novel framework, called DIGIC, to identify the causal features by finding the direct cause of the expert action from the demonstration data distribution.
arXiv Detail & Related papers (2024-02-29T07:09:01Z)
- Robustness, Evaluation and Adaptation of Machine Learning Models in the Wild [4.304803366354879]
We study causes of impaired robustness to domain shifts and present algorithms for training domain-robust models.
A key source of model brittleness is domain overfitting, which our new training algorithms suppress in favor of domain-general hypotheses.
arXiv Detail & Related papers (2023-03-05T21:41:16Z)
- Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves network generalization ability on multiple vision tasks.
Our method is simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z)
- Assessing Out-of-Domain Language Model Performance from Few Examples [38.245449474937914]
We address the task of predicting out-of-domain (OOD) performance in a few-shot fashion.
We benchmark performance on this task using model accuracy on the few-shot examples.
We show that attribution-based factors can help rank relative model OOD performance.
arXiv Detail & Related papers (2022-10-13T04:45:26Z)
- Cross-domain Imitation from Observations [50.669343548588294]
Imitation learning seeks to circumvent the difficulty of designing proper reward functions for training agents by utilizing expert behavior.
In this paper, we study how to imitate tasks when there are discrepancies between the expert's and the agent's MDPs.
We present a novel framework to learn correspondences across such domains.
arXiv Detail & Related papers (2021-05-20T21:08:25Z)
- From Simulation to Real World Maneuver Execution using Deep Reinforcement Learning [69.23334811890919]
Deep Reinforcement Learning has proved able to solve many control tasks in different fields, but these systems do not always behave as expected when deployed in real-world scenarios.
This is mainly due to the lack of domain adaptation between simulated and real-world data, together with the absence of a distinction between train and test datasets.
We present a system based on multiple environments in which agents are trained simultaneously, evaluating the behavior of the model in different scenarios.
arXiv Detail & Related papers (2020-05-13T14:22:20Z)
- Few-Shot Learning as Domain Adaptation: Algorithm and Analysis [120.75020271706978]
Few-shot learning uses prior knowledge learned from seen classes to recognize unseen classes.
The distribution shift caused by this class difference can be considered a special case of domain shift.
We propose a prototypical domain adaptation network with attention (DAPNA) to explicitly tackle such a domain shift problem in a meta-learning framework.
arXiv Detail & Related papers (2020-02-06T01:04:53Z)
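As referenced in the "Machine Learning vs Deep Learning" entry above, here is a minimal sketch of that extrapolation setup: fit models on an exponentially growing function, then evaluate on inputs outside the training range. The model choices (a random forest and a small neural network), the training interval, and all parameters are illustrative assumptions, not the paper's exact configuration.

```python
# Out-of-range extrapolation test on y = exp(x), under assumed settings.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 5.0, 2000).reshape(-1, 1)
y_train = np.exp(x_train).ravel()
x_test = np.linspace(5.0, 7.0, 200).reshape(-1, 1)   # outside [0, 5]
y_test = np.exp(x_test).ravel()

models = {
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "neural net": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                               random_state=0),
}
for name, model in models.items():
    model.fit(x_train, y_train)
    pred = model.predict(x_test)
    rmse = np.sqrt(np.mean((pred - y_test) ** 2))
    print(f"{name}: out-of-range RMSE = {rmse:.1f}")

# Tree ensembles predict a constant beyond the training range, so their
# error grows quickly; how the network fares depends on how its learned
# function behaves outside the data it saw.
```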