Clarify: Improving Model Robustness With Natural Language Corrections
- URL: http://arxiv.org/abs/2402.03715v1
- Date: Tue, 6 Feb 2024 05:11:38 GMT
- Title: Clarify: Improving Model Robustness With Natural Language Corrections
- Authors: Yoonho Lee, Michelle S. Lam, Helena Vasconcelos, Michael S. Bernstein,
Chelsea Finn
- Abstract summary: In supervised learning, models are trained to extract correlations from a static dataset.
This often leads to models that rely on high-level misconceptions.
We introduce Clarify, a novel interface and method for interactively correcting model misconceptions.
- Score: 63.342630414000006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In supervised learning, models are trained to extract correlations from a
static dataset. This often leads to models that rely on high-level
misconceptions. To prevent such misconceptions, we must necessarily provide
additional information beyond the training data. Existing methods incorporate
forms of additional instance-level supervision, such as labels for spurious
features or additional labeled data from a balanced distribution. Such
strategies can become prohibitively costly for large-scale datasets since they
require additional annotation at a scale close to the original training data.
We hypothesize that targeted natural language feedback about a model's
misconceptions is a more efficient form of additional supervision. We introduce
Clarify, a novel interface and method for interactively correcting model
misconceptions. Through Clarify, users need only provide a short text
description to describe a model's consistent failure patterns. Then, in an
entirely automated way, we use such descriptions to improve the training
process by reweighting the training data or gathering additional targeted data.
Our user studies show that non-expert users can successfully describe model
misconceptions via Clarify, improving worst-group accuracy by an average of
17.1% in two datasets. Additionally, we use Clarify to find and rectify 31
novel hard subpopulations in the ImageNet dataset, improving minority-split
accuracy from 21.1% to 28.7%.
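The abstract describes turning a short text description of a failure pattern into per-example training weights. The sketch below is one way to picture that step, assuming a CLIP-style image-text model is used to score each training image against the description; the scoring model and the weighting rule are assumptions of this sketch, not details given in the abstract.

```python
import torch
import clip  # OpenAI CLIP; an assumption of this sketch, not named in the abstract
from PIL import Image

def reweight_by_description(image_paths, error_description, device="cpu"):
    """Downweight examples that match a described misconception, upweight the rest.

    `error_description` is the user's short text, e.g. "the model focuses on
    the background instead of the bird". Clarify's actual reweighting rule
    may differ; this is only a minimal illustration.
    """
    model, preprocess = clip.load("ViT-B/32", device=device)
    text = clip.tokenize([error_description]).to(device)

    weights = []
    with torch.no_grad():
        text_feat = model.encode_text(text)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        for path in image_paths:
            image = preprocess(Image.open(path)).unsqueeze(0).to(device)
            img_feat = model.encode_image(image)
            img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
            sim = (img_feat @ text_feat.T).item()
            # Images matching the misconception get lower weight,
            # counter-examples get higher weight.
            weights.append(1.0 - sim)
    weights = torch.tensor(weights)
    return weights / weights.sum()  # use as per-example sampling/loss weights
```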
Related papers
- Machine Learning from Explanations [17.28638946021444]
We introduce an innovative approach for training reliable classification models on smaller datasets.
Our method centers around a two-stage training cycle that alternates between enhancing model prediction accuracy and refining its attention to match the explanations.
We demonstrate that our training cycle expedites the convergence towards more accurate and reliable models.
arXiv Detail & Related papers (2025-07-07T09:09:52Z)
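A minimal sketch of the two-stage cycle described above, alternating between a standard cross-entropy step and a step that penalizes input-gradient saliency falling outside a human-provided explanation mask; the saliency-based penalty and the `explanation_mask` field are assumptions of this sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def attention_alignment_loss(model, x, explanation_mask):
    """Penalize input-gradient saliency that falls outside the explanation mask."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    # Saliency: gradient of the top-class score w.r.t. the input pixels.
    score = logits.max(dim=1).values.sum()
    grads = torch.autograd.grad(score, x, create_graph=True)[0]
    saliency = grads.abs().sum(dim=1)              # (batch, H, W)
    outside = saliency * (1.0 - explanation_mask)  # mask is 1 inside the explained region
    return outside.mean()

def train_cycle(model, loader, optimizer, epochs=10):
    """Alternate between an accuracy stage and an attention-refinement stage."""
    for epoch in range(epochs):
        for x, y, explanation_mask in loader:
            optimizer.zero_grad()
            if epoch % 2 == 0:
                # Stage 1: improve prediction accuracy.
                loss = F.cross_entropy(model(x), y)
            else:
                # Stage 2: refine attention to match the explanations.
                loss = attention_alignment_loss(model, x, explanation_mask)
            loss.backward()
            optimizer.step()
```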
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been believed to be a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Improving Classification Performance With Human Feedback: Label a few, we label the rest [2.7386128680964408]
This paper focuses on understanding how a continuous feedback loop can refine models, thereby enhancing their accuracy, recall, and precision.
We benchmark this approach on the Financial Phrasebank, Banking, Craigslist, Trec, and Amazon Reviews datasets to show that with just a few labeled examples, we are able to surpass the accuracy of zero-shot large language models.
arXiv Detail & Related papers (2024-01-17T19:13:05Z)
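A minimal sketch of a "label a few, the model labels the rest" loop under stated assumptions: confident predictions are self-labeled, and uncertain examples are routed back to a human via a hypothetical `ask_human` callback. The paper's exact loop and models are not specified here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def feedback_loop(unlabeled_texts, seed_texts, seed_labels, ask_human,
                  rounds=3, threshold=0.9):
    """Label a few seeds, let the model label the rest, and loop with human feedback."""
    vectorizer = TfidfVectorizer().fit(unlabeled_texts + seed_texts)
    texts, labels = list(seed_texts), list(seed_labels)
    pool = list(unlabeled_texts)
    clf = None

    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(vectorizer.transform(texts), labels)

        uncertain = []
        for text in pool:
            probs = clf.predict_proba(vectorizer.transform([text]))[0]
            if probs.max() >= threshold:
                # Confident: the model labels this example itself.
                texts.append(text)
                labels.append(clf.classes_[probs.argmax()])
            else:
                uncertain.append(text)
        # Ask the human for a handful of the uncertain examples each round.
        for text in uncertain[:10]:
            texts.append(text)
            labels.append(ask_human(text))
        pool = uncertain[10:]

    return clf, vectorizer
```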
- Netflix and Forget: Efficient and Exact Machine Unlearning from Bi-linear Recommendations [15.789980605221672]
This paper focuses on simple but widely deployed bi-linear models for recommendations based on matrix completion.
We develop Unlearn-ALS by making a few key modifications to the fine-tuning procedure under Alternating Least Squares.
We show that Unlearn-ALS is consistent with retraining without any model degradation and exhibits rapid convergence.
arXiv Detail & Related papers (2023-02-13T20:27:45Z)
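A minimal sketch of unlearning in a bi-linear matrix-completion recommender: delete the target user's interactions and refit the factors with a few Alternating Least Squares sweeps. This only illustrates the setting; Unlearn-ALS itself makes more careful modifications to reach exact unlearning.

```python
import numpy as np

def als_sweep(R, mask, U, V, reg=0.1):
    """One Alternating Least Squares sweep over the observed entries of R."""
    k = U.shape[1]
    for i in range(U.shape[0]):
        obs = mask[i]
        if obs.any():
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + reg * np.eye(k), Vo.T @ R[i, obs])
    for j in range(V.shape[0]):
        obs = mask[:, j]
        if obs.any():
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + reg * np.eye(k), Uo.T @ R[obs, j])
    return U, V

def unlearn_user(R, mask, U, V, user_id, sweeps=3):
    """Forget one user's interactions, then refit the factors by ALS."""
    mask = mask.copy()
    mask[user_id, :] = False          # drop all ratings from this user
    for _ in range(sweeps):
        U, V = als_sweep(R, mask, U, V)
    return U, V
```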
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
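A minimal sketch of projecting out bias directions from text embeddings, assuming a frozen CLIP-style text encoder: a bias subspace is estimated from prompts that spell out the unwanted attribute, and class-prompt embeddings are projected onto its orthogonal complement. The paper's calibrated projection matrix is more involved than this sketch.

```python
import numpy as np

def debias_text_embeddings(class_embeds, biased_prompt_embeds):
    """Project class-prompt embeddings onto the complement of estimated bias directions.

    class_embeds: (num_classes, d) text embeddings of the class prompts.
    biased_prompt_embeds: (num_bias_prompts, d) embeddings of prompts spelling out
    the unwanted attribute (e.g. "a photo of a man" / "a photo of a woman").
    """
    B = np.asarray(biased_prompt_embeds, dtype=np.float64)
    B = B - B.mean(axis=0, keepdims=True)          # center the biased prompts
    # Orthonormal basis for the bias subspace via the left singular vectors.
    U, s, _ = np.linalg.svd(B.T, full_matrices=False)
    rank = int(np.sum(s > 1e-10))
    basis = U[:, :rank]                            # (d, rank)
    # Projection onto the orthogonal complement of the bias subspace.
    P = np.eye(B.shape[1]) - basis @ basis.T
    debiased = np.asarray(class_embeds) @ P
    # Re-normalize, since CLIP-style zero-shot classifiers use cosine similarity.
    return debiased / np.linalg.norm(debiased, axis=1, keepdims=True)
```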
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
- Machine Unlearning of Features and Labels [72.81914952849334]
We propose first scenarios for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
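A minimal sketch of a closed-form, influence-function-style unlearning update for L2-regularized logistic regression, where the Hessian is available in closed form; this is a first-order approximation for illustration, not the paper's certified procedure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unlearn_point(theta, X, y, idx, reg=1e-3):
    """Approximately remove training point `idx` from a fitted logistic regression.

    theta: fitted parameters (d,); X: (n, d); y: labels in {0, 1}.
    """
    n, d = X.shape
    p = sigmoid(X @ theta)
    # Hessian of the mean regularized loss at theta.
    H = (X.T * (p * (1 - p))) @ X / n + reg * np.eye(d)
    # Gradient of the loss on the point being removed.
    g = (p[idx] - y[idx]) * X[idx]
    # Removing the point shifts the minimizer by roughly +H^{-1} g / n.
    return theta + np.linalg.solve(H, g) / n
```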
- One-bit Supervision for Image Classification [121.87598671087494]
One-bit supervision is a novel setting of learning from incomplete annotations.
We propose a multi-stage training paradigm which incorporates negative label suppression into an off-the-shelf semi-supervised learning algorithm.
arXiv Detail & Related papers (2020-09-14T03:06:23Z)
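A minimal sketch of negative label suppression under one-bit supervision: a "no" answer to "is this image class c?" removes class c from the prediction before pseudo-labeling. The function names and the confidence threshold are assumptions of this sketch, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def one_bit_pseudo_labels(logits, guessed_class, is_correct, threshold=0.95):
    """Turn one-bit answers ("is it class c?" -> yes/no) into training targets.

    logits: (batch, num_classes) model outputs for unlabeled images.
    guessed_class: (batch,) long tensor of the classes the model asked about.
    is_correct: (batch,) bool tensor of annotator answers.
    Returns a target per image; -1 marks images to ignore in the loss.
    """
    probs = F.softmax(logits, dim=1)
    targets = torch.full((logits.shape[0],), -1, dtype=torch.long)

    # "Yes" answers give a full label for free.
    targets[is_correct] = guessed_class[is_correct]

    # "No" answers: suppress the rejected class, then pseudo-label if confident.
    neg = ~is_correct
    suppressed = probs.clone()
    suppressed[neg, guessed_class[neg]] = 0.0
    suppressed = suppressed / suppressed.sum(dim=1, keepdim=True)
    confident = neg & (suppressed.max(dim=1).values >= threshold)
    targets[confident] = suppressed[confident].argmax(dim=1)
    return targets  # feed into cross_entropy with ignore_index=-1
```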
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.