Related papers: Clarify: Improving Model Robustness With Natural Language Corrections

URL: http://arxiv.org/abs/2402.03715v1
Date: Tue, 6 Feb 2024 05:11:38 GMT
Title: Clarify: Improving Model Robustness With Natural Language Corrections
Authors: Yoonho Lee, Michelle S. Lam, Helena Vasconcelos, Michael S. Bernstein, Chelsea Finn
Abstract summary: In supervised learning, models are trained to extract correlations from a static dataset. This often leads to models that rely on high-level misconceptions. We introduce Clarify, a novel interface and method for interactively correcting model misconceptions.
Score: 63.342630414000006
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In supervised learning, models are trained to extract correlations from a static dataset. This often leads to models that rely on high-level misconceptions. To prevent such misconceptions, we must necessarily provide additional information beyond the training data. Existing methods incorporate forms of additional instance-level supervision, such as labels for spurious features or additional labeled data from a balanced distribution. Such strategies can become prohibitively costly for large-scale datasets since they require additional annotation at a scale close to the original training data. We hypothesize that targeted natural language feedback about a model's misconceptions is a more efficient form of additional supervision. We introduce Clarify, a novel interface and method for interactively correcting model misconceptions. Through Clarify, users need only provide a short text description to describe a model's consistent failure patterns. Then, in an entirely automated way, we use such descriptions to improve the training process by reweighting the training data or gathering additional targeted data. Our user studies show that non-expert users can successfully describe model misconceptions via Clarify, improving worst-group accuracy by an average of 17.1% in two datasets. Additionally, we use Clarify to find and rectify 31 novel hard subpopulations in the ImageNet dataset, improving minority-split accuracy from 21.1% to 28.7%.
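
The abstract mentions reweighting the training data and reporting worst-group accuracy. The following is a minimal sketch of those two ideas, not the authors' implementation. It assumes each training example already carries a boolean flag saying whether it matches the user's text description; how that flag is produced (for example, by an image-text similarity model) is outside the sketch. The function names `group_balanced_weights` and `worst_group_accuracy` are hypothetical, and equal-weight group balancing is one common reweighting choice, not necessarily the one used in the paper.

```python
from collections import Counter, defaultdict


def group_balanced_weights(labels, matches):
    """Return one weight per example so that every (label, matches) group
    carries the same total weight. `matches` is an assumed boolean flag:
    does the example fit the user's text description?"""
    groups = list(zip(labels, matches))
    counts = Counter(groups)
    n_examples = len(groups)
    n_groups = len(counts)
    # Each group receives total weight n_examples / n_groups,
    # split evenly among its members.
    return [n_examples / (n_groups * counts[g]) for g in groups]


def worst_group_accuracy(labels, matches, predictions):
    """Accuracy of the least accurate (label, matches) group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, m, p in zip(labels, matches, predictions):
        total[(y, m)] += 1
        correct[(y, m)] += int(p == y)
    return min(correct[g] / total[g] for g in total)


# Toy data (made up for illustration only).
labels = [0, 0, 0, 1, 1, 1, 1, 1]
matches = [True, True, False, False, False, False, False, True]
predictions = [0, 0, 0, 1, 1, 1, 0, 1]

weights = group_balanced_weights(labels, matches)
wga = worst_group_accuracy(labels, matches, predictions)
print(weights)
print(wga)
```

The weights sum to the number of examples, so they can be passed as per-sample weights to a loss function without changing its overall scale; rare groups simply count for more per example.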

Related papers

Machine Learning from Explanations [17.28638946021444]
We introduce an innovative approach for training reliable classification models on smaller datasets. Our method centers around a two-stage training cycle that alternates between enhancing model prediction accuracy and refining its attention to match the explanations. We demonstrate that our training cycle expedites the convergence towards more accurate and reliable models.
arXiv Detail & Related papers (2025-07-07T09:09:52Z)
Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been regarded as a challenging property to encode in neural networks. We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
Improving Classification Performance With Human Feedback: Label a few, we label the rest [2.7386128680964408]
This paper focuses on understanding how a continuous feedback loop can refine models, thereby enhancing their accuracy, recall, and precision. We benchmark this approach on the Financial Phrasebank, Banking, Craigslist, Trec, and Amazon Reviews datasets to show that with just a few labeled examples, we are able to surpass the accuracy of zero-shot large language models.
arXiv Detail & Related papers (2024-01-17T19:13:05Z)
Netflix and Forget: Efficient and Exact Machine Unlearning from Bi-linear Recommendations [15.789980605221672]
This paper focuses on simple but widely deployed bi-linear models for recommendations based on matrix completion. We develop Unlearn-ALS by making a few key modifications to the fine-tuning procedure under Alternating Least Squares. We show that Unlearn-ALS is consistent with retraining without any model degradation and exhibits rapid convergence.
arXiv Detail & Related papers (2023-02-13T20:27:45Z)
Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models. Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers. We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first scenarios for unlearning features and labels in machine learning models. Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
One-bit Supervision for Image Classification [121.87598671087494]
One-bit supervision is a novel setting of learning from incomplete annotations. We propose a multi-stage training paradigm which incorporates negative label suppression into an off-the-shelf semi-supervised learning algorithm.
arXiv Detail & Related papers (2020-09-14T03:06:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
