Related papers: Estimating Fr\'echet bounds for validating programmatic weak supervision

Estimating Fr\'echet bounds for validating programmatic weak supervision

URL: http://arxiv.org/abs/2312.04601v1
Date: Thu, 7 Dec 2023 07:15:11 GMT
Title: Estimating Fr\'echet bounds for validating programmatic weak supervision
Authors: Felipe Maia Polo, Mikhail Yurochkin, Moulinath Banerjee, Subha Maity, Yuekai Sun
Abstract summary: We develop methods for estimating Fr'eche's bounds on (possibly high-dimensional) distribution classes in which some variables are continuous-valued. We demonstrate the usefulness of our algorithms by evaluating the performance of machine learning (ML) models trained with programmatic weak supervision (PWS)
Score: 50.13475056199486
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We develop methods for estimating Fr\'echet bounds on (possibly high-dimensional) distribution classes in which some variables are continuous-valued. We establish the statistical correctness of the computed bounds under uncertainty in the marginal constraints and demonstrate the usefulness of our algorithms by evaluating the performance of machine learning (ML) models trained with programmatic weak supervision (PWS). PWS is a framework for principled learning from weak supervision inputs (e.g., crowdsourced labels, knowledge bases, pre-trained models on related tasks, etc), and it has achieved remarkable success in many areas of science and engineering. Unfortunately, it is generally difficult to validate the performance of ML models trained with PWS due to the absence of labeled data. Our algorithms address this issue by estimating sharp lower and upper bounds for performance metrics such as accuracy/recall/precision/F1 score.

Related papers

KAIROS: Scalable Model-Agnostic Data Valuation [8.766103946679435]
KAIROS is a scalable, model-agnostic valuation framework that assigns each example a distributional influence score.<n> KAIROS consistently outperforms state-of-the-art model-, Shapley-, and Wasserstein-based baselines in both accuracy and runtime.
arXiv Detail & Related papers (2025-06-30T12:44:28Z)
Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach [1.3731623617634434]
We identify critical limitations in existing unlearning metrics and propose enhanced evaluation metrics inspired by conformal prediction. Our metrics can effectively capture the extent to which ground truth labels are excluded from the prediction set. We propose an unlearning framework that integrates conformal prediction insights into Carlini & Wagner adversarial attack loss.
arXiv Detail & Related papers (2025-01-31T18:58:43Z)
QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement. QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights. We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
Group Robust Classification Without Any Group Information [5.053622900542495]
This study contends that current bias-unsupervised approaches to group robustness continue to rely on group information to achieve optimal performance. bias labels are still crucial for effective model selection, restricting the practicality of these methods in real-world scenarios. We propose a revised methodology for training and validating debiased models in an entirely bias-unsupervised manner.
arXiv Detail & Related papers (2023-10-28T01:29:18Z)
A Study of Unsupervised Evaluation Metrics for Practical and Automatic Domain Adaptation [15.728090002818963]
Unsupervised domain adaptation (UDA) methods facilitate the transfer of models to target domains without labels. In this paper, we aim to find an evaluation metric capable of assessing the quality of a transferred model without access to target validation labels.
arXiv Detail & Related papers (2023-08-01T05:01:05Z)
Adaptive Certified Training: Towards Better Accuracy-Robustness Tradeoffs [17.46692880231195]
We propose a novel certified training method based on a key insight that training with adaptive certified radii helps to improve the accuracy and robustness of the model. We demonstrate the effectiveness of the proposed method on MNIST, CIFAR-10, and TinyImageNet datasets.
arXiv Detail & Related papers (2023-07-24T18:59:46Z)
Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle. While an approximation to the ground oracle can be trained and used in place of it during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples. This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z)
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties. Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
Approaching Neural Network Uncertainty Realism [53.308409014122816]
Quantifying or at least upper-bounding uncertainties is vital for safety-critical systems such as autonomous vehicles. We evaluate uncertainty realism -- a strict quality criterion -- with a Mahalanobis distance-based statistical test. We adopt it to the automotive domain and show that it significantly improves uncertainty realism compared to a plain encoder-decoder model.
arXiv Detail & Related papers (2021-01-08T11:56:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.