Using Imperfect Surrogates for Downstream Inference: Design-based
Supervised Learning for Social Science Applications of Large Language Models
- URL: http://arxiv.org/abs/2306.04746v3
- Date: Sun, 14 Jan 2024 23:02:59 GMT
- Title: Using Imperfect Surrogates for Downstream Inference: Design-based
Supervised Learning for Social Science Applications of Large Language Models
- Authors: Naoki Egami, Musashi Hinck, Brandon M. Stewart, Hanying Wei
- Abstract summary: In computational social science (CSS), researchers analyze documents to explain social and political phenomena.
One increasingly common way to annotate documents cheaply at scale is through large language models.
We present a new algorithm for using imperfect annotation surrogates for downstream statistical analyses.
- Score: 0.2812395851874055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In computational social science (CSS), researchers analyze documents to
explain social and political phenomena. In most scenarios, CSS researchers
first obtain labels for documents and then explain labels using interpretable
regression analyses in the second step. One increasingly common way to annotate
documents cheaply at scale is through large language models (LLMs). However,
like other scalable ways of producing annotations, such surrogate labels are
often imperfect and biased. We present a new algorithm for using imperfect
annotation surrogates for downstream statistical analyses while guaranteeing
statistical properties -- like asymptotic unbiasedness and proper uncertainty
quantification -- which are fundamental to CSS research. We show that direct
use of surrogate labels in downstream statistical analyses leads to substantial
bias and invalid confidence intervals, even with high surrogate accuracy of
80-90%. To address this, we build on debiased machine learning to propose the
design-based supervised learning (DSL) estimator. DSL employs a doubly-robust
procedure to combine surrogate labels with a smaller number of high-quality,
gold-standard labels. Our approach guarantees valid inference for downstream
statistical analyses, even when surrogates are arbitrarily biased and without
requiring stringent assumptions, by controlling the probability of sampling
documents for gold-standard labeling. Both our theoretical analysis and
experimental results show that DSL provides valid statistical inference while
achieving root mean squared errors comparable to existing alternatives that
focus only on prediction without inferential guarantees.
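The correction at the heart of this approach can be illustrated with a small simulation. The sketch below is not the authors' implementation; the paper's DSL estimator is more general (a doubly-robust procedure), but the design-based correction shown here conveys the core idea the abstract describes: combine cheap surrogate labels with a small, probability-sampled set of gold-standard labels into a bias-corrected pseudo-outcome, then run the downstream regression on that pseudo-outcome. All variable names and the simulated data-generating process are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated corpus: one covariate x and a true binary label y
# (in practice y is unobserved except where experts code it).
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
y = (rng.random(n) < p).astype(float)

# LLM surrogate labels with roughly 85% accuracy and class-dependent errors.
flip = np.where(y == 1.0, rng.random(n) < 0.20, rng.random(n) < 0.10)
y_surrogate = np.where(flip, 1.0 - y, y)

# Design step: each document is sampled for gold-standard labeling with a
# known probability pi that the researcher controls.
pi = 0.05
r = (rng.random(n) < pi).astype(float)   # 1 if the document is expert-labeled
y_gold = np.where(r == 1.0, y, 0.0)      # gold label observed only when r == 1

# Bias-corrected pseudo-outcome: the surrogate plus an inverse-probability-
# weighted residual on the expert-labeled subsample. Because pi is known by
# design, the pseudo-outcome has the same conditional expectation as the true
# label even when the surrogate is arbitrarily biased.
y_dsl = y_surrogate + (r / pi) * (y_gold - y_surrogate)

def ols_slope(outcome):
    """Slope from regressing `outcome` on x with an intercept."""
    design = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta[1]

print("slope, true labels        :", round(ols_slope(y), 3))
print("slope, surrogates only    :", round(ols_slope(y_surrogate), 3))  # biased
print("slope, DSL pseudo-outcomes:", round(ols_slope(y_dsl), 3))        # close to truth
```

The key design choice is that pi is known because the researcher controls which documents are sent for gold-standard coding; this is what permits valid downstream inference without assumptions about how the surrogate errs.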
Related papers
- Towards the Mitigation of Confirmation Bias in Semi-supervised Learning: a Debiased Training Perspective [6.164100243945264]
Semi-supervised learning (SSL) commonly exhibits confirmation bias, where models disproportionately favor certain classes.
We introduce TaMatch, a unified framework for debiased training in SSL.
We show that TaMatch significantly outperforms existing state-of-the-art methods across a range of challenging image classification tasks.
arXiv Detail & Related papers (2024-09-26T21:50:30Z)
- Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
- A Debiased Nearest Neighbors Framework for Multi-Label Text Classification [13.30576550077694]
We introduce a DEbiased Nearest Neighbors (DENN) framework for Multi-Label Text Classification (MLTC).
To address embedding alignment bias, we propose a debiased contrastive learning strategy, enhancing neighbor consistency on label co-occurrence.
For confidence estimation bias, we present a debiased confidence estimation strategy, improving the adaptive combination of predictions from $k$NN and inductive binary classifications.
arXiv Detail & Related papers (2024-08-06T14:00:23Z)
- Beyond Performance: Quantifying and Mitigating Label Bias in LLMs [8.77694178599322]
We evaluate different approaches to quantifying label bias in a model's predictions.
Our investigation reveals substantial label bias in models both before and after debiasing attempts.
We propose a novel label bias calibration method tailored for few-shot prompting.
arXiv Detail & Related papers (2024-05-04T19:53:03Z)
- Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression [57.17120203327993]
The threshold-to-pseudo-label (T2L) process in classification uses confidence to determine the quality of labels.
Regression likewise requires unbiased methods to generate high-quality labels.
We propose a theoretically guaranteed constraint for generating unbiased labels based on Chebyshev's inequality.
arXiv Detail & Related papers (2023-11-03T08:39:35Z)
- Gray Learning from Non-IID Data with Out-of-distribution Samples [45.788789553551176]
The integrity of training data, even when annotated by experts, is far from guaranteed.
We introduce a novel approach, termed Gray Learning (GL), which leverages both ground-truth and complementary labels.
By grounding our approach in statistical learning theory, we derive bounds for the generalization error, demonstrating that GL achieves tight constraints even in non-IID settings.
arXiv Detail & Related papers (2022-06-19T10:46:38Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Taming Overconfident Prediction on Unlabeled Data from Hindsight [50.9088560433925]
Minimizing prediction uncertainty on unlabeled data is a key factor to achieve good performance in semi-supervised learning.
This paper proposes a dual mechanism, named ADaptive Sharpening (ADS), which first applies a soft threshold to adaptively mask out determinate and negligible predictions, and then sharpens the remaining informed predictions.
As a plug-in, ADS significantly improves state-of-the-art SSL methods.
arXiv Detail & Related papers (2021-12-15T15:17:02Z)
- Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs) to detect social biases.
We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z)
- A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification [1.90365714903665]
This hands-on introduction is aimed at a reader interested in the practical implementation of distribution-free UQ.
We will include many explanatory illustrations, examples, and code samples in Python, with PyTorch syntax.
arXiv Detail & Related papers (2021-07-15T17:59:50Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.