Using Foundation Models to Detect Policy Violations with Minimal
Supervision
- URL: http://arxiv.org/abs/2306.06234v1
- Date: Fri, 9 Jun 2023 20:08:48 GMT
- Title: Using Foundation Models to Detect Policy Violations with Minimal
Supervision
- Authors: Sid Mittal, Vineet Gupta, Frederick Liu, Mukund Sundararajan
- Abstract summary: We seek to leverage foundation models' capabilities to detect policy violations.
We compose the hard prompts with soft prompt tuning to produce a classifier that attains high accuracy with very little supervision.
We identify several unintuitive aspects of foundation models.
- Score: 15.599296461516982
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models, i.e. large neural networks pre-trained on large text
corpora, have revolutionized NLP. They can be instructed directly (e.g.
(arXiv:2005.14165)) - this is called hard prompting - and they can be tuned
using very little data (e.g. (arXiv:2104.08691)) - this technique is called
soft prompting. We seek to leverage their capabilities to detect policy
violations. Our contributions are: We identify a hard prompt that adapts
chain-of-thought prompting to policy violation tasks. This prompt produces
policy violation classifications, along with extractive explanations that
justify the classification. We compose the hard prompts with soft prompt tuning
to produce a classifier that attains high accuracy with very little
supervision; the same classifier also produces explanations. Though the
supervision only acts on the classifications, we find that the modified
explanations remain consistent with the (tuned) model's response. Along the
way, we identify several unintuitive aspects of foundation models. For
instance, adding an example from a specific class can actually reduce
predictions of that class; separately, tokenization can have unexpected effects
on scoring. Based on our technical results, we identify a simple workflow for
product teams to quickly develop effective policy violation detectors.
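To make the composition described above concrete, the following is a minimal sketch of how a chain-of-thought style hard prompt could be combined with soft prompt tuning for policy violation scoring. The prompt wording, the SoftPromptedScorer class, the Yes/No token-scoring heuristic, and the toy Transformer backbone are all illustrative assumptions; the paper does not publish its exact prompts or code.

```python
import torch
import torch.nn as nn

# Hypothetical chain-of-thought hard prompt for policy violation detection.
# The wording is illustrative; it asks for an extractive quote before the verdict.
HARD_PROMPT = (
    "Policy: {policy}\n"
    "Content: {content}\n"
    "First quote the sentence of the content most relevant to the policy,\n"
    "then answer Yes (violation) or No (no violation).\n"
    "Quote:"
)


class SoftPromptedScorer(nn.Module):
    """Sketch of composing a hard prompt with soft prompt tuning.

    A block of trainable "soft" embeddings is prepended to the embedded
    hard prompt; a frozen backbone then scores the Yes/No verdict tokens.
    `backbone` is any frozen encoder mapping (B, T, D) -> (B, T, D) and
    stands in for a real foundation model.
    """

    def __init__(self, backbone: nn.Module, embed: nn.Embedding,
                 yes_id: int, no_id: int, n_soft: int = 20):
        super().__init__()
        self.backbone, self.embed = backbone, embed
        self.yes_id, self.no_id = yes_id, no_id
        # The soft prompt is the only trainable component; everything else is frozen.
        self.soft = nn.Parameter(0.02 * torch.randn(n_soft, embed.embedding_dim))
        for p in list(backbone.parameters()) + [embed.weight]:
            p.requires_grad_(False)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (B, T) ids of the filled-in hard prompt (e.g. HARD_PROMPT.format(...)).
        tok = self.embed(token_ids)                               # (B, T, D)
        soft = self.soft.unsqueeze(0).expand(tok.size(0), -1, -1)
        hidden = self.backbone(torch.cat([soft, tok], dim=1))     # (B, n_soft+T, D)
        # Score the verdict from the last position's vocabulary logits,
        # reusing the embedding matrix as a tied output projection.
        logits = hidden[:, -1, :] @ self.embed.weight.T           # (B, V)
        return logits[:, self.yes_id] - logits[:, self.no_id]     # violation score


# Toy usage with a stand-in backbone; a real setup would use a pretrained LM
# and a real tokenizer instead of random ids.
vocab, dim = 1000, 64
embed = nn.Embedding(vocab, dim)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)
scorer = SoftPromptedScorer(backbone, embed, yes_id=1, no_id=2)
fake_ids = torch.randint(0, vocab, (3, 32))    # pretend-tokenized hard prompts
print(scorer(fake_ids).shape)                  # torch.Size([3])
```

In this sketch only the soft-prompt embeddings receive gradients from labeled examples (matching the very-low-supervision setting), while the hard prompt remains what elicits the extractive explanation from the underlying model at inference time.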
Related papers
- Deconstructing In-Context Learning: Understanding Prompts via Corruption [13.37109575313212]
We decompose the entire prompt into four components: task description, demonstration inputs, labels, and inline instructions.
We study models ranging from 1.5B to 70B in size, using ten datasets covering classification and generation tasks.
We find that repeating text within the prompt boosts model performance, and bigger models are more sensitive to the semantics of the prompt.
arXiv Detail & Related papers (2024-04-02T15:50:55Z)
- Understanding and Mitigating Classification Errors Through Interpretable Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
We show that our method, Premise, performs well in practice.
arXiv Detail & Related papers (2023-11-18T00:24:26Z)
- Generative Prompt Tuning for Relation Classification [21.027631157115135]
We propose a novel generative prompt tuning method to reformulate relation classification as an infilling problem.
In addition, we design entity-guided decoding and discriminative relation scoring to generate and align relations effectively and efficiently during inference.
arXiv Detail & Related papers (2022-10-22T12:40:23Z)
- Language Models in the Loop: Incorporating Prompting into Weak Supervision [11.10422546502386]
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited.
Instead of applying the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework.
arXiv Detail & Related papers (2022-05-04T20:42:40Z)
- Learning to Detect Instance-level Salient Objects Using Complementary Image Labels [55.049347205603304]
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2021-11-19T10:15:22Z)
- Label-Descriptive Patterns and their Application to Characterizing Classification Errors [31.272875287136426]
State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless.
Characterizing these errors in easily interpretable terms gives insight into whether a model is prone to making systematic errors, but also gives a way to act and improve the model.
In this paper we propose a method that allows us to do so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data that is partitioned according to correctness of prediction.
arXiv Detail & Related papers (2021-10-18T19:42:21Z)
- PTR: Prompt Tuning with Rules for Text Classification [64.1655047016891]
Fine-tuned pre-trained language models (PLMs) have achieved awesome performance on almost all NLP tasks.
We propose prompt tuning with rules (PTR) for many-class text classification.
PTR is able to encode prior knowledge of each class into prompt tuning.
arXiv Detail & Related papers (2021-05-24T13:24:02Z)
- Revisiting Deep Local Descriptor for Improved Few-Shot Classification [56.74552164206737]
We show how one can improve the quality of embeddings by leveraging Dense Classification and Attentive Pooling.
We suggest to pool feature maps by applying attentive pooling instead of the widely used global average pooling (GAP) to prepare embeddings for few-shot classification.
arXiv Detail & Related papers (2021-03-30T00:48:28Z)
- How benign is benign overfitting? [96.07549886487526]
We investigate two causes for adversarial vulnerability in deep neural networks: bad data and (poorly) trained models.
Deep neural networks essentially achieve zero training error, even in the presence of label noise.
We identify label noise as one of the causes for adversarial vulnerability.
arXiv Detail & Related papers (2020-07-08T11:07:10Z)
- DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning [95.60782037764928]
We show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas from the propensity estimation literature can be used to importance-sample transitions from the replay buffer and update the policy to prevent performance deterioration.
arXiv Detail & Related papers (2020-06-26T20:21:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.