Using Foundation Models to Detect Policy Violations with Minimal
Supervision
- URL: http://arxiv.org/abs/2306.06234v1
- Date: Fri, 9 Jun 2023 20:08:48 GMT
- Title: Using Foundation Models to Detect Policy Violations with Minimal
Supervision
- Authors: Sid Mittal, Vineet Gupta, Frederick Liu, Mukund Sundararajan
- Abstract summary: We seek to leverage foundation models' capabilities to detect policy violations.
We compose the hard prompts with soft prompt tuning to produce a classifier that attains high accuracy with very little supervision.
We identify several unintuitive aspects of foundation models.
- Score: 15.599296461516982
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models, i.e. large neural networks pre-trained on large text
corpora, have revolutionized NLP. They can be instructed directly (e.g.
(arXiv:2005.14165)) - this is called hard prompting - and they can be tuned
using very little data (e.g. (arXiv:2104.08691)) - this technique is called
soft prompting. We seek to leverage their capabilities to detect policy
violations. Our contributions are: We identify a hard prompt that adapts
chain-of-thought prompting to policy violation tasks. This prompt produces
policy violation classifications, along with extractive explanations that
justify the classification. We compose the hard prompts with soft prompt tuning
to produce a classifier that attains high accuracy with very little
supervision; the same classifier also produces explanations. Though the
supervision only acts on the classifications, we find that the modified
explanations remain consistent with the (tuned) model's response. Along the
way, we identify several unintuitive aspects of foundation models. For
instance, adding an example from a specific class can actually reduce
predictions of that class; separately, tokenization can have unexpected effects
on scoring. Based on our technical results, we identify a simple workflow for
product teams to quickly develop effective policy violation detectors.
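To make the composition described above concrete, the following is a minimal sketch of how a chain-of-thought style hard prompt could be combined with soft prompt tuning for policy violation scoring. The prompt wording, the SoftPromptedScorer class, the Yes/No token-scoring heuristic, and the toy Transformer backbone are all illustrative assumptions; the paper does not publish its exact prompts or code.

```python
import torch
import torch.nn as nn

# Hypothetical chain-of-thought hard prompt for policy violation detection.
# The wording is illustrative; it asks for an extractive quote before the verdict.
HARD_PROMPT = (
    "Policy: {policy}\n"
    "Content: {content}\n"
    "First quote the sentence of the content most relevant to the policy,\n"
    "then answer Yes (violation) or No (no violation).\n"
    "Quote:"
)


class SoftPromptedScorer(nn.Module):
    """Sketch of composing a hard prompt with soft prompt tuning.

    A block of trainable "soft" embeddings is prepended to the embedded
    hard prompt; a frozen backbone then scores the Yes/No verdict tokens.
    `backbone` is any frozen encoder mapping (B, T, D) -> (B, T, D) and
    stands in for a real foundation model.
    """

    def __init__(self, backbone: nn.Module, embed: nn.Embedding,
                 yes_id: int, no_id: int, n_soft: int = 20):
        super().__init__()
        self.backbone, self.embed = backbone, embed
        self.yes_id, self.no_id = yes_id, no_id
        # The soft prompt is the only trainable component; everything else is frozen.
        self.soft = nn.Parameter(0.02 * torch.randn(n_soft, embed.embedding_dim))
        for p in list(backbone.parameters()) + [embed.weight]:
            p.requires_grad_(False)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (B, T) ids of the filled-in hard prompt (e.g. HARD_PROMPT.format(...)).
        tok = self.embed(token_ids)                               # (B, T, D)
        soft = self.soft.unsqueeze(0).expand(tok.size(0), -1, -1)
        hidden = self.backbone(torch.cat([soft, tok], dim=1))     # (B, n_soft+T, D)
        # Score the verdict from the last position's vocabulary logits,
        # reusing the embedding matrix as a tied output projection.
        logits = hidden[:, -1, :] @ self.embed.weight.T           # (B, V)
        return logits[:, self.yes_id] - logits[:, self.no_id]     # violation score


# Toy usage with a stand-in backbone; a real setup would use a pretrained LM
# and a real tokenizer instead of random ids.
vocab, dim = 1000, 64
embed = nn.Embedding(vocab, dim)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)
scorer = SoftPromptedScorer(backbone, embed, yes_id=1, no_id=2)
fake_ids = torch.randint(0, vocab, (3, 32))    # pretend-tokenized hard prompts
print(scorer(fake_ids).shape)                  # torch.Size([3])
```

In this sketch only the soft-prompt embeddings receive gradients from labeled examples (matching the very-low-supervision setting), while the hard prompt remains what elicits the extractive explanation from the underlying model at inference time.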
Related papers
- Deconstructing In-Context Learning: Understanding Prompts via Corruption [13.37109575313212]
We decompose the entire prompt into four components: task description, demonstration inputs, labels, and inline instructions.
We study models ranging from 1.5B to 70B in size, using ten datasets covering classification and generation tasks.
We find that repeating text within the prompt boosts model performance, and bigger models are more sensitive to the semantics of the prompt.
arXiv Detail & Related papers (2024-04-02T15:50:55Z)
- Understanding and Mitigating Classification Errors Through Interpretable Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
We show that our method, Premise, performs well in practice.
arXiv Detail & Related papers (2023-11-18T00:24:26Z)
- Generative Prompt Tuning for Relation Classification [21.027631157115135]
We propose a novel generative prompt tuning method to reformulate relation classification as an infilling problem.
In addition, we design entity-guided decoding and discriminative relation scoring to generate and align relations effectively and efficiently during inference.
arXiv Detail & Related papers (2022-10-22T12:40:23Z)
- Language Models in the Loop: Incorporating Prompting into Weak Supervision [11.10422546502386]
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited.
Instead of applying the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework.
arXiv Detail & Related papers (2022-05-04T20:42:40Z)
- Learning to Detect Instance-level Salient Objects Using Complementary Image Labels [55.049347205603304]
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2021-11-19T10:15:22Z)
- Label-Descriptive Patterns and their Application to Characterizing Classification Errors [31.272875287136426]
State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless.
Characterizing these errors in easily interpretable terms gives insight into whether a model is prone to making systematic errors, but also gives a way to act and improve the model.
In this paper we propose a method that allows us to do so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data that is partitioned according to correctness of prediction.
arXiv Detail & Related papers (2021-10-18T19:42:21Z)
- PTR: Prompt Tuning with Rules for Text Classification [64.1655047016891]
Fine-tuned pre-trained language models (PLMs) have achieved awesome performance on almost all NLP tasks.
We propose prompt tuning with rules (PTR) for many-class text classification.
PTR is able to encode prior knowledge of each class into prompt tuning.
arXiv Detail & Related papers (2021-05-24T13:24:02Z)
- Revisiting Deep Local Descriptor for Improved Few-Shot Classification [56.74552164206737]
We show how one can improve the quality of embeddings by leveraging Dense Classification and Attentive Pooling.
We suggest to pool feature maps by applying attentive pooling instead of the widely used global average pooling (GAP) to prepare embeddings for few-shot classification.
arXiv Detail & Related papers (2021-03-30T00:48:28Z)
- How benign is benign overfitting? [96.07549886487526]
We investigate two causes for adversarial vulnerability in deep neural networks: bad data and (poorly) trained models.
Deep neural networks essentially achieve zero training error, even in the presence of label noise.
We identify label noise as one of the causes for adversarial vulnerability.
arXiv Detail & Related papers (2020-07-08T11:07:10Z)
- DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning [95.60782037764928]
We show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas from the propensity estimation literature can be used to importance-sample transitions from the replay buffer and update the policy to prevent performance deterioration.
arXiv Detail & Related papers (2020-06-26T20:21:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.