Process for Adapting Language Models to Society (PALMS) with
Values-Targeted Datasets
- URL: http://arxiv.org/abs/2106.10328v1
- Date: Fri, 18 Jun 2021 19:38:28 GMT
- Title: Process for Adapting Language Models to Society (PALMS) with
Values-Targeted Datasets
- Authors: Irene Solaiman (1) and Christy Dennison (1) ((1) OpenAI)
- Abstract summary: Language models can generate harmful and biased outputs and exhibit undesirable behavior.
We propose a Process for Adapting Language Models to Society (PALMS) with Values-Targeted datasets.
We show that significantly adjusting language model behavior is feasible with a small, hand-curated dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Language models can generate harmful and biased outputs and exhibit
undesirable behavior. We propose a Process for Adapting Language Models to
Society (PALMS) with Values-Targeted Datasets, an iterative process to
significantly change model behavior by crafting and fine-tuning on a dataset
that reflects a predetermined set of target values. We evaluate our process
using three metrics: quantitative metrics with human evaluations that score
output adherence to a target value, and toxicity scoring on outputs; and
qualitative metrics analyzing the most common word associated with a given
social category. Through each iteration, we add additional training dataset
examples based on observed shortcomings from evaluations. PALMS performs
significantly better on all metrics compared to baseline and control models for
a broad range of GPT-3 language model sizes without compromising capability
integrity. We find that the effectiveness of PALMS increases with model size.
We show that significantly adjusting language model behavior is feasible with a
small, hand-curated dataset.
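The qualitative metric mentioned in the abstract, the most common word associated with a given social category, can be illustrated with a simple frequency count. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `most_common_words`, the small stopword list, and the toy sample strings are all hypothetical, and the paper's actual analysis may filter or tag words differently.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "and",
             "or", "but", "of", "to", "in", "that", "as"}


def most_common_words(outputs, category_terms, top_k=3):
    """Count the most frequent words across generated outputs.

    Stopwords and the category terms themselves are excluded, so the
    result reflects words that co-occur with the category.
    """
    excluded = STOPWORDS | {term.lower() for term in category_terms}
    counts = Counter()
    for text in outputs:
        # Lowercase and keep alphabetic tokens (apostrophes allowed).
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(tok for tok in tokens if tok not in excluded)
    return counts.most_common(top_k)


# Toy strings standing in for model outputs; not real data.
samples = [
    "The nurse was kind and patient.",
    "The nurse is kind to everyone.",
    "A nurse was tired but kind.",
]
top = most_common_words(samples, category_terms=["nurse"], top_k=1)
print(top)
```

Comparing such counts between a baseline model and a fine-tuned model gives a rough view of how word associations shift; the toy category here is only a stand-in for the social categories the paper examines.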
Related papers
- COPAL: Continual Pruning in Large Language Generative Models [23.747878534962663]
COPAL is an algorithm developed for pruning large language generative models under a continual model adaptation setting.
Our empirical evaluation on LLMs of various sizes shows that COPAL outperforms baseline models.
arXiv Detail & Related papers (2024-05-02T18:24:41Z)
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
- Diversity-Aware Ensembling of Language Models Based on Topological Data Analysis [3.1734682813501514]
Existing approaches mostly rely on simple averaging of predictions by ensembles with equal weights for each model.
We propose to estimate weights for ensembles of NLP models using not only knowledge of their individual performance but also their similarity to each other.
arXiv Detail & Related papers (2024-02-22T00:04:21Z)
- Split and Rephrase with Large Language Models [2.499907423888049]
The Split and Rephrase (SPRP) task consists of splitting complex sentences into a sequence of shorter grammatical sentences.
We evaluate large language models on the task, showing that they can provide large improvements over the state of the art on the main metrics.
arXiv Detail & Related papers (2023-12-18T10:16:37Z)
- Anchor Points: Benchmarking Models with Much Fewer Examples [88.02417913161356]
In six popular language classification benchmarks, model confidence in the correct class on many pairs of points is strongly correlated across models.
We propose Anchor Point Selection, a technique to select small subsets of datasets that capture model behavior across the entire dataset.
Just several anchor points can be used to estimate model per-class predictions on all other points in a dataset with low mean absolute error.
arXiv Detail & Related papers (2023-09-14T17:45:51Z)
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs).
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
- Variable Importance Matching for Causal Inference [73.25504313552516]
We describe a general framework called Model-to-Match that achieves these goals.
Model-to-Match uses variable importance measurements to construct a distance metric.
We operationalize the Model-to-Match framework with LASSO.
arXiv Detail & Related papers (2023-02-23T00:43:03Z)
- Evaluating Representations with Readout Model Switching [18.475866691786695]
In this paper, we propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric.
We design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions.
The proposed metric can be efficiently computed with an online method and we present results for pre-trained vision encoders of various architectures.
arXiv Detail & Related papers (2023-02-19T14:08:01Z)
- ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models [102.63817106363597]
We build ELEVATER, the first benchmark to compare and evaluate pre-trained language-augmented visual models.
It consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge.
We will release our toolkit and evaluation platforms for the research community.
arXiv Detail & Related papers (2022-04-19T10:23:42Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.