Process for Adapting Language Models to Society (PALMS) with
Values-Targeted Datasets
- URL: http://arxiv.org/abs/2106.10328v1
- Date: Fri, 18 Jun 2021 19:38:28 GMT
- Title: Process for Adapting Language Models to Society (PALMS) with
Values-Targeted Datasets
- Authors: Irene Solaiman (1) and Christy Dennison (1) ((1) OpenAI)
- Abstract summary: Language models can generate harmful and biased outputs and exhibit undesirable behavior.
We propose a Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets.
We show that significantly adjusting language model behavior is feasible with a small, hand-curated dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Language models can generate harmful and biased outputs and exhibit
undesirable behavior. We propose a Process for Adapting Language Models to
Society (PALMS) with Values-Targeted Datasets, an iterative process to
significantly change model behavior by crafting and fine-tuning on a dataset
that reflects a predetermined set of target values. We evaluate our process
using three metrics: quantitative metrics with human evaluations that score
output adherence to a target value, and toxicity scoring on outputs; and
qualitative metrics analyzing the most common word associated with a given
social category. Through each iteration, we add additional training dataset
examples based on observed shortcomings from evaluations. PALMS performs
significantly better on all metrics compared to baseline and control models for
a broad range of GPT-3 language model sizes without compromising capability
integrity. We find that the effectiveness of PALMS increases with model size.
We show that significantly adjusting language model behavior is feasible with a
small, hand-curated dataset.
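The qualitative metric mentioned in the abstract (the most common word associated with a given social category) can be illustrated with a small sketch. This is not the authors' implementation: the function name `most_common_associated_word`, the token window, and the tiny stopword list are all assumptions made for illustration, and the abstract does not specify how association is computed.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were",
             "and", "or", "of", "to", "in", "very", "so"}


def most_common_associated_word(outputs, category_terms, window=3):
    """Return the most frequent non-stopword found within `window` tokens
    of any category term across all outputs, or None if nothing is found."""
    terms = {t.lower() for t in category_terms}
    counts = Counter()
    for text in outputs:
        # Lowercase and keep alphabetic tokens (apostrophes allowed).
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok not in terms:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                neighbor = tokens[j]
                if j != i and neighbor not in terms and neighbor not in STOPWORDS:
                    counts[neighbor] += 1
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    # Made-up strings standing in for model outputs.
    toy_outputs = [
        "The woman was brilliant and kind.",
        "Women are brilliant leaders.",
        "A woman is a patient teacher.",
    ]
    print(most_common_associated_word(toy_outputs, ["woman", "women"]))
```

Under these assumptions, running the same function on outputs from a baseline model and from a fine-tuned model would give one word per category for each model, which can then be compared side by side.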
Related papers
- Towards More Effective Table-to-Text Generation: Assessing In-Context Learning and Self-Evaluation with Open-Source Models [0.0]
This study explores the effectiveness of various in-context learning strategies in language models (LMs) across benchmark datasets.
We employ a large language model (LLM) self-evaluation approach using chain-of-thought reasoning and assess its correlation with human-aligned metrics like BERTScore.
Our findings highlight the significant impact of examples in improving table-to-text generation and suggest that, while LLM self-evaluation has potential, its current alignment with human judgment could be enhanced.
arXiv Detail & Related papers (2024-10-15T09:19:42Z)
- How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
arXiv Detail & Related papers (2024-10-04T13:39:21Z)
- COPAL: Continual Pruning in Large Language Generative Models [23.747878534962663]
COPAL is an algorithm developed for pruning large language generative models under a continual model adaptation setting.
Our empirical evaluation on LLMs of various sizes shows that COPAL outperforms baseline models.
arXiv Detail & Related papers (2024-05-02T18:24:41Z)
- Diversity-Aware Ensembling of Language Models Based on Topological Data Analysis [3.1734682813501514]
Existing approaches mostly rely on simple averaging of predictions by ensembles with equal weights for each model.
We propose to estimate weights for ensembles of NLP models using not only knowledge of their individual performance but also their similarity to each other.
arXiv Detail & Related papers (2024-02-22T00:04:21Z)
- Split and Rephrase with Large Language Models [2.499907423888049]
The Split and Rephrase (SPRP) task consists of splitting complex sentences into a sequence of shorter grammatical sentences.
We evaluate large language models on the task, showing that they can provide large improvements over the state of the art on the main metrics.
arXiv Detail & Related papers (2023-12-18T10:16:37Z)
- Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs).
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
- Variable Importance Matching for Causal Inference [73.25504313552516]
We describe a general framework called Model-to-Match that achieves these goals.
Model-to-Match uses variable importance measurements to construct a distance metric.
We operationalize the Model-to-Match framework with LASSO.
arXiv Detail & Related papers (2023-02-23T00:43:03Z)
- Evaluating Representations with Readout Model Switching [19.907607374144167]
In this paper, we propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric.
We design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions.
The proposed metric can be efficiently computed with an online method and we present results for pre-trained vision encoders of various architectures.
arXiv Detail & Related papers (2023-02-19T14:08:01Z)
- ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models [102.63817106363597]
We build ELEVATER, the first benchmark to compare and evaluate pre-trained language-augmented visual models.
It consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge.
We will release our toolkit and evaluation platforms for the research community.
arXiv Detail & Related papers (2022-04-19T10:23:42Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.