Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values
- URL: http://arxiv.org/abs/2210.07652v1
- Date: Fri, 14 Oct 2022 09:10:49 GMT
- Title: Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values
- Authors: Yejin Bang, Tiezheng Yu, Andrea Madotto, Zhaojiang Lin, Mona Diab, Pascale Fung
- Abstract summary: Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values.
We introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many NLP classification tasks, such as sexism/racism detection or toxicity
detection, are based on human values. Yet, human values can vary under diverse
cultural conditions. Therefore, we introduce a framework for value-aligned
classification that performs prediction based on explicitly written human
values in the command. Along with the task, we propose a practical approach
that distills value-aligned knowledge from large-scale language models (LLMs)
to construct value-aligned classifiers in two steps. First, we generate
value-aligned training data from LLMs by prompt-based few-shot learning. Next,
we fine-tune smaller classification models with the generated data for the
task. Empirical results show that our VA-Models surpass multiple baselines by
at least 15.56% on the F1-score, including few-shot learning with OPT-175B and
existing text augmentation methods. We suggest that using classifiers with
explicit human value input improves both inclusivity & explainability in AI.
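
Below is a minimal sketch of the two-step distillation the abstract describes, using Hugging Face transformers. The model names (gpt2 standing in for OPT-175B, distilbert-base-uncased as the smaller classifier), the prompt template, the value statement, and the seed texts are all illustrative assumptions, not the paper's actual setup.

```python
# Sketch of value-aligned distillation, assuming a sexism-detection task.
# Step 1: prompt an LLM, with the human value written explicitly in the
# command, to label seed texts. Step 2: fine-tune a small classifier on
# the generated data. All names below are illustrative placeholders.
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    pipeline,
)

# The human value is stated explicitly in the prompt, so the generated
# labels reflect that value rather than an implicit annotation policy.
VALUE = "Statements that demean people based on gender are sexist."
FEW_SHOT = (
    f"Value: {VALUE}\n"
    "Text: Women are too emotional to lead.\nLabel: sexist\n"
    "Text: The team shipped the release on time.\nLabel: not sexist\n"
)

generator = pipeline("text-generation", model="gpt2")  # stand-in for OPT-175B

def generate_labeled_example(seed_text: str) -> tuple[str, int]:
    """Ask the LLM to label seed_text under the explicitly stated value."""
    prompt = FEW_SHOT + f"Text: {seed_text}\nLabel:"
    out = generator(prompt, max_new_tokens=3, do_sample=False)
    completion = out[0]["generated_text"][len(prompt):].strip().lower()
    return seed_text, int(completion.startswith("sexist"))

# Step 1: build a value-aligned training set from LLM completions.
train_texts, train_labels = zip(*[
    generate_labeled_example(t)
    for t in [
        "She only got the job because she's pretty.",
        "The meeting starts at noon.",
    ]
])

# Step 2: fine-tune a smaller classification model on the generated data.
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
clf = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
batch = tok(list(train_texts), padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor(train_labels)

optimizer = torch.optim.AdamW(clf.parameters(), lr=5e-5)
clf.train()
for _ in range(3):  # a few gradient steps on the distilled data
    optimizer.zero_grad()
    loss = clf(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
```

A design consequence worth noting: because the value statement lives in the prompt rather than in the annotation guidelines, swapping in a different value statement and regenerating the data yields a classifier aligned to that value without collecting new human labels.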