Ethical-Advice Taker: Do Language Models Understand Natural Language
Interventions?
- URL: http://arxiv.org/abs/2106.01465v1
- Date: Wed, 2 Jun 2021 20:57:58 GMT
- Title: Ethical-Advice Taker: Do Language Models Understand Natural Language
Interventions?
- Authors: Jieyu Zhao, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, and
Kai-Wei Chang
- Abstract summary: We investigate the effectiveness of natural language interventions for reading-comprehension systems.
We propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model's unethical behavior.
- Score: 62.74872383104381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Is it possible to use natural language to intervene in a model's behavior and
alter its prediction in a desired way? We investigate the effectiveness of
natural language interventions for reading-comprehension systems, studying this
in the context of social stereotypes. Specifically, we propose a new language
understanding task, Linguistic Ethical Interventions (LEI), where the goal is
to amend a question-answering (QA) model's unethical behavior by communicating
context-specific principles of ethics and equity to it. To this end, we build
upon recent methods for quantifying a system's social stereotypes, augmenting
them with different kinds of ethical interventions and the desired model
behavior under such interventions. Our zero-shot evaluation finds that even
today's powerful neural language models are extremely poor ethical-advice
takers, that is, they respond surprisingly little to ethical interventions even
though these interventions are stated as simple sentences. Few-shot learning
improves model behavior but remains far from the desired outcome, especially
when evaluated for various types of generalization. Our new task thus poses a
novel language understanding challenge for the community.
Related papers
- Towards "Differential AI Psychology" and in-context Value-driven Statement Alignment with Moral Foundations Theory [0.0]
This work investigates the alignment between personalized language models and survey participants on a Moral Foundation questionnaire.
We adapt text-to-text models to different political personas and survey the questionnaire repetitively to generate a synthetic population of persona and model combinations.
Our findings indicate that adapted models struggle to represent the survey-leading assessment of political ideologies.
arXiv Detail & Related papers (2024-08-21T08:20:41Z) - Integrating Emotional and Linguistic Models for Ethical Compliance in Large Language Models [2.5200794639628032]
This research develops advanced methodologies for Large Language Models (LLMs) to better manage linguistic behaviors related to emotions and ethics.
We introduce DIKE, an adversarial framework that enhances the LLMs' ability to internalize and reflect global human values.
arXiv Detail & Related papers (2024-05-11T19:26:00Z) - UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate the ability to model unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z) - Neural Conversation Models and How to Rein Them in: A Survey of Failures
and Fixes [17.489075240435348]
Recent conditional language models are able to continue any kind of text source in an often seemingly fluent way.
From a linguistic perspective, contributing to a conversation is high.
Recent approaches try to tame the underlying language models at various intervention points.
arXiv Detail & Related papers (2023-08-11T12:07:45Z) - Interactive Natural Language Processing [67.87925315773924]
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP.
This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept.
arXiv Detail & Related papers (2023-05-22T17:18:29Z) - Democratizing Ethical Assessment of Natural Language Generation Models [0.0]
Natural language generation models are computer systems that generate coherent language when prompted with a sequence of words as context.
Despite their ubiquity and many beneficial applications, language generation models also have the potential to inflict social harms.
Ethical assessment of these models is therefore critical.
This article introduces a new tool to democratize and standardize ethical assessment of natural language generation models.
arXiv Detail & Related papers (2022-06-30T12:20:31Z) - From Outcome-Based to Language-Based Preferences [13.05235037907183]
We review the literature on models that try to explain human behavior in social interactions described by normal-form games with monetary payoffs.
We focus on the growing body of research showing that people react to the language in which actions are described, especially when it activates moral concerns.
arXiv Detail & Related papers (2022-06-15T05:11:58Z) - Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z) - Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life
Anecdotes [72.64975113835018]
Motivated by descriptive ethics, we investigate a novel, data-driven approach to machine ethics.
We introduce Scruples, the first large-scale dataset with 625,000 ethical judgments over 32,000 real-life anecdotes.
Our dataset presents a major challenge to state-of-the-art neural language models, leaving significant room for improvement.
arXiv Detail & Related papers (2020-08-20T17:34:15Z) - Aligning AI With Shared Human Values [85.2824609130584]
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.
We find that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
arXiv Detail & Related papers (2020-08-05T17:59:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.