Using In-Context Learning to Improve Dialogue Safety
- URL: http://arxiv.org/abs/2302.00871v3
- Date: Sun, 22 Oct 2023 19:28:24 GMT
- Title: Using In-Context Learning to Improve Dialogue Safety
- Authors: Nicholas Meade, Spandana Gella, Devamanyu Hazarika, Prakhar Gupta, Di Jin, Siva Reddy, Yang Liu, Dilek Hakkani-Tür
- Abstract summary: We investigate a retrieval-based method for reducing bias and toxicity in responses from chatbots.
It uses in-context learning to steer a model towards safer generations.
We find our method performs competitively with strong baselines without requiring training.
- Score: 45.303005593685036
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While large neural-based conversational models have become increasingly
proficient dialogue agents, recent work has highlighted safety issues with
these systems. For example, these systems can be goaded into generating toxic
content, which often perpetuates social biases or stereotypes. We investigate a
retrieval-based method for reducing bias and toxicity in responses from
chatbots. It uses in-context learning to steer a model towards safer
generations. Concretely, to generate a response to an unsafe dialogue context,
we retrieve demonstrations of safe responses to similar dialogue contexts. We
find our method performs competitively with strong baselines without requiring
training. For instance, under automatic evaluation, we find our best fine-tuned
baseline generates safe responses to unsafe dialogue contexts from DiaSafety
only 4.04% more often than our approach. Finally, we also propose a re-ranking
procedure which can further improve response safety.
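To make the mechanism concrete, here is a minimal Python sketch of the retrieval-plus-in-context-learning idea: given an unsafe dialogue context, retrieve safe demonstrations for similar contexts and prepend them to the generation prompt. The TF-IDF retriever, the demonstration pool, and the prompt template are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch: retrieval-augmented in-context learning for safer responses.
# The retriever (TF-IDF) and the prompt template are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical pool of (unsafe context, safe response) demonstrations.
DEMO_POOL = [
    ("You people are all the same.",
     "I don't think it's fair to generalize about any group of people."),
    ("Tell me why immigrants ruin everything.",
     "I'd rather not repeat stereotypes; every person is an individual."),
    ("Women shouldn't be allowed to drive.",
     "I disagree; driving ability has nothing to do with gender."),
]

def build_prompt(unsafe_context: str, k: int = 2) -> str:
    """Retrieve the k most similar demonstrations and format an ICL prompt."""
    contexts = [c for c, _ in DEMO_POOL]
    vectorizer = TfidfVectorizer().fit(contexts + [unsafe_context])
    sims = cosine_similarity(vectorizer.transform([unsafe_context]),
                             vectorizer.transform(contexts))[0]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    demos = "\n\n".join(
        f"Context: {DEMO_POOL[i][0]}\nSafe response: {DEMO_POOL[i][1]}"
        for i in top
    )
    return f"{demos}\n\nContext: {unsafe_context}\nSafe response:"

print(build_prompt("Why are old people so useless?"))
```

The proposed re-ranking step would then sample several candidate responses from the prompted model and keep the one a safety classifier scores as safest.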
Related papers
- GrounDial: Human-norm Grounded Safe Dialog Response Generation [39.55597493155821]
We propose GrounDial, where response safety is achieved by grounding responses to commonsense social rules without requiring fine-tuning.
GrounDial's hybrid approach of in-context learning and human-norm-guided decoding makes responses quantitatively and qualitatively safer without additional data or tuning.
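A minimal sketch of the norm-grounding half of this idea, assuming a small hand-written rule store and prompt template; GrounDial's human-norm-guided decoding step is not shown.

```python
# Illustrative sketch: ground the response in commonsense social rules by
# prepending them to the prompt, with no fine-tuning. The rule store and
# template are assumptions; GrounDial also guides decoding with the norms.
SOCIAL_RULES = [
    "It is wrong to insult people for their appearance.",
    "You should be respectful when someone is grieving.",
]

def grounded_prompt(dialogue_context: str, rules: list[str]) -> str:
    rule_block = "\n".join(f"- {r}" for r in rules)
    return (f"Social norms to follow:\n{rule_block}\n\n"
            f"Dialogue: {dialogue_context}\n"
            "Respond safely, consistent with the norms above:")

print(grounded_prompt("Ha, look at that guy's ugly jacket.", SOCIAL_RULES))
```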
arXiv Detail & Related papers (2024-02-14T06:25:50Z)
- Improving Dialog Safety using Socially Aware Contrastive Learning [8.503001932363704]
We study prosociality in both adversarial and casual dialog contexts.
We propose a dual-step fine-tuning process to address these issues.
We train a base model that integrates prosocial behavior by leveraging datasets like Moral Integrity Corpus (MIC) and ProsocialDialog.
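A minimal sketch of the kind of contrastive objective such fine-tuning could use, assuming precomputed context and response embeddings and a margin-based triplet formulation (not necessarily the paper's exact loss):

```python
# Hedged sketch: pull a prosocial response's embedding closer to the
# dialogue context than an unsafe response's, by a margin. The encoder,
# pairing scheme, and margin are illustrative assumptions.
import torch
import torch.nn.functional as F

def triplet_safety_loss(ctx_emb, safe_emb, unsafe_emb, margin=0.5):
    pos = F.cosine_similarity(ctx_emb, safe_emb, dim=-1)
    neg = F.cosine_similarity(ctx_emb, unsafe_emb, dim=-1)
    return F.relu(margin - pos + neg).mean()

ctx, safe, unsafe = (torch.randn(4, 128) for _ in range(3))
print(triplet_safety_loss(ctx, safe, unsafe))
```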
arXiv Detail & Related papers (2024-02-01T09:24:33Z)
- PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems [59.1250765143521]
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities.
We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework.
We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
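A hedged sketch of what a PICK-style re-scoring pass could look like; the token-overlap scorers and the weight alpha below are simple stand-ins for the paper's faithfulness and relevance scorers:

```python
# Illustrative re-scoring: rank candidate responses by a weighted mix of
# faithfulness to the grounding knowledge and relevance to the history.
def overlap(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta), 1)

def pick_rescore(candidates, knowledge, history, alpha=0.7):
    def score(c):
        return alpha * overlap(c, knowledge) + (1 - alpha) * overlap(c, history)
    return max(candidates, key=score)

best = pick_rescore(
    ["The Eiffel Tower is in Berlin.",
     "The Eiffel Tower is in Paris and was completed in 1889."],
    knowledge="The Eiffel Tower, completed in 1889, stands in Paris.",
    history="Where is the Eiffel Tower?",
)
print(best)
```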
arXiv Detail & Related papers (2023-09-19T08:27:09Z)
- Learn What NOT to Learn: Towards Generative Safety in Chatbots [40.8106410437709]
We present a novel framework, named "LOT" (Learn NOT to), that employs a contrastive loss to enhance generalization by learning from both positive and negative training signals.
LOT reduces toxicity by up to four-fold while achieving four to six-fold higher rates of engagingness and fluency compared to baseline models.
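A minimal sketch of learning from both positive and negative signals in this spirit, using a generic likelihood-plus-unlikelihood formulation rather than the paper's exact contrastive loss:

```python
# Hedged sketch: maximize likelihood of safe continuations while applying
# an unlikelihood penalty to toxic ones ("learn what NOT to say").
import torch

def pos_neg_loss(safe_logprobs: torch.Tensor, toxic_probs: torch.Tensor):
    likelihood = -safe_logprobs.mean()                # learn what to say
    unlikelihood = -torch.log1p(-toxic_probs).mean()  # log(1 - p) penalty
    return likelihood + unlikelihood

safe_lp = torch.log(torch.rand(8).clamp(0.05, 0.95))
toxic_p = torch.rand(8).clamp(0.05, 0.95)
print(pos_neg_loss(safe_lp, toxic_p))
```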
arXiv Detail & Related papers (2023-04-21T18:59:06Z)
- DialGuide: Aligning Dialogue Model Behavior with Developer Guidelines [48.780256371992515]
We introduce DialGuide, a framework for controlling dialogue model behavior using natural language rules.
Our dataset contains 10,737 positive and 15,467 negative dialogue context-response-guideline triplets across two domains - chit-chat and safety.
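A sketch of the data structure such triplets imply; the field names below are illustrative assumptions, not DialGuide's actual schema:

```python
# Hypothetical shape of one context-response-guideline triplet.
from dataclasses import dataclass

@dataclass
class GuidelineTriplet:
    context: str
    response: str
    guideline: str
    follows_guideline: bool  # distinguishes positive from negative triplets

ex = GuidelineTriplet(
    context="I failed my exam and feel terrible.",
    response="That sounds really hard. Do you want to talk about it?",
    guideline="If the user is upset, respond with empathy and do not lecture.",
    follows_guideline=True,
)
print(ex)
```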
arXiv Detail & Related papers (2022-12-20T18:57:18Z)
- Towards Robust Online Dialogue Response Generation [62.99904593650087]
We argue that degraded response quality in deployment can be caused by a discrepancy between training and real-world testing.
We propose a hierarchical sampling-based method consisting of both utterance-level sampling and semi-utterance-level sampling.
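A hedged sketch of hierarchical sampling under simplifying assumptions: sample_utterance stands in for drawing from the dialogue model, and the mixing probabilities are arbitrary:

```python
# Illustrative sketch: perturb training histories at the utterance level
# (replace a whole turn with a model sample) and the semi-utterance level
# (keep a prefix, sample the continuation) so training contexts better
# resemble noisy real-world ones.
import random

def sample_utterance(prefix: str = "") -> str:
    return (prefix + " <model-sampled continuation>").strip()

def hierarchical_sample(history, p_utt=0.3, p_semi=0.3):
    out = []
    for utt in history:
        r = random.random()
        if r < p_utt:              # utterance-level sampling
            out.append(sample_utterance())
        elif r < p_utt + p_semi:   # semi-utterance-level sampling
            words = utt.split()
            out.append(sample_utterance(" ".join(words[: max(1, len(words) // 2)])))
        else:
            out.append(utt)
    return out

print(hierarchical_sample(["Hi, how are you?", "I'm fine, thanks for asking."]))
```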
arXiv Detail & Related papers (2022-03-07T06:51:41Z)
- On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark [42.322782754346406]
We propose a taxonomy for dialogue safety specifically designed to capture unsafe behaviors that are unique to the human-bot dialogue setting.
We compile DiaSafety, a dataset spanning six unsafe categories with rich, context-sensitive unsafe examples.
Experiments show that existing utterance-level safety guarding tools fail catastrophically on our dataset.
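A sketch of the category-wise benchmarking loop such an evaluation implies; the category name and the deliberately context-blind classifier below are illustrative, not DiaSafety's taxonomy or any real tool:

```python
# Hedged sketch: score a safety classifier separately per unsafe category.
# A context-ignoring classifier illustrates why utterance-level tools can
# fail: whether a response is safe often depends on the preceding context.
from collections import defaultdict

def per_category_accuracy(examples, classify):
    """examples: (context, response, category, is_safe) tuples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for ctx, resp, cat, is_safe in examples:
        totals[cat] += 1
        hits[cat] += int(classify(ctx, resp) == is_safe)
    return {cat: hits[cat] / totals[cat] for cat in totals}

def context_blind(ctx, resp):          # looks at the response alone
    return "hate" not in resp.lower()  # "safe" unless an obvious keyword

data = [("I just got fired today.", "Great news!", "offending_user", False)]
print(per_category_accuracy(data, context_blind))  # {'offending_user': 0.0}
```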
arXiv Detail & Related papers (2021-10-16T04:17:12Z)
- Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters [52.725200145600624]
We propose KnowExpert to bypass the retrieval process by injecting prior knowledge into the pre-trained language models with lightweight adapters.
Experimental results show that KnowExpert performs comparably with the retrieval-based baselines.
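A minimal bottleneck-adapter sketch of the kind that could be inserted into a frozen pre-trained LM; the hidden sizes and placement are illustrative, not the paper's configuration:

```python
# Hedged sketch: a residual bottleneck adapter, the standard lightweight
# module for injecting new knowledge while the base LM stays frozen.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, x):
        # Residual connection preserves the frozen model's representation.
        return x + self.up(torch.relu(self.down(x)))

h = torch.randn(2, 10, 768)   # (batch, sequence, hidden)
print(Adapter()(h).shape)     # torch.Size([2, 10, 768])
```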
arXiv Detail & Related papers (2021-05-13T12:33:23Z)
- Saying No is An Art: Contextualized Fallback Responses for Unanswerable Dialogue Queries [3.593955557310285]
Most dialogue systems rely on hybrid approaches for generating a set of ranked responses.
We design a neural approach that generates fallback responses contextually aware of the user query.
Our simple approach makes use of rules over dependency parses and a text-to-text transformer fine-tuned on synthetic data of question-response pairs.
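A sketch of the two-stage recipe under simplifying assumptions: the rule below replaces the paper's dependency-parse rules with a crude lexical-overlap check, and t5_generate is a hypothetical stand-in for the fine-tuned text-to-text model:

```python
# Illustrative sketch: a rule layer flags unanswerable queries, then a
# text-to-text model produces a contextual fallback instead of a canned
# "I don't know."
def is_unanswerable(query: str, knowledge: str) -> bool:
    content = [w for w in query.lower().split() if len(w) > 3]
    return not any(w in knowledge.lower() for w in content)

def t5_generate(prompt: str) -> str:
    # Placeholder for a T5 fine-tuned on synthetic question-response pairs.
    return "I'm not sure about refunds, but I can help with our store hours."

query = "What's your refund policy for digital items?"
kb = "Store hours: 9am to 6pm daily."
if is_unanswerable(query, kb):
    print(t5_generate(f"generate fallback for: {query}"))
```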
arXiv Detail & Related papers (2020-12-03T12:34:22Z)