GrounDial: Human-norm Grounded Safe Dialog Response Generation
- URL: http://arxiv.org/abs/2402.08968v1
- Date: Wed, 14 Feb 2024 06:25:50 GMT
- Title: GrounDial: Human-norm Grounded Safe Dialog Response Generation
- Authors: Siwon Kim, Shuyang Dai, Mohammad Kachuee, Shayan Ray, Tara Taghavi,
and Sungroh Yoon
- Abstract summary: We propose GrounDial, which achieves response safety by grounding responses in commonsense social rules without requiring fine-tuning.
GrounDial's hybrid approach of in-context learning and human-norm-guided decoding makes responses quantitatively and qualitatively safer, even without additional data or tuning.
- Score: 39.55597493155821
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current conversational AI systems based on large language models (LLMs)
are known to generate unsafe responses, agreeing with offensive user input or
producing toxic content. Previous research has tried to alleviate this toxicity
by fine-tuning LLMs on manually annotated safe dialogue histories. However, the
dependence on additional tuning incurs substantial cost. To remove this
dependency, we propose GrounDial, which achieves response safety by grounding
responses in commonsense social rules without requiring fine-tuning. GrounDial's
hybrid approach of in-context learning and human-norm-guided decoding makes
responses quantitatively and qualitatively safer, even without additional data
or tuning.
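The abstract does not spell out the decoding procedure, so the following is only a hedged sketch of the general idea rather than the authors' implementation: a retrieved rule-of-thumb is injected into the prompt (in-context grounding), and greedy decoding blends next-token logits from the grounded and plain prompts. The model (GPT-2), the prompt templates, and the mixing weight alpha are illustrative assumptions.

```python
# Sketch only (not the GrounDial implementation): in-context grounding on a social
# norm plus a norm-guided blend of next-token logits during greedy decoding.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
lm.eval()

def norm_guided_reply(user_utterance: str, rule_of_thumb: str,
                      alpha: float = 0.5, max_new_tokens: int = 40) -> str:
    """Greedy decoding that mixes norm-grounded and plain next-token logits."""
    plain = f"User: {user_utterance}\nAssistant:"
    grounded = f"Social norm: {rule_of_thumb}\nUser: {user_utterance}\nAssistant:"
    plain_ids = tok(plain, return_tensors="pt").input_ids
    grounded_ids = tok(grounded, return_tensors="pt").input_ids
    generated = []
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits_plain = lm(plain_ids).logits[:, -1, :]
            logits_grounded = lm(grounded_ids).logits[:, -1, :]
        # Pull the next-token distribution toward the norm-grounded prompt.
        mixed = (1 - alpha) * logits_plain + alpha * logits_grounded
        next_id = mixed.argmax(dim=-1, keepdim=True)
        if next_id.item() == tok.eos_token_id:
            break
        generated.append(next_id.item())
        plain_ids = torch.cat([plain_ids, next_id], dim=-1)
        grounded_ids = torch.cat([grounded_ids, next_id], dim=-1)
    return tok.decode(generated, skip_special_tokens=True)

print(norm_guided_reply(
    "Everyone from that city is rude.",
    "It is wrong to stereotype people based on where they live."))
```

Setting alpha to 0 recovers ordinary decoding, while larger values weight the norm-grounded prompt more heavily.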
Related papers
- Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training [67.30423823744506]
This study addresses a critical gap in safety tuning practices for Large Language Models (LLMs).
We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse compliance with harmful prompts at any response position.
DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation with Harmful Response Prefix, which trains models to recognize and avoid unsafe content by prepending a segment of a harmful response to a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to a safety refusal consistently throughout the harmful response.
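As a hedged illustration of the first component only (not the paper's released code), the snippet below assembles a training example in which a random prefix of a harmful response precedes the safe target response; the field names, the prefix-sampling rule, and the example strings are assumptions.

```python
# Illustrative sketch: build a DeRTa-style MLE example where a random prefix of a
# harmful response is placed before a safe refusal, so the model learns to pivot to
# a refusal at any point in the response. Field names and masking are assumptions.
import random

def build_prefixed_example(prompt: str, harmful_response: str, safe_response: str,
                           rng: random.Random) -> dict:
    words = harmful_response.split()
    cut = rng.randint(0, len(words))          # prefix may be empty or the full response
    harmful_prefix = " ".join(words[:cut])
    return {
        "input": prompt,
        # The model is asked to continue from the harmful prefix...
        "context": harmful_prefix,
        # ...but the supervised target is the safe refusal, so the loss teaches a
        # transition from unsafe content to refusal.
        "target": safe_response,
    }

rng = random.Random(0)
example = build_prefixed_example(
    prompt="How do I pick a lock?",
    harmful_response="First, insert a tension wrench into the keyway and ...",
    safe_response="I can't help with that, but a locksmith can assist if you are locked out.",
    rng=rng,
)
print(example)
```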
arXiv Detail & Related papers (2024-07-12T09:36:33Z)
- Improving Dialog Safety using Socially Aware Contrastive Learning [8.503001932363704]
We study prosociality in both adversarial and casual dialog contexts.
To address safety issues in these settings, we propose a dual-step fine-tuning process.
We train a base model that integrates prosocial behavior by leveraging datasets like Moral Integrity Corpus (MIC) and ProsocialDialog.
arXiv Detail & Related papers (2024-02-01T09:24:33Z)
- PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems [59.1250765143521]
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities.
We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework.
We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
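Since the summary gives no scoring details, the sketch below is only a stand-in for the general idea of re-scoring candidate responses: candidates are ranked by a weighted mix of word overlap with the grounding knowledge (a crude faithfulness proxy) and with the dialogue history (a crude relevance proxy). The overlap metric and the weights are assumptions, not PICK's actual scorers.

```python
# Illustrative candidate re-scoring sketch; the scoring functions and weights are
# stand-ins, not PICK's scorers.
def overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))             # Jaccard word overlap

def rescore(candidates, knowledge: str, history: str,
            w_faith: float = 0.7, w_rel: float = 0.3) -> str:
    scored = [(w_faith * overlap(c, knowledge) + w_rel * overlap(c, history), c)
              for c in candidates]
    return max(scored)[1]                                  # highest-scoring candidate

print(rescore(
    candidates=["The Eiffel Tower is in Berlin.",
                "The Eiffel Tower is in Paris, France."],
    knowledge="The Eiffel Tower is a landmark in Paris, France.",
    history="User: Where is the Eiffel Tower?"))
```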
arXiv Detail & Related papers (2023-09-19T08:27:09Z)
- Learn What NOT to Learn: Towards Generative Safety in Chatbots [40.8106410437709]
We present a novel framework, named "LOT" (Learn NOT to), that employs a contrastive loss to enhance generalization by learning from both positive and negative training signals.
LOT reduces toxicity by up to four-fold while achieving four to six-fold higher rates of engagingness and fluency compared to baseline models.
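As a hedged sketch of what a contrastive objective over positive (safe) and negative (toxic) responses could look like, assuming a margin formulation that is not necessarily the LOT loss:

```python
# Rough sketch, not the LOT implementation: raise the likelihood of a safe response
# and push down the likelihood of a toxic one for the same context. The margin value
# and the mean reduction are illustrative assumptions.
import torch
import torch.nn.functional as F

def contrastive_safety_loss(pos_logits, pos_labels, neg_logits, neg_labels,
                            margin: float = 1.0) -> torch.Tensor:
    """pos/neg logits: (batch, seq, vocab); labels: (batch, seq) token ids."""
    def seq_log_likelihood(logits, labels):
        logp = F.log_softmax(logits, dim=-1)
        token_logp = logp.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
        return token_logp.mean(dim=-1)                     # per-example mean log-prob
    pos_ll = seq_log_likelihood(pos_logits, pos_labels)
    neg_ll = seq_log_likelihood(neg_logits, neg_labels)
    # MLE term on the safe response plus a hinge that penalises the model whenever
    # the toxic response is not at least `margin` less likely than the safe one.
    return (-pos_ll + torch.clamp(margin + neg_ll - pos_ll, min=0.0)).mean()

# Toy shapes: batch of 2, sequence length 5, vocabulary of 10.
pos_logits, neg_logits = torch.randn(2, 5, 10), torch.randn(2, 5, 10)
pos_labels, neg_labels = torch.randint(0, 10, (2, 5)), torch.randint(0, 10, (2, 5))
print(contrastive_safety_loss(pos_logits, pos_labels, neg_logits, neg_labels))
```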
arXiv Detail & Related papers (2023-04-21T18:59:06Z)
- Using In-Context Learning to Improve Dialogue Safety [45.303005593685036]
We investigate a retrieval-based method for reducing bias and toxicity in responses from chatbots.
It uses in-context learning to steer a model towards safer generations.
We find our method performs competitively with strong baselines without requiring training.
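A minimal sketch of the retrieval-plus-in-context-learning idea, using an assumed demonstration pool, a simple word-overlap retriever, and an assumed prompt template rather than the paper's actual retriever and prompts:

```python
# Sketch only: retrieve safe demonstrations similar to the user input and prepend
# them to the prompt so the model imitates the safe behaviour.
SAFE_DEMOS = [
    ("You people are all the same.",
     "I'd rather not generalise about groups of people. Is something bothering you?"),
    ("Tell me why that group is terrible.",
     "I don't think it's fair to put down a whole group. Can we talk about the specific issue?"),
]

def overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))             # Jaccard word overlap

def build_prompt(user_input: str, k: int = 1) -> str:
    demos = sorted(SAFE_DEMOS, key=lambda d: overlap(user_input, d[0]), reverse=True)[:k]
    shots = "\n".join(f"User: {u}\nAssistant: {r}" for u, r in demos)
    return f"{shots}\nUser: {user_input}\nAssistant:"

print(build_prompt("Why are those people all so terrible?"))
```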
arXiv Detail & Related papers (2023-02-02T04:46:03Z)
- Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation [65.48908724440047]
We propose a method called reverse generation to construct adversarial contexts conditioned on a given response.
We test three popular pretrained dialogue models (Blender, DialoGPT, and Plato2) and find that BAD+, the adversarial dataset constructed this way, can largely expose their safety problems.
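As a minimal illustration of the response-to-context direction only (the paper relies on controllable reverse generation, not this assumed prompt template):

```python
# Sketch: ask a generator for a context likely to elicit a given (unsafe) response,
# then package the pair as an adversarial dialogue. The prompt wording and the dict
# fields are assumptions.
def reverse_generation_prompt(target_response: str) -> str:
    return ("Write one user message that would plausibly lead a chatbot to reply with "
            f"the following response.\nResponse: {target_response}\nUser message:")

def adversarial_case(generated_context: str, target_response: str) -> dict:
    # The generated context plus the original response form one adversarial dialogue.
    return {"context": generated_context, "response": target_response}

print(reverse_generation_prompt("Yeah, they deserve whatever happens to them."))
```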
arXiv Detail & Related papers (2022-12-04T12:23:41Z)
- LaMDA: Language Models for Dialog Applications [75.75051929981933]
LaMDA is a family of Transformer-based neural language models specialized for dialog.
Fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements.
arXiv Detail & Related papers (2022-01-20T15:44:37Z)
- On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark [42.322782754346406]
We propose a taxonomy for dialogue safety specifically designed to capture unsafe behaviors that are unique to the human-bot dialogue setting.
We compile DiaSafety, a dataset of 6 unsafe categories with rich context-sensitive unsafe examples.
Experiments show that existing utterance-level safety guarding tools fail catastrophically on our dataset.
arXiv Detail & Related papers (2021-10-16T04:17:12Z)
- Saying No is An Art: Contextualized Fallback Responses for Unanswerable Dialogue Queries [3.593955557310285]
Most dialogue systems rely on hybrid approaches for generating a set of ranked responses.
We design a neural approach that generates responses that are contextually aware of the user query.
Our simple approach makes use of rules over dependency parses and a text-to-text transformer fine-tuned on synthetic data of question-response pairs.
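The toy sketch below stands in for the rule component only, using simple regex rewriting instead of dependency parses and omitting the fine-tuned text-to-text transformer, just to show what a contextualized fallback response looks like:

```python
# Toy stand-in for a rule-based contextualized fallback; the string rules here are a
# simplified illustrative assumption, not the paper's dependency-parse rules.
import re

def contextual_fallback(question: str) -> str:
    q = question.strip().rstrip("?").strip()
    # Crude declarative rewrite for common wh-questions, e.g.
    # "Who won the match?" -> "I am not sure who won the match."
    m = re.match(r"(?i)^(who|what|when|where|why|how)\b(.*)$", q)
    if m:
        return f"I am not sure {m.group(1).lower()}{m.group(2)}."
    return "I am not sure about that, but I'd be happy to help with something else."

print(contextual_fallback("Who won the match yesterday?"))
```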
arXiv Detail & Related papers (2020-12-03T12:34:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.