RENOVI: A Benchmark Towards Remediating Norm Violations in
Socio-Cultural Conversations
- URL: http://arxiv.org/abs/2402.11178v1
- Date: Sat, 17 Feb 2024 03:13:42 GMT
- Title: RENOVI: A Benchmark Towards Remediating Norm Violations in
Socio-Cultural Conversations
- Authors: Haolan Zhan, Zhuang Li, Xiaoxi Kang, Tao Feng, Yuncheng Hua, Lizhen
Qu, Yi Ying, Mei Rianto Chandra, Kelly Rosalin, Jureynolds Jureynolds, Suraj
Sharma, Shilin Qu, Linhao Luo, Lay-Ki Soon, Zhaleh Semnani Azad, Ingrid
Zukerman, Gholamreza Haffari
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Norm violations occur when individuals fail to conform to culturally accepted
behaviors, which may lead to potential conflicts. Remediating norm violations
requires social awareness and cultural sensitivity of the nuances at play. To
equip interactive AI systems with a remediation ability, we offer ReNoVi - a
large-scale corpus of 9,258 multi-turn dialogues annotated with social norms,
as well as define a sequence of tasks to help understand and remediate norm
violations step by step. ReNoVi consists of two parts: 512 human-authored
dialogues (real data), and 8,746 synthetic conversations generated by ChatGPT
through prompt learning. Because collecting sufficient human-authored data is
costly, synthetic conversations provide enough data to mitigate the scarcity of
training data, and also offer a chance to assess how well LLMs align with
humans in their awareness of social norms. We thus harness ChatGPT to generate
synthetic training data for our task.
To ensure the quality of both human-authored and synthetic data, we follow a
quality control protocol during data collection. Our experimental results
demonstrate the importance of remediating norm violations in socio-cultural
conversations, as well as the improvement in performance obtained from
synthetic data.
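The abstract states that the 8,746 synthetic conversations were generated by ChatGPT through prompt learning, but does not give the exact prompts. The sketch below is a minimal, hypothetical illustration of that kind of pipeline: a few-shot prompt built from seed dialogues plus a target norm, and a parser for the model's turn-by-turn response. All function names and the prompt wording are assumptions, not the authors' actual protocol.

```python
def build_prompt(norm, seed_dialogues, n_turns=4):
    """Assemble a few-shot prompt asking a chat LLM to synthesize a
    norm-violation dialogue. `seed_dialogues` is a list of dialogues,
    each a list of (speaker, utterance) pairs."""
    examples = "\n\n".join(
        "Example dialogue:\n" + "\n".join(f"{s}: {u}" for s, u in d)
        for d in seed_dialogues
    )
    return (
        f"You are writing multi-turn dialogues that contain a violation of "
        f"the social norm: \"{norm}\".\n\n{examples}\n\n"
        f"Write a new {n_turns}-turn dialogue between A and B in the same "
        f"format, where one turn violates the norm and a later turn "
        f"remediates it."
    )

def parse_dialogue(text):
    """Parse 'Speaker: utterance' lines from a model response into
    (speaker, utterance) pairs, skipping malformed lines."""
    turns = []
    for line in text.strip().splitlines():
        if ":" in line:
            speaker, utterance = line.split(":", 1)
            turns.append((speaker.strip(), utterance.strip()))
    return turns
```

In a real pipeline, `build_prompt`'s output would be sent to a chat model and `parse_dialogue` applied to the response before the quality-control step the paper describes; both steps here run offline for illustration only.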
Related papers
- Advancing AI with Integrity: Ethical Challenges and Solutions in Neural Machine Translation (2024-04-01)
  This paper addresses the ethical challenges of Artificial Intelligence in Neural Machine Translation (NMT) systems.
  We investigate the ethical competence of AI models in NMT, including data handling, privacy, data ownership, and consent.
  We discuss the societal impact of NMT and the broader ethical responsibilities of developers, positing them as stewards accountable for the societal repercussions of their creations.
- Improving Dialog Safety using Socially Aware Contrastive Learning (2024-02-01)
  We study prosociality in both adversarial and casual dialog contexts.
  We propose a dual-step fine-tuning process to address these issues.
  We train a base model that integrates prosocial behavior by leveraging datasets such as the Moral Integrity Corpus (MIC) and ProsocialDialog.
- NormDial: A Comparable Bilingual Synthetic Dialog Dataset for Modeling Social Norm Adherence and Violation (2023-10-23)
  We present a high-quality dyadic dialogue dataset with turn-by-turn annotations of social norm adherences and violations for Chinese and American cultures.
  Our dataset is synthetically generated in both Chinese and English using a human-in-the-loop pipeline.
- WADER at SemEval-2023 Task 9: A Weak-labelling Framework for Data Augmentation in Text Regression Tasks (2023-03-05)
  In this paper, we propose WADER, a novel weak-labelling strategy for data augmentation in text regression tasks.
  We benchmark the performance of state-of-the-art pre-trained multilingual language models using WADER and analyze the use of sampling techniques to mitigate bias in data.
- PLACES: Prompting Language Models for Social Conversation Synthesis (2023-02-07)
  We use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset via prompting.
  We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations.
- NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly (2022-10-16)
  We introduce a framework for the novel task of conversation-grounded multi-lingual, multi-cultural norm discovery.
  NormSAGE elicits knowledge about norms through directed questions representing the norm-discovery task and the conversation context.
  It further addresses the risk of language-model hallucination with a self-verification mechanism that ensures the discovered norms are correct.
- Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis (2022-10-10)
  We introduce SESCORE, a model-based metric that is highly correlated with human judgements without requiring human annotation.
  We evaluate SESCORE against existing metrics by comparing how their scores correlate with human ratings.
  SESCORE even achieves performance comparable to the best supervised metric, COMET, despite receiving no human-annotated training data.
- BOSS: A Benchmark for Human Belief Prediction in Object-context Scenarios (2022-06-21)
  This paper combines knowledge of Theory of Mind (ToM) and object-context relations to investigate methods for enhancing collaboration between humans and autonomous systems.
  We propose a novel and challenging multimodal video dataset for assessing the capability of artificial intelligence (AI) systems to predict human belief states in an object-context scenario.
- NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality (2022-05-09)
  We develop a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
  Specifically, we leverage a variational autoencoder (VAE) for end-to-end text-to-waveform generation.
  Evaluations on the popular LJSpeech dataset show that NaturalSpeech achieves -0.01 CMOS relative to human recordings at the sentence level.
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach (2021-02-20)
  We propose ENIGMA, a new framework for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
  ENIGMA requires only a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during evaluation.
  Our experiments show that ENIGMA significantly outperforms existing methods in correlation with human evaluation scores.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.