RENOVI: A Benchmark Towards Remediating Norm Violations in
Socio-Cultural Conversations
- URL: http://arxiv.org/abs/2402.11178v1
- Date: Sat, 17 Feb 2024 03:13:42 GMT
- Title: RENOVI: A Benchmark Towards Remediating Norm Violations in
Socio-Cultural Conversations
- Authors: Haolan Zhan, Zhuang Li, Xiaoxi Kang, Tao Feng, Yuncheng Hua, Lizhen
Qu, Yi Ying, Mei Rianto Chandra, Kelly Rosalin, Jureynolds Jureynolds, Suraj
Sharma, Shilin Qu, Linhao Luo, Lay-Ki Soon, Zhaleh Semnani Azad, Ingrid
Zukerman, Gholamreza Haffari
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Norm violations occur when individuals fail to conform to culturally accepted
behaviors, which may lead to potential conflicts. Remediating norm violations
requires social awareness and cultural sensitivity of the nuances at play. To
equip interactive AI systems with a remediation ability, we offer ReNoVi - a
large-scale corpus of 9,258 multi-turn dialogues annotated with social norms,
as well as define a sequence of tasks to help understand and remediate norm
violations step by step. ReNoVi consists of two parts: 512 human-authored
dialogues (real data), and 8,746 synthetic conversations generated by ChatGPT
through prompt learning. Because collecting sufficient human-authored data is
costly, synthetic conversations provide enough data to mitigate the scarcity of
training data, and also offer a chance to assess how well LLMs align with
humans in their awareness of social norms. We thus harness ChatGPT to generate
synthetic training data for our task.
To ensure the quality of both human-authored and synthetic data, we follow a
quality control protocol during data collection. Our experimental results
demonstrate the importance of remediating norm violations in socio-cultural
conversations, as well as the improvement in performance obtained from
synthetic data.
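The abstract states that the 8,746 synthetic conversations were generated by ChatGPT through prompt learning, but does not give the exact prompts. The sketch below is a minimal, hypothetical illustration of that kind of pipeline: a few-shot prompt built from seed dialogues plus a target norm, and a parser for the model's turn-by-turn response. All function names and the prompt wording are assumptions, not the authors' actual protocol.

```python
def build_prompt(norm, seed_dialogues, n_turns=4):
    """Assemble a few-shot prompt asking a chat LLM to synthesize a
    norm-violation dialogue. `seed_dialogues` is a list of dialogues,
    each a list of (speaker, utterance) pairs."""
    examples = "\n\n".join(
        "Example dialogue:\n" + "\n".join(f"{s}: {u}" for s, u in d)
        for d in seed_dialogues
    )
    return (
        f"You are writing multi-turn dialogues that contain a violation of "
        f"the social norm: \"{norm}\".\n\n{examples}\n\n"
        f"Write a new {n_turns}-turn dialogue between A and B in the same "
        f"format, where one turn violates the norm and a later turn "
        f"remediates it."
    )

def parse_dialogue(text):
    """Parse 'Speaker: utterance' lines from a model response into
    (speaker, utterance) pairs, skipping malformed lines."""
    turns = []
    for line in text.strip().splitlines():
        if ":" in line:
            speaker, utterance = line.split(":", 1)
            turns.append((speaker.strip(), utterance.strip()))
    return turns
```

In a real pipeline, `build_prompt`'s output would be sent to a chat model and `parse_dialogue` applied to the response before the quality-control step the paper describes; both steps here run offline for illustration only.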
Related papers
- Advancing AI with Integrity: Ethical Challenges and Solutions in Neural Machine Translation (2024-04-01)
  This paper addresses the ethical challenges of Artificial Intelligence in Neural Machine Translation (NMT) systems.
  We investigate the ethical competence of AI models in NMT, including data handling, privacy, data ownership, and consent.
  We discuss the societal impact of NMT and the broader ethical responsibilities of developers, positing them as stewards accountable for the societal repercussions of their creations.
- Improving Dialog Safety using Socially Aware Contrastive Learning (2024-02-01)
  We study prosociality in both adversarial and casual dialog contexts.
  We propose a dual-step fine-tuning process to address these issues.
  We train a base model that integrates prosocial behavior by leveraging datasets such as the Moral Integrity Corpus (MIC) and ProsocialDialog.
- NormDial: A Comparable Bilingual Synthetic Dialog Dataset for Modeling Social Norm Adherence and Violation (2023-10-23)
  We present a high-quality dyadic dialogue dataset with turn-by-turn annotations of social norm adherences and violations for Chinese and American cultures.
  Our dataset is synthetically generated in both Chinese and English using a human-in-the-loop pipeline.
- WADER at SemEval-2023 Task 9: A Weak-labelling Framework for Data Augmentation in Text Regression Tasks (2023-03-05)
  In this paper, we propose WADER, a novel weak-labelling strategy for data augmentation in text regression tasks.
  We benchmark the performance of state-of-the-art pre-trained multilingual language models using WADER and analyze the use of sampling techniques to mitigate bias in data.
- PLACES: Prompting Language Models for Social Conversation Synthesis (2023-02-07)
  We use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset via prompting.
  We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations.
- NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly (2022-10-16)
  We introduce a framework for the novel task of conversation-grounded multi-lingual, multi-cultural norm discovery.
  NormSAGE elicits knowledge about norms through directed questions representing the norm-discovery task and the conversation context.
  It further addresses the risk of language-model hallucination with a self-verification mechanism that ensures the discovered norms are correct.
- Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis (2022-10-10)
  We introduce SESCORE, a model-based metric that is highly correlated with human judgements without requiring human annotation.
  We evaluate SESCORE against existing metrics by comparing how their scores correlate with human ratings.
  SESCORE even achieves performance comparable to the best supervised metric, COMET, despite receiving no human-annotated training data.
- BOSS: A Benchmark for Human Belief Prediction in Object-context Scenarios (2022-06-21)
  This paper combines knowledge of Theory of Mind (ToM) and object-context relations to investigate methods for enhancing collaboration between humans and autonomous systems.
  We propose a novel and challenging multimodal video dataset for assessing the capability of artificial intelligence (AI) systems to predict human belief states in an object-context scenario.
- NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality (2022-05-09)
  We develop a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
  Specifically, we leverage a variational autoencoder (VAE) for end-to-end text-to-waveform generation.
  Evaluations on the popular LJSpeech dataset show that NaturalSpeech achieves -0.01 CMOS relative to human recordings at the sentence level.
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach (2021-02-20)
  We propose ENIGMA, a new framework for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
  ENIGMA requires only a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during evaluation.
  Our experiments show that ENIGMA significantly outperforms existing methods in correlation with human evaluation scores.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.