Eagle: Ethical Dataset Given from Real Interactions
- URL: http://arxiv.org/abs/2402.14258v1
- Date: Thu, 22 Feb 2024 03:46:02 GMT
- Title: Eagle: Ethical Dataset Given from Real Interactions
- Authors: Masahiro Kaneko, Danushka Bollegala, Timothy Baldwin
- Abstract summary: We create datasets extracted from real interactions between ChatGPT and users that exhibit social biases, toxicity, and immoral content.
Our experiments show that Eagle captures complementary aspects not covered by existing datasets proposed for the evaluation and mitigation of such ethical challenges.
- Score: 74.7319697510621
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies have demonstrated that large language models (LLMs) have ethics-related problems such as social biases, a lack of moral reasoning, and the generation of offensive content. Existing evaluation metrics and methods for addressing these ethical challenges use datasets intentionally created by instructing humans to write instances that include ethical problems. Such data therefore does not reflect the prompts that users actually provide when using LLM services in everyday contexts, and may not lead to the development of safe LLMs that can address the ethical challenges arising in real-world applications. In this paper, we create the Eagle datasets, extracted from real interactions between ChatGPT and users, which exhibit social biases, toxicity, and immoral content. Our experiments show that Eagle captures complementary aspects not covered by existing datasets proposed for the evaluation and mitigation of such ethical challenges. Our code is publicly available at https://huggingface.co/datasets/MasahiroKaneko/eagle.
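Since Eagle is distributed via the Hugging Face Hub, a minimal sketch of loading and inspecting it with the `datasets` library is shown below. This is an assumption-laden illustration, not part of the paper: the split names and record fields are not documented in this abstract, so the code simply discovers whatever the repository provides.

```python
# Minimal sketch: loading the Eagle dataset from the Hugging Face Hub.
# Assumes the `datasets` library is installed (pip install datasets).
# Split and field names are not documented in the abstract above, so we
# inspect whatever the repository actually provides rather than hard-coding them.
from datasets import load_dataset

eagle = load_dataset("MasahiroKaneko/eagle")

# List the available splits and their sizes.
for split_name, split in eagle.items():
    print(f"{split_name}: {len(split)} examples")

# Peek at the first record of the first split to discover its fields.
first_split = next(iter(eagle.values()))
print(first_split[0])
```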
Related papers
- MoralBench: Moral Evaluation of LLMs [34.43699121838648]
This paper introduces a novel benchmark designed to measure and compare the moral reasoning capabilities of large language models (LLMs).
We present the first comprehensive dataset specifically curated to probe the moral dimensions of LLM outputs.
Our methodology involves a multi-faceted approach, combining quantitative analysis with qualitative insights from ethics scholars to ensure a thorough evaluation of model performance.
arXiv Detail & Related papers (2024-06-06T18:15:01Z)
- Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models [51.69735366140249]
We introduce Ethical-Lens, a framework designed to facilitate the value-aligned usage of text-to-image tools.
Ethical-Lens ensures value alignment in text-to-image models across toxicity and bias dimensions.
Our experiments reveal that Ethical-Lens enhances alignment capabilities to levels comparable with or superior to commercial models.
arXiv Detail & Related papers (2024-04-18T11:38:25Z)
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
- The Ethics of ChatGPT in Medicine and Healthcare: A Systematic Review on Large Language Models (LLMs) [0.0]
Large Language Models (LLMs) such as ChatGPT have received enormous attention in healthcare.
Despite their potential benefits, researchers have underscored various ethical implications.
This work aims to map the ethical landscape surrounding the current stage of deployment of LLMs in medicine and healthcare.
arXiv Detail & Related papers (2024-03-21T15:20:07Z)
- The Ethics of Interaction: Mitigating Security Threats in LLMs [1.407080246204282]
The paper delves into the nuanced ethical repercussions of such security threats on society and individual privacy.
We scrutinize five major threats--prompt injection, jailbreaking, Personal Identifiable Information (PII) exposure, sexually explicit content, and hate-based content--to assess their critical ethical consequences and the urgency they create for robust defensive strategies.
arXiv Detail & Related papers (2024-01-22T17:11:37Z)
- EALM: Introducing Multidimensional Ethical Alignment in Conversational Information Retrieval [43.72331337131317]
We introduce a workflow that integrates ethical alignment with an initial ethical judgment stage for efficient data screening.
We present the QA-ETHICS dataset adapted from the ETHICS benchmark, which serves as an evaluation tool by unifying scenarios and label meanings.
In addition, we suggest a new approach that achieves top performance in both binary and multi-label ethical judgment tasks.
arXiv Detail & Related papers (2023-10-02T08:22:34Z)
- Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity [19.94836502156002]
Large language models (LLMs) may exhibit social prejudice and toxicity, posing ethical and societal dangers when deployed irresponsibly.
We empirically benchmark ChatGPT on multiple sample datasets.
We find that a significant number of ethical risks cannot be addressed by existing benchmarks.
arXiv Detail & Related papers (2023-01-30T13:20:48Z)
- An Ethical Highlighter for People-Centric Dataset Creation [62.886916477131486]
We propose an analytical framework to guide ethical evaluation of existing datasets and to serve future dataset creators in avoiding missteps.
Our work is informed by a review and analysis of prior works and highlights where such ethical challenges arise.
arXiv Detail & Related papers (2020-11-27T07:18:44Z)
- Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes [72.64975113835018]
Motivated by descriptive ethics, we investigate a novel, data-driven approach to machine ethics.
We introduce Scruples, the first large-scale dataset with 625,000 ethical judgments over 32,000 real-life anecdotes.
Our dataset presents a major challenge to state-of-the-art neural language models, leaving significant room for improvement.
arXiv Detail & Related papers (2020-08-20T17:34:15Z)