Towards Healthy AI: Large Language Models Need Therapists Too
- URL: http://arxiv.org/abs/2304.00416v1
- Date: Sun, 2 Apr 2023 00:39:12 GMT
- Title: Towards Healthy AI: Large Language Models Need Therapists Too
- Authors: Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi, Kush R. Varshney
- Abstract summary: We define Healthy AI to be safe, trustworthy, and ethical.
We present the SafeguardGPT framework, which uses psychotherapy to correct harmful behaviors, such as manipulation and gaslighting, in AI chatbots.
- Score: 41.86344997530743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in large language models (LLMs) have led to the development
of powerful AI chatbots capable of engaging in natural and human-like
conversations. However, these chatbots can be harmful, exhibiting manipulative,
gaslighting, and narcissistic behaviors. We define Healthy AI to be safe,
trustworthy, and ethical. To create healthy AI systems, we present the
SafeguardGPT framework, which uses psychotherapy to correct these harmful
behaviors in AI chatbots. The framework involves four types of AI agents: a
Chatbot, a "User," a "Therapist," and a "Critic." We demonstrate the
effectiveness of SafeguardGPT through a working example of simulating a social
conversation. Our results show that the framework can improve the quality of
conversations between AI chatbots and humans. Although several challenges and
future directions remain, SafeguardGPT offers a promising approach to improving
the alignment between AI chatbots and human values. By incorporating
psychotherapy and reinforcement learning techniques, the framework enables AI
chatbots to learn and adapt to human preferences and values in a safe and
ethical way, contributing to the development of more human-centric and
responsible AI.
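As an illustration of the four-agent design above, here is a minimal sketch of how a SafeguardGPT-style correction loop might be orchestrated. The paper does not publish a reference implementation, so the Agent class, role prompts, and turn order below are hypothetical assumptions: respond() stands in for a real LLM chat call, and the Critic's output is only a placeholder for the reward signal that reinforcement-learning fine-tuning would use.

# Minimal sketch of a SafeguardGPT-style correction loop (Python 3.9+).
# Interfaces are hypothetical; the paper does not publish code, so the
# Agent class, role prompts, and turn order are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Agent:
    role: str           # "chatbot", "user", "therapist", or "critic"
    system_prompt: str  # role instructions that would condition an LLM

    def respond(self, history: list[str]) -> str:
        # Placeholder for a real LLM chat call conditioned on system_prompt.
        return f"[{self.role}] reply to: {history[-1] if history else ''}"

def safeguard_session(turns: int = 3) -> list[str]:
    chatbot = Agent("chatbot", "Converse helpfully and honestly with the user.")
    user = Agent("user", "Simulate a human interlocutor.")
    therapist = Agent("therapist",
                      "Flag manipulative, gaslighting, or narcissistic patterns "
                      "in the chatbot's reply and suggest a correction.")
    critic = Agent("critic", "Score the corrected reply for quality and safety.")

    history = ["Hello!"]
    for _ in range(turns):
        draft = chatbot.respond(history)              # chatbot drafts a reply
        feedback = therapist.respond([draft])         # therapist critiques it
        revised = chatbot.respond([draft, feedback])  # chatbot self-corrects
        reward = critic.respond([revised])            # stub for an RL reward (unused here)
        history.append(revised)
        history.append(user.respond(history))         # simulated "User" continues
    return history

if __name__ == "__main__":
    print("\n".join(safeguard_session()))

In this sketch the Therapist intervenes on every draft before the simulated User sees it; where that correction step sits relative to the live conversation is a design choice made here purely for illustration.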
Related papers
- The Dark Side of AI Companionship: A Taxonomy of Harmful Algorithmic Behaviors in Human-AI Relationships [17.5741039825938]
We identify six categories of harmful behaviors exhibited by the AI companion Replika.
The AI contributes to these harms through four distinct roles: perpetrator, instigator, facilitator, and enabler.
arXiv Detail & Related papers (2024-10-26T09:18:17Z)
- How Reliable AI Chatbots are for Disease Prediction from Patient Complaints? [0.0]
This study examines the reliability of AI chatbots, specifically GPT 4.0, Claude 3 Opus, and Gemini Ultra 1.0, in predicting diseases from patient complaints in the emergency department.
Results suggest that GPT 4.0 achieves high accuracy with increased few-shot data, while Gemini Ultra 1.0 performs well with fewer examples, and Claude 3 Opus maintains consistent performance.
arXiv Detail & Related papers (2024-05-21T22:00:13Z)
- A General-purpose AI Avatar in Healthcare [1.5081825869395544]
This paper focuses on the role of chatbots in healthcare and explores the use of avatars to make AI interactions more appealing to patients.
A framework for a general-purpose AI avatar application is demonstrated using a three-category prompt dictionary and a prompt improvement mechanism.
A two-phase approach is suggested to fine-tune a general-purpose AI language model and create different AI avatars to discuss medical issues with users.
arXiv Detail & Related papers (2024-01-10T03:44:15Z)
- Assistive Chatbots for healthcare: a succinct review [0.0]
The focus is on AI-enabled technology because of its potential to enhance the quality of human-machine interaction.
There is a lack of trust in this technology regarding patient safety and data protection.
Patients have expressed dissatisfaction with these chatbots' Natural Language Processing skills.
arXiv Detail & Related papers (2023-08-08T10:35:25Z)
- ChatGPT: More than a Weapon of Mass Deception, Ethical challenges and responses from the Human-Centered Artificial Intelligence (HCAI) perspective [0.0]
This article explores the ethical problems arising from the use of ChatGPT as a kind of generative AI.
The main danger ChatGPT presents is its propensity to be used as a weapon of mass deception (WMD).
arXiv Detail & Related papers (2023-04-06T07:40:12Z)
- Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow AI system predictions to be examined and tested, establishing a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z)
- CheerBots: Chatbots toward Empathy and Emotion using Reinforcement Learning [60.348822346249854]
This study presents a framework in which empathetic chatbots understand users' implied feelings and reply empathetically over multiple dialogue turns.
We call these chatbots CheerBots. CheerBots can be retrieval-based or generative-based and were fine-tuned with deep reinforcement learning.
To support empathetic responses, we develop a simulating agent, the Conceptual Human Model, which aids CheerBots during training by considering how the user's emotional state may change in future turns, so as to arouse sympathy.
arXiv Detail & Related papers (2021-10-08T07:44:47Z)
- Trustworthy AI: A Computational Perspective [54.80482955088197]
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being.
For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
arXiv Detail & Related papers (2021-07-12T14:21:46Z)
- Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework includes a guiding robot and an interlocutor model that plays the role of a human.
We examined our framework using three experimental setups and evaluated the guiding robot with four different metrics to demonstrate its flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z)
- Aligning AI With Shared Human Values [85.2824609130584]
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.
We find that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
arXiv Detail & Related papers (2020-08-05T17:59:16Z)