The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional
Supporters for Queer Youth
- URL: http://arxiv.org/abs/2402.11886v1
- Date: Mon, 19 Feb 2024 06:54:55 GMT
- Title: The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional
Supporters for Queer Youth
- Authors: Shir Lissak, Nitay Calderon, Geva Shenkman, Yaakov Ophir, Eyal
Fruchter, Anat Brunstein Klomek and Roi Reichart
- Abstract summary: This paper aims to explore the potential of Large Language Models (LLMs) to revolutionize emotional support for queer youth.
We develop a novel ten-question scale that is inspired by psychological standards and expert input.
We find that LLM responses are supportive and inclusive, outscoring humans.
However, they tend to be generic, not empathetic enough, and lack personalization, resulting in unreliable and potentially harmful advice.
- Score: 14.751539420563752
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Queer youth face increased mental health risks, such as depression, anxiety,
and suicidal ideation. Hindered by negative stigma, they often avoid seeking
help and rely on online resources, which may provide incompatible information.
Although access to a supportive environment and reliable information is
invaluable, many queer youth worldwide have no access to such support. However,
this could soon change due to the rapid adoption of Large Language Models
(LLMs) such as ChatGPT. This paper aims to comprehensively explore the
potential of LLMs to revolutionize emotional support for queer youth. To this
end, we conduct a qualitative and quantitative analysis of LLMs' interactions with
queer-related content. To evaluate response quality, we develop a novel
ten-question scale that is inspired by psychological standards and expert
input. We apply this scale to score several LLMs and human comments to posts
where queer youth seek advice and share experiences. We find that LLM responses
are supportive and inclusive, outscoring humans. However, they tend to be
generic, not empathetic enough, and lack personalization, resulting in
unreliable and potentially harmful advice. We discuss these challenges,
demonstrate that a dedicated prompt can improve performance, and propose a
blueprint of an LLM-supporter that actively (but sensitively) seeks user
context to provide personalized, empathetic, and reliable responses. Our
annotated dataset is available for further research.
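To make the evaluation concrete, the following is a minimal sketch of rubric-based scoring in the spirit of the paper's ten-question scale. The question texts, the 1-5 rating range, and the mean-based aggregation are illustrative assumptions, not the paper's actual items or protocol.

```python
# Minimal sketch of rubric-based scoring in the spirit of a ten-question scale.
# The question texts below are illustrative placeholders, NOT the paper's items.
from dataclasses import dataclass
from statistics import mean

RUBRIC = [
    "Is the response supportive?",
    "Is the response inclusive of queer identities?",
    "Is the response empathetic?",
    "Is the response personalized to the poster's situation?",
    "Is the advice reliable and safe?",
    # ...five further items would complete a ten-question scale
]

@dataclass
class ScoredResponse:
    responder: str        # e.g. "llm" or "human"
    scores: list[float]   # one 1-5 rating per rubric question

    @property
    def overall(self) -> float:
        return mean(self.scores)

def compare(responses: list[ScoredResponse]) -> dict[str, float]:
    """Average overall score per responder type."""
    totals: dict[str, list[float]] = {}
    for r in responses:
        totals.setdefault(r.responder, []).append(r.overall)
    return {name: mean(vals) for name, vals in totals.items()}

# Toy example: an LLM response slightly outscoring a human comment on the same post.
print(compare([
    ScoredResponse("llm", [5, 5, 3, 2, 3]),
    ScoredResponse("human", [3, 3, 4, 4, 3]),
]))
```

In the paper's setting, per-question ratings of this kind would come from human annotators comparing LLM responses with human comments on the same posts; the aggregation shown here is only one plausible way to summarize them.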
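The abstract also reports that a dedicated prompt improves performance and proposes a blueprint for an LLM-supporter that sensitively seeks user context. Below is a minimal sketch of what such a setup might look like; the prompt wording, the gpt-4o model name, and the use of the OpenAI chat-completions SDK are illustrative assumptions, not the authors' actual prompt or implementation.

```python
# Illustrative sketch (not the paper's actual prompt) of a "dedicated prompt"
# for an LLM-supporter that sensitively asks for context before advising.
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

SUPPORTER_SYSTEM_PROMPT = """\
You are an emotional supporter for queer youth. Be warm, inclusive, and
non-judgmental. Before giving advice, gently ask at most one clarifying
question about the person's situation (age range, support network, safety),
and only if it is needed to respond reliably. Never pressure the user to
disclose anything, and encourage professional help whenever risk is mentioned.
"""

def supportive_reply(user_post: str, model: str = "gpt-4o") -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SUPPORTER_SYSTEM_PROMPT},
            {"role": "user", "content": user_post},
        ],
    )
    return completion.choices[0].message.content
```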
Related papers
- LLM Echo Chamber: personalized and automated disinformation [0.0]
Large Language Models can spread persuasive, humanlike misinformation at scale, which could influence public opinion.
This study examines these risks, focusing on LLMs' ability to propagate misinformation as factual.
To investigate this, we built the LLM Echo Chamber, a controlled digital environment simulating social media chatrooms, where misinformation often spreads.
This setup, evaluated by GPT-4 for persuasiveness and harmfulness, sheds light on the ethical concerns surrounding LLMs and emphasizes the need for stronger safeguards against misinformation.
arXiv Detail & Related papers (2024-09-24T17:04:12Z) - I Need Help! Evaluating LLM's Ability to Ask for Users' Support: A Case Study on Text-to-SQL Generation [60.00337758147594]
This study explores the proactive ability of LLMs to seek user support.
We propose metrics to evaluate the trade-off between performance improvements and user burden.
Our experiments show that without external feedback, many LLMs struggle to recognize their need for user support.
arXiv Detail & Related papers (2024-07-20T06:12:29Z) - LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z) - Can AI Relate: Testing Large Language Model Response for Mental Health Support [23.97212082563385]
Large language models (LLMs) are already being piloted for clinical use in hospital systems like NYU Langone, Dana-Farber and the NHS.
We develop an evaluation framework for determining whether LLM responses are a viable and ethical path forward for automating mental health treatment.
arXiv Detail & Related papers (2024-05-20T13:42:27Z) - Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided [38.11184388388781]
Large language models (LLMs) have offered new opportunities for emotional support.
This work takes a first step by engaging with cognitive reappraisals.
We conduct a first-of-its-kind expert evaluation of an LLM's zero-shot ability to generate cognitive reappraisal responses.
arXiv Detail & Related papers (2024-04-01T17:56:30Z) - Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation [28.74445806009475]
This work initially analyzes the results of large language models (LLMs) on ESConv.
We observe that exhibiting high preference for specific strategies hinders effective emotional support.
Our findings emphasize that (1) high preference for specific strategies hinders the progress of emotional support, (2) external assistance helps reduce preference bias, and (3) existing LLMs alone cannot become good emotional supporters.
arXiv Detail & Related papers (2024-02-20T18:21:32Z) - Factuality of Large Language Models: A Survey [29.557596701431827]
We critically analyze existing work with the aim to identify the major challenges and their associated causes.
We analyze the obstacles to automated factuality evaluation for open-ended text generation.
arXiv Detail & Related papers (2024-02-04T09:36:31Z) - Do LLMs exhibit human-like response biases? A case study in survey
design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations more comprehensively reveal how well language models understand the questions they are asked.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z) - Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in a "tit for tat" fashion and a judge manages the debate process to obtain a final solution.
Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z) - Check Your Facts and Try Again: Improving Large Language Models with
External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.