The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional
Supporters for Queer Youth
- URL: http://arxiv.org/abs/2402.11886v1
- Date: Mon, 19 Feb 2024 06:54:55 GMT
- Title: The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional
Supporters for Queer Youth
- Authors: Shir Lissak, Nitay Calderon, Geva Shenkman, Yaakov Ophir, Eyal
Fruchter, Anat Brunstein Klomek and Roi Reichart
- Abstract summary: This paper aims to explore the potential of Large Language Models to revolutionize emotional support for queers.
We develop a novel ten-question scale that is inspired by psychological standards and expert input.
We find that LLM responses are supportive and inclusive, outscoring humans.
However, they tend to be generic, not empathetic enough, and lack personalization, resulting in unreliable and potentially harmful advice.
- Score: 14.751539420563752
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Queer youth face increased mental health risks, such as depression, anxiety,
and suicidal ideation. Hindered by negative stigma, they often avoid seeking
help and rely on online resources, which may provide incompatible information.
Although access to a supportive environment and reliable information is
invaluable, many queer youth worldwide have no access to such support. However,
this could soon change due to the rapid adoption of Large Language Models
(LLMs) such as ChatGPT. This paper aims to comprehensively explore the
potential of LLMs to revolutionize emotional support for queers. To this end,
we conduct a qualitative and quantitative analysis of LLMs' interactions with
queer-related content. To evaluate response quality, we develop a novel
ten-question scale that is inspired by psychological standards and expert
input. We apply this scale to score several LLMs and human comments to posts
where queer youth seek advice and share experiences. We find that LLM responses
are supportive and inclusive, outscoring humans. However, they tend to be
generic, not empathetic enough, and lack personalization, resulting in
unreliable and potentially harmful advice. We discuss these challenges,
demonstrate that a dedicated prompt can improve the performance, and propose a
blueprint of an LLM-supporter that actively (but sensitively) seeks user
context to provide personalized, empathetic, and reliable responses. Our
annotated dataset is available for further research.
Related papers
- LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
- Can AI Relate: Testing Large Language Model Response for Mental Health Support [23.97212082563385]
Large language models (LLMs) are already being piloted for clinical use in hospital systems like NYU Langone, Dana-Farber and the NHS.
This work develops an evaluation framework for determining whether LLM response is a viable and ethical path forward for the automation of mental health treatment.
arXiv Detail & Related papers (2024-05-20T13:42:27Z)
- Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided [38.11184388388781]
Large language models (LLMs) have offered new opportunities for emotional support.
This work takes a first step by engaging with cognitive reappraisals.
We conduct a first-of-its-kind expert evaluation of an LLM's zero-shot ability to generate cognitive reappraisal responses.
arXiv Detail & Related papers (2024-04-01T17:56:30Z)
- Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation [28.74445806009475]
This work initially analyzes the results of large language models (LLMs) on ESConv.
We observe that exhibiting high preference for specific strategies hinders effective emotional support.
Our findings emphasize that (1) low preference for specific strategies hinders the progress of emotional support, (2) external assistance helps reduce preference bias, and (3) existing LLMs alone cannot become good emotional supporters.
arXiv Detail & Related papers (2024-02-20T18:21:32Z)
- Know Your Audience: Do LLMs Adapt to Different Age and Education Levels? [21.302967282814784]
We evaluate the readability of answers generated by four state-of-the-art large language models (LLMs).
We compare the readability scores of the generated responses against the recommended comprehension level of each age and education group.
Our results suggest LLM answers need to be better adapted to the intended audience to be more comprehensible.
arXiv Detail & Related papers (2023-12-04T17:19:53Z)
- You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments [37.03210795084276]
We examine whether the current format of prompting Large Language Models elicits responses in a consistent and robust manner.
Our experiments on 17 different LLMs reveal that even simple perturbations significantly downgrade a model's question-answering ability.
Our results suggest that the currently widespread practice of prompting is insufficient to accurately and reliably capture model perceptions.
arXiv Detail & Related papers (2023-11-16T09:50:53Z)
- Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
- Revisiting the Reliability of Psychological Scales on Large Language Models [66.31055885857062]
This study aims to determine the reliability of applying personality assessments to Large Language Models (LLMs).
By shedding light on the personalization of LLMs, our study endeavors to pave the way for future explorations in this field.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.89346248535922]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution.
Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.