Between Myths and Metaphors: Rethinking LLMs for SRH in Conservative Contexts
- URL: http://arxiv.org/abs/2511.01907v1
- Date: Fri, 31 Oct 2025 13:39:56 GMT
- Title: Between Myths and Metaphors: Rethinking LLMs for SRH in Conservative Contexts
- Authors: Ameemah Humayun, Bushra Zubair, Maryam Mustafa
- Abstract summary: Low-resource countries account for over 90% of maternal deaths, with Pakistan among the top four countries contributing nearly half in 2023. Since these deaths are mostly preventable, large language models (LLMs) can help address this crisis by automating health communication and risk assessment. We conduct a two-stage study in Pakistan: analyzing data from clinical observations, interviews, and focus groups with clinicians and patients, and evaluating the interpretive capabilities of five popular LLMs on this data.
- Score: 2.3895981099137535
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Low-resource countries account for over 90% of maternal deaths, with Pakistan among the top four countries contributing nearly half in 2023. Since these deaths are mostly preventable, large language models (LLMs) can help address this crisis by automating health communication and risk assessment. However, sexual and reproductive health (SRH) communication in conservative contexts often relies on indirect language that obscures meaning, complicating LLM-based interventions. We conduct a two-stage study in Pakistan: (1) analyzing data from clinical observations, interviews, and focus groups with clinicians and patients, and (2) evaluating the interpretive capabilities of five popular LLMs on this data. Our analysis identifies two axes of communication (referential domain and expression approach) and shows LLMs struggle with semantic drift, myths, and polysemy in clinical interactions. We contribute: (1) empirical themes in SRH communication, (2) a categorization framework for indirect communication, (3) evaluation of LLM performance, and (4) design recommendations for culturally-situated SRH communication.
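The abstract's second stage, evaluating how well LLMs interpret indirect SRH language, can be pictured as scoring a model's predicted (referential domain, expression approach) tags against annotated utterances. A minimal sketch, in which the utterances, labels, and the lookup-based `categorize` stub are all illustrative stand-ins (not the paper's data or method):

```python
# Hypothetical sketch of evaluating a categorizer along the paper's two
# axes: referential domain and expression approach. All utterances and
# labels below are invented for illustration.
from collections import Counter

# Illustrative annotated utterances: (utterance, domain, approach)
ANNOTATED = [
    ("the monthly problem", "menstruation", "euphemism"),
    ("weakness of the body", "sexual_health", "metaphor"),
    ("the operation", "contraception", "indirect_reference"),
]

def categorize(utterance):
    """Stand-in for an LLM-based categorizer; a real system would
    prompt a model to infer both axes from clinical context."""
    lookup = {u: (d, a) for u, d, a in ANNOTATED}
    return lookup.get(utterance, ("unknown", "unknown"))

def evaluate(annotated):
    """Per-axis accuracy of predicted (domain, approach) pairs."""
    correct = Counter()
    for utterance, domain, approach in annotated:
        pred_domain, pred_approach = categorize(utterance)
        correct["domain"] += pred_domain == domain
        correct["approach"] += pred_approach == approach
    n = len(annotated)
    return {axis: correct[axis] / n for axis in ("domain", "approach")}

# The lookup stub scores perfectly; a real LLM would not.
print(evaluate(ANNOTATED))
```

Scoring the two axes separately makes it possible to see whether a model fails at identifying *what* is being discussed or *how* it is being expressed, which is where the abstract reports struggles with semantic drift and polysemy.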
Related papers
- Toward expert-level motivational interviewing for health behavior improvement with LLMs [17.267453197266715]
Motivational interviewing (MI) is an effective counseling approach for promoting health behavior change, but its impact is constrained by the need for highly trained human counselors. This study developed and evaluated Large Language Models for Motivational Interviewing (MI-LLMs). Three Chinese-capable open-source LLMs were fine-tuned on this corpus to produce the MI-LLMs.
arXiv Detail & Related papers (2025-12-17T13:43:26Z) - Independent Clinical Evaluation of General-Purpose LLM Responses to Signals of Suicide Risk [32.17406690566923]
We introduce findings and methods to facilitate evidence-based discussion about how large language models (LLMs) should behave in response to user signals of risk of suicidal thoughts and behaviors (STB). We find that OLMo-2-32b (and possibly, by extension, other LLMs) becomes less likely to invite continued dialogue as users send more signals of STB risk in multi-turn settings.
arXiv Detail & Related papers (2025-10-31T14:47:11Z) - Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL [64.3268313484078]
Large Language Models (LLMs) interact with millions of people worldwide in applications such as customer support, education, and healthcare. Their ability to produce deceptive outputs, whether intentionally or inadvertently, poses significant safety concerns. We investigate the extent to which LLMs engage in deception within dialogue, and propose the belief misalignment metric to quantify deception.
arXiv Detail & Related papers (2025-10-16T05:29:36Z) - Communication Styles and Reader Preferences of LLM and Human Experts in Explaining Health Information [20.955508468328603]
Our study evaluates the communication styles of large language models (LLMs) and human experts in explaining health information. We compiled a dataset of 1,498 health misinformation explanations from authoritative fact-checking organizations. Our results suggest that LLMs' structured approach to presenting information may be more effective at engaging readers.
arXiv Detail & Related papers (2025-05-13T00:32:38Z) - LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
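The zero-shot prompting strategy described above can be sketched as a template plus a reply parser. The template, the rating scale wording, and the `parse_score` helper below are illustrative assumptions, not the paper's actual cues:

```python
# Hypothetical sketch of zero-shot prompting for interview-based severity
# scoring, in the spirit of the LlaMADRS description. The prompt wording
# and 0-6 scale here are illustrative, not taken from the paper.

PROMPT_TEMPLATE = (
    "You are rating one assessment item from a clinical interview.\n"
    "Item: {item}\n"
    "Transcript excerpt:\n{excerpt}\n"
    "Answer with a single integer score from 0 (absent) to 6 (severe)."
)

def build_prompt(item, excerpt):
    """Fill the zero-shot template for one item and transcript excerpt."""
    return PROMPT_TEMPLATE.format(item=item, excerpt=excerpt)

def parse_score(model_output):
    """Extract the first integer in 0-6 from a model reply; None if absent."""
    for token in model_output.split():
        stripped = token.strip(".,")
        if stripped.isdigit() and 0 <= int(stripped) <= 6:
            return int(stripped)
    return None

prompt = build_prompt("Apparent sadness", "I have felt low most days...")
print(parse_score("Score: 4."))  # 4
```

Constraining the reply format in the prompt and parsing defensively matters because correlation with clinician ratings can only be computed over replies that yield a valid score.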
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - NewsInterview: a Dataset and a Playground to Evaluate LLMs' Ground Gap via Informational Interviews [65.35458530702442]
We focus on journalistic interviews, a domain rich in grounding communication and abundant in data.
We curate a dataset of 40,000 two-person informational interviews from NPR and CNN.
LLMs are significantly less likely than human interviewers to use acknowledgements and to pivot to higher-level questions.
arXiv Detail & Related papers (2024-11-21T01:37:38Z) - Large Language Models as Neurolinguistic Subjects: Discrepancy between Performance and Competence [49.60849499134362]
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning). We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pairs and diagnostic probing to analyze activation patterns across model layers. We found: (1) psycholinguistic and neurolinguistic methods reveal that language performance and competence are distinct; (2) direct probability measurement may not accurately assess linguistic competence; and (3) instruction tuning does little to change competence but improves performance.
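The minimal-pair side of the approach above amounts to checking whether a model assigns higher probability to the acceptable member of each pair. A minimal sketch, where `sentence_logprob` is a hypothetical stand-in for a real language model's sentence score:

```python
# Hypothetical sketch of minimal-pair evaluation. A real implementation
# would sum token log-likelihoods from a language model; the toy scores
# below are invented for illustration.
import math

def sentence_logprob(sentence):
    """Stand-in for a model's sentence log-probability."""
    toy_scores = {
        "The keys to the cabinet are on the table.": math.log(0.8),
        "The keys to the cabinet is on the table.": math.log(0.2),
    }
    return toy_scores[sentence]

def minimal_pair_accuracy(pairs):
    """Fraction of (acceptable, unacceptable) pairs where the
    acceptable sentence receives the higher score."""
    wins = sum(sentence_logprob(good) > sentence_logprob(bad)
               for good, bad in pairs)
    return wins / len(pairs)

pairs = [("The keys to the cabinet are on the table.",
          "The keys to the cabinet is on the table.")]
print(minimal_pair_accuracy(pairs))  # 1.0
```

The paper's point (2) is precisely that this probability-based test measures performance; its diagnostic probing of layer activations is a separate step aimed at competence, which this sketch does not cover.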
arXiv Detail & Related papers (2024-11-12T04:16:44Z) - SPEED++: A Multilingual Event Extraction Framework for Epidemic Prediction and Preparedness [73.73883111570458]
We introduce the first multilingual Event Extraction framework for extracting epidemic event information for a wide range of diseases and languages.
Annotating data in every language is infeasible; thus we develop zero-shot cross-lingual cross-disease models.
Our framework provides epidemic warnings for COVID-19 in its earliest stages (December 2019) from Chinese Weibo posts, without any training data in Chinese.
arXiv Detail & Related papers (2024-10-24T03:03:54Z) - Arabic Dataset for LLM Safeguard Evaluation [62.96160492994489]
This study explores the safety of large language models (LLMs) in Arabic, given the language's linguistic and cultural complexities. We present an Arab-region-specific safety evaluation dataset consisting of 5,799 questions, including direct attacks, indirect attacks, and harmless requests with sensitive words.
arXiv Detail & Related papers (2024-10-22T14:12:43Z) - Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance [73.19687314438133]
We study how reliance is affected by contextual features of an interaction.
We find that contextual characteristics significantly affect human reliance behavior.
Our results show that calibration and language quality alone are insufficient in evaluating the risks of human-LM interactions.
arXiv Detail & Related papers (2024-07-10T18:00:05Z) - Leveraging Prompt-Based Large Language Models: Predicting Pandemic Health Decisions and Outcomes Through Social Media Language [6.3576870613251675]
We use prompt-based LLMs to examine the relationship between social media language patterns and trends in national health outcomes.
Our work is the first to empirically link social media linguistic patterns to real-world public health trends.
arXiv Detail & Related papers (2024-03-01T21:29:32Z) - FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence [46.71469172542448]
This paper presents FactPICO, a factuality benchmark for plain language summarization of medical texts.
It consists of 345 plain language summaries of abstracts generated from three randomized controlled trials (RCTs). We assess the factuality of critical elements of RCTs in those summaries, as well as the findings reported about those elements.
arXiv Detail & Related papers (2024-02-18T04:45:01Z) - Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries [31.82249599013959]
Large language models (LLMs) are transforming the ways the general public accesses and consumes information.
LLMs demonstrate impressive language understanding and generation proficiencies, but concerns regarding their safety remain paramount.
It remains unclear how these LLMs perform in the context of non-English languages.
arXiv Detail & Related papers (2023-10-19T20:02:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.