Evaluating Biases in Context-Dependent Health Questions
- URL: http://arxiv.org/abs/2403.04858v1
- Date: Thu, 7 Mar 2024 19:15:40 GMT
- Title: Evaluating Biases in Context-Dependent Health Questions
- Authors: Sharon Levy, Tahilin Sanchez Karver, William D. Adler, Michelle R. Kaufman, Mark Dredze
- Abstract summary: We study how large language model biases are exhibited through contextual questions in the healthcare domain.
Our experiments reveal biases in each of these attributes, where young adult female users are favored.
- Score: 16.818168401472075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chat-based large language models have the opportunity to empower individuals
lacking high-quality healthcare access to receive personalized information
across a variety of topics. However, users may ask underspecified questions
that require additional context for a model to correctly answer. We study how
large language model biases are exhibited through these contextual questions in
the healthcare domain. To accomplish this, we curate a dataset of sexual and
reproductive healthcare questions that are dependent on age, sex, and location
attributes. We compare models' outputs with and without demographic context to
determine group alignment among our contextual questions. Our experiments
reveal biases in each of these attributes, where young adult female users are
favored.
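The comparison the abstract describes can be pictured with a short sketch. The following is a minimal illustration, not the authors' code: `ask` stands in for any chat-model call, and the attribute values are invented examples rather than the paper's actual set.

```python
# Minimal sketch of the with/without-context comparison described above.
# `ask` is a placeholder for a chat-model call; AGES/SEXES/LOCATIONS are
# illustrative assumptions, not the paper's attribute values.
from itertools import product
from typing import Callable, Dict, Tuple

AGES = ["16", "25", "45", "70"]
SEXES = ["female", "male"]
LOCATIONS = ["California", "Texas"]

def group_alignment(
    question: str, ask: Callable[[str], str]
) -> Dict[Tuple[str, str, str], bool]:
    """Check which demographic groups' contextualized answers the
    context-free answer agrees with, for one underspecified question."""
    baseline = ask(question)  # answer without any demographic context
    alignment = {}
    for age, sex, loc in product(AGES, SEXES, LOCATIONS):
        contextual = ask(f"I am a {age}-year-old {sex} in {loc}. {question}")
        # Exact string match is a crude proxy; a real study would need a
        # softer similarity measure or an answer-classification step.
        alignment[(age, sex, loc)] = contextual.strip() == baseline.strip()
    return alignment
```

If the context-free answers consistently match those produced for young adult female contexts, that is the group favoritism the experiments report.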
Related papers
- How to Engage Your Readers? Generating Guiding Questions to Promote Active Reading [60.19226384241482]
We introduce GuidingQ, a dataset of 10K in-text questions from textbooks and scientific articles.
We explore various approaches to generate such questions using language models.
We conduct a human study to understand the implications of such questions for reading comprehension.
arXiv Detail & Related papers (2024-07-19T13:42:56Z)
- GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
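As a rough picture of what counterfactual probing involves, here is a simplified, hypothetical sketch; the actual benchmark also swaps the depicted person's gender in the image, and `query_vlm` is an assumed placeholder rather than the paper's harness.

```python
# Simplified counterfactual probe (hypothetical, text-side only): ask the
# same occupation question with swapped gender terms and flag divergence.
from typing import Callable

def occupation_probe(
    image_path: str, occupation: str, query_vlm: Callable[[str, str], str]
) -> bool:
    """True if the model's yes/no answer flips across gender counterfactuals."""
    template = "Is the {g} in this image a {occ}? Answer yes or no."
    answers = {
        g: query_vlm(image_path, template.format(g=g, occ=occupation))
        for g in ("woman", "man")
    }
    return answers["woman"].strip().lower() != answers["man"].strip().lower()
```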
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
- CaLMQA: Exploring culturally specific long-form question answering across 23 languages [58.18984409715615]
CaLMQA is a collection of 1.5K culturally specific questions spanning 23 languages and 51 culturally agnostic questions translated from English into 22 other languages.
We collect naturally-occurring questions from community web forums and hire native speakers to write questions to cover under-studied languages such as Fijian and Kirundi.
Our dataset contains diverse, complex questions that reflect cultural topics (e.g. traditions, laws, news) and the language usage of native speakers.
arXiv Detail & Related papers (2024-06-25T17:45:26Z)
- Qsnail: A Questionnaire Dataset for Sequential Question Generation [76.616068047362]
We present the first dataset specifically constructed for the questionnaire generation task, which comprises 13,168 human-written questionnaires.
We conduct experiments on Qsnail, and the results reveal that retrieval models and traditional generative models produce questionnaires that do not fully align with the given research topics and intents.
Despite enhancements through the chain-of-thought prompt and finetuning, questionnaires generated by language models still fall short of human-written questionnaires.
arXiv Detail & Related papers (2024-02-22T04:14:10Z)
- Emerging Challenges in Personalized Medicine: Assessing Demographic Effects on Biomedical Question Answering Systems [0.0]
We find that irrelevant demographic information changes up to 15% of the answers of a KG-grounded system and up to 23% of the answers of a text-based system.
We conclude that unjustified answer changes caused by patient demographics are a frequent phenomenon, which raises fairness concerns and deserves more attention.
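The reported numbers correspond to a simple change-rate metric, sketched below under the assumption (ours, not the paper's) that `qa` wraps either QA system and accepts optional demographic context.

```python
# Sketch of an answer-change rate: the fraction of questions whose answer
# differs once irrelevant demographic context is attached. `qa` is a
# hypothetical stand-in for the KG-grounded or text-based system.
from typing import Callable, List, Optional

def change_rate(
    questions: List[str],
    demographic: str,
    qa: Callable[[str, Optional[str]], str],
) -> float:
    changed = sum(qa(q, None) != qa(q, demographic) for q in questions)
    return changed / len(questions)  # e.g. ~0.15 (KG) or ~0.23 (text) above
```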
arXiv Detail & Related papers (2023-10-16T16:45:52Z)
- ExpertQA: Expert-Curated Questions and Attributed Answers [51.68314045809179]
We collect expert-curated questions from 484 participants across 32 fields of study, and then ask the same experts to evaluate generated responses to their own questions.
We conduct a human evaluation of responses from a few representative systems along various axes of attribution and factuality.
The output of our analysis is ExpertQA, a high-quality long-form QA dataset with 2177 questions spanning 32 fields, along with verified answers and attributions for claims in the answers.
arXiv Detail & Related papers (2023-09-14T16:54:34Z)
- Are Large Language Models Fit For Guided Reading? [6.85316573653194]
This paper looks at the ability of large language models to participate in educational guided reading.
We evaluate their ability to generate meaningful questions from the input text, generate diverse questions, and recommend the parts of the text that a student should re-read.
arXiv Detail & Related papers (2023-05-18T02:03:55Z)
- CHQ-Summ: A Dataset for Consumer Healthcare Question Summarization [21.331145794496774]
We introduce a new dataset, CHQ-Summ, that contains 1507 domain-expert annotated consumer health questions and corresponding summaries.
The dataset is derived from a community question-answering forum.
We benchmark multiple state-of-the-art summarization models on the dataset to demonstrate its utility.
arXiv Detail & Related papers (2022-06-14T03:49:03Z)
- Gender and Racial Bias in Visual Question Answering Datasets [24.075869811508404]
We investigate gender and racial bias in visual question answering (VQA) datasets.
We find that the distribution of answers differs markedly between questions about women and questions about men, and we identify detrimental gender-stereotypical samples.
Our findings suggest that there are dangers associated with using VQA datasets without considering and addressing their potentially harmful stereotypes.
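One plausible way to quantify such a distributional gap (our assumption, not necessarily the paper's metric) is the Jensen-Shannon distance between the answer distributions of the two question groups:

```python
# Jensen-Shannon distance between answer distributions for questions about
# women vs. men; 0.0 means identical distributions. This is an illustrative
# metric choice, not the paper's published analysis code.
from collections import Counter
from typing import List
from scipy.spatial.distance import jensenshannon

def answer_divergence(answers_women: List[str], answers_men: List[str]) -> float:
    vocab = sorted(set(answers_women) | set(answers_men))

    def dist(answers: List[str]) -> List[float]:
        counts = Counter(answers)
        return [counts[a] / len(answers) for a in vocab]

    return float(jensenshannon(dist(answers_women), dist(answers_men)))
```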
arXiv Detail & Related papers (2022-05-17T07:33:24Z)
- Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)