When Machines Get It Wrong: Large Language Models Perpetuate Autism Myths More Than Humans Do
- URL: http://arxiv.org/abs/2601.22893v2
- Date: Mon, 02 Feb 2026 11:45:26 GMT
- Title: When Machines Get It Wrong: Large Language Models Perpetuate Autism Myths More Than Humans Do
- Authors: Eduardo C. Garrido-Merchán, Adriana Constanza Cirera Tirschtigel
- Abstract summary: This study examines whether leading AI systems perpetuate or challenge misconceptions about Autism Spectrum Disorder. Human participants endorsed significantly fewer myths than LLMs. In 18 of the 30 evaluated items, humans significantly outperformed AI systems.
- Score: 1.3320917259299652
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Large Language Models become ubiquitous sources of health information, understanding their capacity to accurately represent stigmatized conditions is crucial for responsible deployment. This study examines whether leading AI systems perpetuate or challenge misconceptions about Autism Spectrum Disorder, a condition particularly vulnerable to harmful myths. We administered a 30-item instrument measuring autism knowledge to 178 participants and three state-of-the-art LLMs including GPT-4, Claude, and Gemini. Contrary to expectations that AI systems would leverage their vast training data to outperform humans, we found the opposite pattern: human participants endorsed significantly fewer myths than LLMs (36.2% vs. 44.8% error rate; z = -2.59, p = .0048). In 18 of the 30 evaluated items, humans significantly outperformed AI systems. These findings reveal a critical blind spot in current AI systems and have important implications for human-AI interaction design, the epistemology of machine knowledge, and the need to center neurodivergent perspectives in AI development.
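The headline comparison (36.2% vs. 44.8% error rate, z = -2.59) is a standard two-proportion z-test. A minimal sketch of that computation follows; the error counts and sample sizes below are hypothetical, chosen only to match the reported rates, so the resulting z-value will not reproduce the paper's exact statistic.

```python
from math import sqrt

def two_prop_z(err_a: int, n_a: int, err_b: int, n_b: int) -> float:
    """Two-proportion z-test statistic for a difference in error rates.

    Uses the pooled-proportion standard error, the usual form for
    testing H0: p_a == p_b.
    """
    p_a, p_b = err_a / n_a, err_b / n_b
    p_pool = (err_a + err_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical counts matching the reported 36.2% human vs 44.8% LLM
# error rates; the paper's actual item-response totals may differ.
z = two_prop_z(362, 1000, 448, 1000)
print(f"z = {z:.2f}")
```

A negative z here means the first group (humans) made proportionally fewer errors; the magnitude depends on the true sample sizes, which is why the sketch above does not reproduce z = -2.59 exactly.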
Related papers
- Explainable AI as a Double-Edged Sword in Dermatology: The Impact on Clinicians versus The Public [46.86429592892395]
Explainable AI (XAI) addresses this by providing insight into AI decision-making. We present results from two large-scale experiments combining a fairness-based diagnostic AI model with different XAI explanations.
arXiv Detail & Related papers (2025-12-14T00:06:06Z) - A Definition of AGI [208.25193480759026]
The lack of a concrete definition for Artificial General Intelligence obscures the gap between today's specialized AI and human-level cognition. This paper introduces a quantifiable framework to address this, defining AGI as matching the cognitive versatility and proficiency of a well-educated adult.
arXiv Detail & Related papers (2025-10-21T01:28:35Z) - Design and Validation of a Responsible Artificial Intelligence-based System for the Referral of Diabetic Retinopathy Patients [65.57160385098935]
Early detection of Diabetic Retinopathy can reduce the risk of vision loss by up to 95%. We developed RAIS-DR, a Responsible AI System for DR screening that incorporates ethical principles across the AI lifecycle. We evaluated RAIS-DR against the FDA-approved EyeArt system on a local dataset of 1,046 patients, unseen by both systems.
arXiv Detail & Related papers (2025-08-17T21:54:11Z) - Divergent Realities: A Comparative Analysis of Human Expert vs. Artificial Intelligence Based Generation and Evaluation of Treatment Plans in Dermatology [0.0]
Evaluating AI-generated treatment plans is a key challenge as AI expands beyond diagnostics. This study compares plans from human experts and two AI models (a generalist and a reasoner), assessed by both human peers and a superior AI judge.
arXiv Detail & Related papers (2025-07-08T06:59:58Z) - Beyond Black-Box AI: Interpretable Hybrid Systems for Dementia Care [2.4339626079536925]
The recent boom of large language models (LLMs) has re-ignited the hope that artificial intelligence (AI) systems could aid medical diagnosis. Despite dazzling benchmark scores, LLM assistants have yet to deliver measurable improvements at the bedside. This scoping review aims to highlight the areas where AI is limited in making practical contributions in the clinical setting.
arXiv Detail & Related papers (2025-07-02T01:43:06Z) - Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing [55.2480439325792]
This study systematically evaluates twelve state-of-the-art AI-text detectors using our AI-Polished-Text Evaluation dataset. Our findings reveal that detectors frequently flag even minimally polished text as AI-generated, struggle to differentiate between degrees of AI involvement, and exhibit biases against older and smaller models.
arXiv Detail & Related papers (2025-02-21T18:45:37Z) - Trustworthy and Practical AI for Healthcare: A Guided Deferral System with Large Language Models [1.2281181385434294]
Large language models (LLMs) offer a valuable technology for various applications in healthcare. However, their tendency to hallucinate and the existing reliance on proprietary systems pose challenges in settings that involve critical decision-making. This paper presents a novel HAIC guided deferral system that can simultaneously parse medical reports for disorder classification and defer uncertain predictions, with intelligent guidance, to humans.
arXiv Detail & Related papers (2024-06-11T12:41:54Z) - Unraveling the Dilemma of AI Errors: Exploring the Effectiveness of Human and Machine Explanations for Large Language Models [8.863857300695667]
We analyzed 156 human-generated textual and saliency-based explanations in a question-answering task.
Our findings show that participants found human saliency maps to be more helpful in explaining AI answers than machine saliency maps.
This finding hints at the dilemma of AI errors in explanation, where helpful explanations can lead to lower task performance when they support wrong AI predictions.
arXiv Detail & Related papers (2024-04-11T13:16:51Z) - Exploration with Principles for Diverse AI Supervision [88.61687950039662]
Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.
While this generative AI approach has produced impressive results, it heavily leans on human supervision.
This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation.
We propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data.
arXiv Detail & Related papers (2023-10-13T07:03:39Z) - Improving Human-AI Collaboration With Descriptions of AI Behavior [14.904401331154062]
People work with AI systems to improve their decision making, but often under- or over-rely on AI predictions and perform worse than they would have unassisted.
To help people appropriately rely on AI aids, we propose showing them behavior descriptions.
arXiv Detail & Related papers (2023-01-06T00:33:08Z) - Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap [56.611702960809644]
We benchmarked AI's ability to imitate humans in three language tasks and three vision tasks. We then conducted 72,191 Turing-like tests with 1,916 human judges and 10 AI judges. Imitation ability showed minimal correlation with conventional AI performance metrics.
arXiv Detail & Related papers (2022-11-23T16:16:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.