LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions
- URL: http://arxiv.org/abs/2406.08824v1
- Date: Thu, 13 Jun 2024 05:31:49 GMT
- Title: LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions
- Authors: Rumaisa Azeem, Andrew Hundt, Masoumeh Mansouri, Martim Brandão
- Abstract summary: Research has raised concerns about the potential for Large Language Models to produce discriminatory outcomes and unsafe behaviors in real-world robot experiments and applications.
We conduct an HRI-based evaluation of discrimination and safety criteria on several highly-rated LLMs.
Our results underscore the urgent need for systematic, routine, and comprehensive risk assessments and assurances to improve outcomes.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Members of the Human-Robot Interaction (HRI) and Artificial Intelligence (AI) communities have proposed Large Language Models (LLMs) as a promising resource for robotics tasks such as natural language interactions, doing household and workplace tasks, approximating 'common sense reasoning', and modeling humans. However, recent research has raised concerns about the potential for LLMs to produce discriminatory outcomes and unsafe behaviors in real-world robot experiments and applications. To address these concerns, we conduct an HRI-based evaluation of discrimination and safety criteria on several highly-rated LLMs. Our evaluation reveals that LLMs currently lack robustness when encountering people across a diverse range of protected identity characteristics (e.g., race, gender, disability status, nationality, religion, and their intersections), producing biased outputs consistent with directly discriminatory outcomes -- e.g. 'gypsy' and 'mute' people are labeled untrustworthy, but not 'european' or 'able-bodied' people. Furthermore, we test models in settings with unconstrained natural language (open vocabulary) inputs, and find they fail to act safely, generating responses that accept dangerous, violent, or unlawful instructions -- such as incident-causing misstatements, taking people's mobility aids, and sexual predation. Our results underscore the urgent need for systematic, routine, and comprehensive risk assessments and assurances to improve outcomes and ensure LLMs only operate on robots when it is safe, effective, and just to do so. Data and code will be made available.
Related papers
- Generative LLM Powered Conversational AI Application for Personalized Risk Assessment: A Case Study in COVID-19
Large language models (LLMs) have shown remarkable capabilities in various natural language tasks.
This work demonstrates a new LLM-powered disease risk assessment approach via streaming human-AI conversation.
arXiv Detail & Related papers (2024-09-23T13:55:13Z)
- InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback
This paper introduces InferAct, a novel approach to proactively detect potential errors before risky actions are executed.
InferAct acts as a human proxy, detecting unsafe actions and alerting users for intervention.
Experiments on three widely-used tasks demonstrate the effectiveness of InferAct.
arXiv Detail & Related papers (2024-07-16T15:24:44Z)
- How Are LLMs Mitigating Stereotyping Harms? Learning from Search Engine Studies
Commercial model development has focused efforts on 'safety' training concerning legal liabilities at the expense of social impact evaluation.
This mirrors a trend observed for search engine autocompletion some years prior.
We present a novel evaluation task in the style of autocompletion prompts to assess stereotyping in LLMs.
arXiv Detail & Related papers (2024-07-16T14:04:35Z)
- BadRobot: Manipulating Embodied LLMs in the Physical World
Embodied AI represents systems where AI is integrated into physical entities, enabling them to perceive and interact with their surroundings.
Large Language Model (LLM), which exhibits powerful language understanding abilities, has been extensively employed in embodied AI.
We introduce BadRobot, a novel attack paradigm aiming to make embodied LLMs violate safety and ethical constraints through typical voice-based user-system interactions.
arXiv Detail & Related papers (2024-07-16T13:13:16Z)
- Current state of LLM Risks and AI Guardrails
Large language models (LLMs) have become increasingly sophisticated, leading to widespread deployment in sensitive applications where safety and reliability are paramount.
These risks necessitate the development of "guardrails" to align LLMs with desired behaviors and mitigate potential harm.
This work explores the risks associated with deploying LLMs and evaluates current approaches to implementing guardrails and model alignment techniques.
arXiv Detail & Related papers (2024-06-16T22:04:10Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in this belief.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming
ALERT is a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy.
It aims to identify vulnerabilities, inform improvements, and enhance the overall safety of the language models.
arXiv Detail & Related papers (2024-04-06T15:01:47Z)
- The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative
Multimodal Large Language Models (MLLMs) are constantly defining the new boundary of Artificial General Intelligence (AGI).
Our paper explores a novel vulnerability in MLLM societies - the indirect propagation of malicious content.
arXiv Detail & Related papers (2024-02-20T23:08:21Z)
- Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics
We highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications.
arXiv Detail & Related papers (2024-02-15T22:01:45Z)
- Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty
We investigate how LMs incorporate confidence in responses via natural language and how downstream users behave in response to LM-articulated uncertainties.
We find that LMs are reluctant to express uncertainties when answering questions even when they produce incorrect responses.
We test the risks of LM overconfidence by conducting human experiments and show that users rely heavily on LM generations.
Lastly, we investigate the preference-annotated datasets used in post-training alignment and find that humans are biased against texts with uncertainty.
arXiv Detail & Related papers (2024-01-12T18:03:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.