Trust & Safety of LLMs and LLMs in Trust & Safety
- URL: http://arxiv.org/abs/2412.02113v1
- Date: Tue, 03 Dec 2024 03:10:12 GMT
- Title: Trust & Safety of LLMs and LLMs in Trust & Safety
- Authors: Doohee You, Dan Chon,
- Abstract summary: This systematic review investigates the current research landscape on trust and safety in Large Language Models.
We delve into the complexities of utilizing LLMs in domains where maintaining trust and safety is paramount.
This review provides insights on best practices for using LLMs in Trust and Safety, and explores emerging risks such as prompt injection and jailbreak attacks.
- Score: 0.0
- License:
- Abstract: In recent years, Large Language Models (LLMs) have garnered considerable attention for their remarkable abilities in natural language processing tasks. However, their widespread adoption has raised concerns pertaining to trust and safety. This systematic review investigates the current research landscape on trust and safety in LLMs, with a particular focus on the novel application of LLMs within the field of Trust and Safety itself. We delve into the complexities of utilizing LLMs in domains where maintaining trust and safety is paramount, offering a consolidated perspective on this emerging trend.\ By synthesizing findings from various studies, we identify key challenges and potential solutions, aiming to benefit researchers and practitioners seeking to understand the nuanced interplay between LLMs and Trust and Safety. This review provides insights on best practices for using LLMs in Trust and Safety, and explores emerging risks such as prompt injection and jailbreak attacks. Ultimately, this study contributes to a deeper understanding of how LLMs can be effectively and responsibly utilized to enhance trust and safety in the digital realm.
Related papers
- Are Smarter LLMs Safer? Exploring Safety-Reasoning Trade-offs in Prompting and Fine-Tuning [40.55486479495965]
Large Language Models (LLMs) have demonstrated remarkable success across various NLP benchmarks.
In this work, we investigate the interplay between reasoning and safety in LLMs.
We highlight the latent safety risks that arise as reasoning capabilities improve, shedding light on previously overlooked vulnerabilities.
arXiv Detail & Related papers (2025-02-13T06:37:28Z) - Large Language Model Safety: A Holistic Survey [35.42419096859496]
The rapid development and deployment of large language models (LLMs) have introduced a new frontier in artificial intelligence.
This survey provides a comprehensive overview of the current landscape of LLM safety, covering four major categories: value misalignment, robustness to adversarial attacks, misuse, and autonomous AI risks.
arXiv Detail & Related papers (2024-12-23T16:11:27Z) - LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs [80.45174785447136]
Laboratory accidents pose significant risks to human life and property.
Despite advancements in safety training, laboratory personnel may still unknowingly engage in unsafe practices.
There is a growing concern about large language models (LLMs) for guidance in various fields.
arXiv Detail & Related papers (2024-10-18T05:21:05Z) - Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety.
For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context.
We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z) - The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies [43.65655064122938]
Large Language Models (LLMs) agents have evolved to perform complex tasks.
The widespread applications of LLM agents demonstrate their significant commercial value.
However, they also expose security and privacy vulnerabilities.
This survey aims to provide a comprehensive overview of the newly emerged privacy and security issues faced by LLM agents.
arXiv Detail & Related papers (2024-07-28T00:26:24Z) - Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices [4.927763944523323]
Large language models (LLMs) have significantly transformed the landscape of Natural Language Processing (NLP)
This research paper thoroughly investigates security and privacy concerns related to LLMs from five thematic perspectives.
The paper recommends promising avenues for future research to enhance the security and risk management of LLMs.
arXiv Detail & Related papers (2024-03-19T07:10:58Z) - Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.
While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety.
This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z) - TrustLLM: Trustworthiness in Large Language Models [446.5640421311468]
This paper introduces TrustLLM, a comprehensive study of trustworthiness in large language models (LLMs)
We first propose a set of principles for trustworthy LLMs that span eight different dimensions.
Based on these principles, we establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics.
arXiv Detail & Related papers (2024-01-10T22:07:21Z) - The Art of Defending: A Systematic Evaluation and Analysis of LLM
Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z) - A Survey of Safety and Trustworthiness of Large Language Models through
the Lens of Verification and Validation [21.242078120036176]
Large Language Models (LLMs) have exploded a new heatwave of AI for their ability to engage end-users in human-level conversations.
This survey concerns their safety and trustworthiness in industrial applications.
arXiv Detail & Related papers (2023-05-19T02:41:12Z) - Safety Assessment of Chinese Large Language Models [51.83369778259149]
Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes.
To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts including 100k augmented prompts and responses by LLMs.
arXiv Detail & Related papers (2023-04-20T16:27:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.