Related papers: Trust & Safety of LLMs and LLMs in Trust & Safety

Trust & Safety of LLMs and LLMs in Trust & Safety

URL: http://arxiv.org/abs/2412.02113v1
Date: Tue, 03 Dec 2024 03:10:12 GMT
Title: Trust & Safety of LLMs and LLMs in Trust & Safety
Authors: Doohee You, Dan Chon,
Abstract summary: This systematic review investigates the current research landscape on trust and safety in Large Language Models.<n>We delve into the complexities of utilizing LLMs in domains where maintaining trust and safety is paramount.<n>This review provides insights on best practices for using LLMs in Trust and Safety, and explores emerging risks such as prompt injection and jailbreak attacks.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, Large Language Models (LLMs) have garnered considerable attention for their remarkable abilities in natural language processing tasks. However, their widespread adoption has raised concerns pertaining to trust and safety. This systematic review investigates the current research landscape on trust and safety in LLMs, with a particular focus on the novel application of LLMs within the field of Trust and Safety itself. We delve into the complexities of utilizing LLMs in domains where maintaining trust and safety is paramount, offering a consolidated perspective on this emerging trend.\ By synthesizing findings from various studies, we identify key challenges and potential solutions, aiming to benefit researchers and practitioners seeking to understand the nuanced interplay between LLMs and Trust and Safety. This review provides insights on best practices for using LLMs in Trust and Safety, and explores emerging risks such as prompt injection and jailbreak attacks. Ultimately, this study contributes to a deeper understanding of how LLMs can be effectively and responsibly utilized to enhance trust and safety in the digital realm.

Related papers

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment [291.03029298928857]
This paper introduces the concept of "full-stack" safety to systematically consider safety issues throughout the entire process of LLM training, deployment, and commercialization. Our research is grounded in an exhaustive review of over 800+ papers, ensuring comprehensive coverage and systematic organization of security issues. Our work identifies promising research directions, including safety in data generation, alignment techniques, model editing, and LLM-based agent systems.
arXiv Detail & Related papers (2025-04-22T05:02:49Z)
A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations [127.52707312573791]
This survey provides a comprehensive analysis of LVLM safety, covering key aspects such as attacks, defenses, and evaluation methods. We introduce a unified framework that integrates these interrelated components, offering a holistic perspective on the vulnerabilities of LVLMs. We conduct a set of safety evaluations on the latest LVLM, Deepseek Janus-Pro, and provide a theoretical analysis of the results.
arXiv Detail & Related papers (2025-02-14T08:42:43Z)
Are Smarter LLMs Safer? Exploring Safety-Reasoning Trade-offs in Prompting and Fine-Tuning [40.55486479495965]
Large Language Models (LLMs) have demonstrated remarkable success across various NLP benchmarks. In this work, we investigate the interplay between reasoning and safety in LLMs. We highlight the latent safety risks that arise as reasoning capabilities improve, shedding light on previously overlooked vulnerabilities.
arXiv Detail & Related papers (2025-02-13T06:37:28Z)
Large Language Model Safety: A Holistic Survey [35.42419096859496]
The rapid development and deployment of large language models (LLMs) have introduced a new frontier in artificial intelligence. This survey provides a comprehensive overview of the current landscape of LLM safety, covering four major categories: value misalignment, robustness to adversarial attacks, misuse, and autonomous AI risks.
arXiv Detail & Related papers (2024-12-23T16:11:27Z)
Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety. For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context. We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z)
The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies [43.65655064122938]
Large Language Models (LLMs) agents have evolved to perform complex tasks. The widespread applications of LLM agents demonstrate their significant commercial value. However, they also expose security and privacy vulnerabilities. This survey aims to provide a comprehensive overview of the newly emerged privacy and security issues faced by LLM agents.
arXiv Detail & Related papers (2024-07-28T00:26:24Z)
Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices [4.927763944523323]
Large language models (LLMs) have significantly transformed the landscape of Natural Language Processing (NLP) This research paper thoroughly investigates security and privacy concerns related to LLMs from five thematic perspectives. The paper recommends promising avenues for future research to enhance the security and risk management of LLMs.
arXiv Detail & Related papers (2024-03-19T07:10:58Z)
TrustLLM: Trustworthiness in Large Language Models [446.5640421311468]
This paper introduces TrustLLM, a comprehensive study of trustworthiness in large language models (LLMs) We first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics.
arXiv Detail & Related papers (2024-01-10T22:07:21Z)
The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications. This paper presents Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z)
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation [21.242078120036176]
Large Language Models (LLMs) have exploded a new heatwave of AI for their ability to engage end-users in human-level conversations. This survey concerns their safety and trustworthiness in industrial applications.
arXiv Detail & Related papers (2023-05-19T02:41:12Z)
Safety Assessment of Chinese Large Language Models [51.83369778259149]
Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes. To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts including 100k augmented prompts and responses by LLMs.
arXiv Detail & Related papers (2023-04-20T16:27:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.