A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy
- URL: http://arxiv.org/abs/2501.09431v1
- Date: Thu, 16 Jan 2025 09:59:45 GMT
- Title: A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy
- Authors: Huandong Wang, Wenjie Fu, Yingzhou Tang, Zhilong Chen, Yuxi Huang, Jinghua Piao, Chen Gao, Fengli Xu, Tao Jiang, Yong Li,
- Abstract summary: Large language models (LLMs) present significant potential for supporting numerous real-world applications.
They still face significant challenges in terms of the inherent risk of privacy leakage, hallucinated outputs, and value misalignment.
- Score: 31.839815402460918
- License:
- Abstract: While large language models (LLMs) present significant potential for supporting numerous real-world applications and delivering positive social impacts, they still face significant challenges in terms of the inherent risk of privacy leakage, hallucinated outputs, and value misalignment, and can be maliciously used for generating toxic content and unethical purposes after been jailbroken. Therefore, in this survey, we present a comprehensive review of recent advancements aimed at mitigating these issues, organized across the four phases of LLM development and usage: data collecting and pre-training, fine-tuning and alignment, prompting and reasoning, and post-processing and auditing. We elaborate on the recent advances for enhancing the performance of LLMs in terms of privacy protection, hallucination reduction, value alignment, toxicity elimination, and jailbreak defenses. In contrast to previous surveys that focus on a single dimension of responsible LLMs, this survey presents a unified framework that encompasses these diverse dimensions, providing a comprehensive view of enhancing LLMs to better serve real-world applications.
Related papers
- Look Before You Leap: Enhancing Attention and Vigilance Regarding Harmful Content with GuidelineLLM [53.79753074854936]
Large language models (LLMs) are increasingly vulnerable to emerging jailbreak attacks.
This vulnerability poses significant risks to the real-world applications.
We propose a novel defensive paradigm called GuidelineLLM.
arXiv Detail & Related papers (2024-12-10T12:42:33Z) - A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions [9.045698110081686]
Large language models (LLMs) generate plausible, factually-incorrect responses, which are expressed with striking confidence.
Previous work has shown that hallucinations and other non-factual responses generated by LLMs can be detected by examining the uncertainty of the LLM in its response to the pertinent prompt.
This survey seeks to provide an extensive review of existing uncertainty quantification methods for LLMs, identifying their salient features, along with their strengths and weaknesses.
arXiv Detail & Related papers (2024-12-07T06:56:01Z) - Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM Systems have already achieved human-level or even super-human persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z) - Current state of LLM Risks and AI Guardrails [0.0]
Large language models (LLMs) have become increasingly sophisticated, leading to widespread deployment in sensitive applications where safety and reliability are paramount.
These risks necessitate the development of "guardrails" to align LLMs with desired behaviors and mitigate potential harm.
This work explores the risks associated with deploying LLMs and evaluates current approaches to implementing guardrails and model alignment techniques.
arXiv Detail & Related papers (2024-06-16T22:04:10Z) - A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law [65.87885628115946]
Large language models (LLMs) are revolutionizing the landscapes of finance, healthcare, and law.
We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies.
We critically examine the ethics for LLM applications in these fields, pointing out the existing ethical concerns and the need for transparent, fair, and robust AI systems.
arXiv Detail & Related papers (2024-05-02T22:43:02Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in this belief.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach [64.42462708687921]
Evaluations have revealed that factors such as scaling, training types, architectures and other factors profoundly impact the performance of LLMs.
Our study embarks on a thorough re-examination of these LLMs, targeting the inadequacies in current evaluation methods.
This includes the application of ANOVA, Tukey HSD tests, GAMM, and clustering technique.
arXiv Detail & Related papers (2024-03-22T14:47:35Z) - Security and Privacy Challenges of Large Language Models: A Survey [2.6986500640871482]
Large Language Models (LLMs) have demonstrated extraordinary capabilities and contributed to multiple fields, such as generating and summarizing text, language translation, and question-answering.
These models are also vulnerable to security and privacy attacks, such as jailbreaking attacks, data poisoning attacks, and Personally Identifiable Information (PII) leakage attacks.
This survey provides a thorough review of the security and privacy challenges of LLMs for both training data and users, along with the application-based risks in various domains, such as transportation, education, and healthcare.
arXiv Detail & Related papers (2024-01-30T04:00:54Z) - Walking a Tightrope -- Evaluating Large Language Models in High-Risk
Domains [15.320563604087246]
High-risk domains pose unique challenges that require language models to provide accurate and safe responses.
Despite the great success of large language models (LLMs), their performance in high-risk domains remains unclear.
arXiv Detail & Related papers (2023-11-25T08:58:07Z) - Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment [35.42539816648068]
This paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing large language models (LLMs)
The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness.
Results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness.
arXiv Detail & Related papers (2023-08-10T06:43:44Z) - On the Risk of Misinformation Pollution with Large Language Models [127.1107824751703]
We investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation.
Our study reveals that LLMs can act as effective misinformation generators, leading to a significant degradation in the performance of Open-Domain Question Answering (ODQA) systems.
arXiv Detail & Related papers (2023-05-23T04:10:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.