Trustworthy Large Models in Vision: A Survey
- URL: http://arxiv.org/abs/2311.09680v5
- Date: Thu, 1 Feb 2024 05:15:52 GMT
- Title: Trustworthy Large Models in Vision: A Survey
- Authors: Ziyan Guo and Li Xu and Jun Liu
- Abstract summary: Large Models (LMs) have revolutionized various fields of deep learning, ranging from Natural Language Processing (NLP) to Computer Vision (CV).
Despite their powerful performance, LMs are increasingly challenged and criticized by academia and industry for their untrustworthy behavior.
In this survey, we summarize four concerns that obstruct the trustworthy use of LMs in vision: 1) human misuse, 2) vulnerability, 3) inherent issues, and 4) interpretability.
We hope this survey will facilitate readers' understanding of this field, promote the alignment of LMs with human expectations, and enable trustworthy LMs to benefit rather than harm human society.
- Score: 8.566163225282724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid progress of Large Models (LMs) has recently revolutionized various fields of deep learning with remarkable results, ranging from Natural Language Processing (NLP) to Computer Vision (CV). However, LMs are increasingly challenged and criticized by academia and industry because their powerful performance is accompanied by untrustworthy behavior, which urgently needs to be addressed by reliable methods. Despite the abundance of literature on trustworthy LMs in NLP, a systematic survey specifically delving into the trustworthiness of LMs in CV remains absent. To mitigate this gap, this survey summarizes four concerns that obstruct the trustworthy use of LMs in vision: 1) human misuse, 2) vulnerability, 3) inherent issues, and 4) interpretability. By highlighting the corresponding challenges, countermeasures, and discussion for each topic, we hope this survey will facilitate readers' understanding of this field, promote the alignment of LMs with human expectations, and enable trustworthy LMs to benefit rather than harm human society.
Related papers
- Belief in the Machine: Investigating Epistemological Blind Spots of Language Models [51.63547465454027]
Language models (LMs) are essential for reliable decision-making in fields like healthcare, law, and journalism.
This study systematically evaluates the capabilities of modern LMs, including GPT-4, Claude-3, and Llama-3, using a new dataset, KaBLE.
Our results reveal key limitations. First, while LMs achieve 86% accuracy on factual scenarios, their performance drops significantly with false scenarios.
Second, LMs struggle with recognizing and affirming personal beliefs, especially when those beliefs contradict factual data.
arXiv Detail & Related papers (2024-10-28T16:38:20Z)
- A Survey on the Honesty of Large Language Models [115.8458596738659]
Honesty is a fundamental principle for aligning large language models (LLMs) with human values.
Despite promising progress, current LLMs still exhibit significant dishonest behaviors.
arXiv Detail & Related papers (2024-09-27T14:34:54Z)
- Quantitative Insights into Language Model Usage and Trust in Academia: An Empirical Study [29.750000639372203]
There is a notable gap in quantitative evidence regarding the extent of LM usage, user trust in their outputs, and issues to prioritize for real-world development.
This study surveyed 125 individuals at a private school and secured 88 data points after pre-processing.
Through both quantitative analysis and qualitative evidence, we found a significant variation in trust levels, which are strongly related to usage time and frequency.
arXiv Detail & Related papers (2024-09-13T20:45:50Z)
- BeHonest: Benchmarking Honesty in Large Language Models [23.192389530727713]
We introduce BeHonest, a pioneering benchmark specifically designed to assess honesty in Large Language Models.
BeHonest evaluates three essential aspects of honesty: awareness of knowledge boundaries, avoidance of deceit, and consistency in responses.
Our findings indicate that there is still significant room for improvement in the honesty of LLMs.
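To make the "consistency in responses" aspect concrete, here is a minimal sketch of one simple way such a check can be run: query a model with several paraphrases of the same question and measure how often the answers agree. The `query_model` callable and the paraphrase set are hypothetical placeholders for illustration, not BeHonest's actual protocol.
```python
from collections import Counter

def consistency_score(query_model, paraphrases):
    """Fraction of paraphrased prompts whose answers match the majority answer.

    `query_model` is a hypothetical callable (prompt -> answer string);
    swap in whatever LLM client you actually use.
    """
    answers = [query_model(p).strip().lower() for p in paraphrases]
    majority_count = Counter(answers).most_common(1)[0][1]
    return majority_count / len(answers)

# Example usage with a stand-in model that always answers the same way.
if __name__ == "__main__":
    paraphrases = [
        "What is the capital of France?",
        "Name the capital city of France.",
        "France's capital is which city?",
    ]
    dummy_model = lambda prompt: "Paris"
    print(consistency_score(dummy_model, paraphrases))  # 1.0 for a fully consistent model
```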
arXiv Detail & Related papers (2024-06-19T06:46:59Z)
- Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study [51.19622266249408]
MultiTrust is the first comprehensive and unified benchmark on the trustworthiness of MLLMs.
Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts.
Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks.
arXiv Detail & Related papers (2024-06-11T08:38:13Z)
- Overconfidence is Key: Verbalized Uncertainty Evaluation in Large Language and Vision-Language Models [6.9060054915724]
Large Language and Vision-Language Models (LLMs/VLMs) have revolutionized the field of AI with their ability to generate human-like text and understand images, but ensuring their reliability is crucial.
This paper aims to evaluate the ability of LLMs (GPT4, GPT-3.5, LLaMA2, and PaLM 2) and VLMs (GPT4V and Gemini Pro Vision) to estimate their verbalized uncertainty via prompting.
We propose the new Japanese Uncertain Scenes dataset aimed at testing VLM capabilities via difficult queries and object counting, and the Net Error dataset to measure direction of miscalibration.
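As a rough illustration of eliciting verbalized uncertainty via prompting, the sketch below appends a confidence request to a query and parses the stated percentage from the reply. The prompt wording, the `ask` callable, and the parsing rule are assumptions for illustration and are not taken from the paper or its datasets.
```python
import re

CONFIDENCE_SUFFIX = (
    "\nAfter your answer, state your confidence that it is correct "
    "as a percentage between 0 and 100, on a line starting with 'Confidence:'."
)

def verbalized_confidence(ask, question):
    """Return (answer_text, confidence in [0, 1]) parsed from a model's reply.

    `ask` is a hypothetical callable (prompt -> reply string) standing in
    for whatever LLM/VLM API is being evaluated.
    """
    reply = ask(question + CONFIDENCE_SUFFIX)
    match = re.search(r"Confidence:\s*(\d{1,3})", reply)
    confidence = min(int(match.group(1)), 100) / 100 if match else None
    answer = reply.split("Confidence:")[0].strip()
    return answer, confidence

# Example with a stand-in model reply.
if __name__ == "__main__":
    fake_ask = lambda prompt: "There are 7 lanterns in the image.\nConfidence: 95"
    print(verbalized_confidence(fake_ask, "How many lanterns are in the image?"))
```
Comparing the stated confidence against whether the answer is actually correct exposes over- or underconfidence, and keeping the sign of that gap (rather than its absolute value) indicates the direction of miscalibration, which is the kind of quantity a net-error-style measure targets.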
arXiv Detail & Related papers (2024-05-05T12:51:38Z)
- Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty [53.336235704123915]
We investigate how LMs incorporate confidence in responses via natural language and how downstream users behave in response to LM-articulated uncertainties.
We find that LMs are reluctant to express uncertainties when answering questions even when they produce incorrect responses.
We test the risks of LM overconfidence by conducting human experiments and show that users rely heavily on LM generations.
Lastly, we investigate the preference-annotated datasets used in post-training alignment and find that humans are biased against texts with uncertainty.
arXiv Detail & Related papers (2024-01-12T18:03:30Z)
- A Survey of Confidence Estimation and Calibration in Large Language Models [86.692994151323]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks in various domains.
Despite their impressive performance, they can be unreliable due to factual errors in their generations.
Assessing their confidence and calibrating them across different tasks can help mitigate risks and enable LLMs to produce better generations.
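One standard metric behind the calibration discussion above is the Expected Calibration Error (ECE), which bins predictions by stated confidence and averages the gap between confidence and accuracy in each bin, weighted by bin size. The sketch below is a generic ECE implementation under that standard definition, not code taken from the survey.
```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-weighted average gap between mean confidence and accuracy.

    confidences: model confidences in [0, 1]
    correct:     0/1 flags indicating whether each answer was right
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in this bin
    return ece

# Example: an overconfident model (high stated confidence, mediocre accuracy).
print(expected_calibration_error([0.9, 0.95, 0.8, 0.9], [1, 0, 1, 0]))
```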
arXiv Detail & Related papers (2023-11-14T16:43:29Z)
- Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)