Quantitative Insights into Large Language Model Usage and Trust in Academia: An Empirical Study
- URL: http://arxiv.org/abs/2409.09186v2
- Date: Thu, 06 Feb 2025 23:46:35 GMT
- Title: Quantitative Insights into Large Language Model Usage and Trust in Academia: An Empirical Study
- Authors: Minseok Jung, Aurora Zhang, May Fung, Junho Lee, Paul Pu Liang
- Abstract summary: Large Language Models (LLMs) are transforming writing, reading, teaching, and knowledge retrieval in many academic fields. There is a pressing need to accurately quantify their usage, user trust in outputs, and concerns about key issues to prioritize in deployment. This study surveyed 125 individuals at a private R1 research university regarding their usage of LLMs, their trust in LLM outputs, and key issues to prioritize for robust usage in academia.
- Score: 27.7299835314702
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) are transforming writing, reading, teaching, and knowledge retrieval in many academic fields. However, concerns regarding their misuse and erroneous outputs have led to varying degrees of trust in LLMs within academic communities. In response, various academic organizations have proposed and adopted policies regulating their usage. However, these policies are not based on substantial quantitative evidence because there is no data about use patterns and user opinion. Consequently, there is a pressing need to accurately quantify their usage, user trust in outputs, and concerns about key issues to prioritize in deployment. This study addresses these gaps through a quantitative user study of LLM usage and trust in academic research and education. Specifically, our study surveyed 125 individuals at a private R1 research university regarding their usage of LLMs, their trust in LLM outputs, and key issues to prioritize for robust usage in academia. Our findings reveal: (1) widespread adoption of LLMs, with 75% of respondents actively using them; (2) a significant positive correlation between trust and adoption, as well as between engagement and trust; and (3) that fact-checking is the most critical concern. These findings suggest a need for policies that address pervasive usage, prioritize fact-checking mechanisms, and accurately calibrate user trust levels as they engage with these models. These strategies can help balance innovation with accountability and help integrate LLMs into the academic environment effectively and reliably.
Related papers
- Understanding Knowledge Drift in LLMs through Misinformation [11.605377799885238]
Large Language Models (LLMs) have revolutionized numerous applications, making them an integral part of our digital ecosystem.
We analyze the susceptibility of state-of-the-art LLMs to factual inaccuracies when they encounter false information in a Q&A scenario.
Our experiments reveal that an LLM's uncertainty can increase by up to 56.6% when the question is answered incorrectly.
arXiv Detail & Related papers (2024-09-11T08:11:16Z) - To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity [27.10502683001428]
This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities.
Experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts.
arXiv Detail & Related papers (2024-07-24T09:48:48Z) - Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established.
This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt.
We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z) - How Reliable are LLMs as Knowledge Bases? Re-thinking Facutality and Consistency [60.25969380388974]
Large Language Models (LLMs) are increasingly explored as knowledge bases (KBs).
Current evaluation methods focus too narrowly on knowledge retention, overlooking other crucial criteria for reliable performance.
We propose new criteria and metrics to quantify factuality and consistency, leading to a final reliability score.
arXiv Detail & Related papers (2024-07-18T15:20:18Z) - Building Understandable Messaging for Policy and Evidence Review (BUMPER) with AI [0.3495246564946556]
We introduce a framework for the use of large language models (LLMs) in Building Understandable Messaging for Policy and Evidence Review (BUMPER)
LLMs are capable of providing interfaces for understanding and synthesizing large databases of diverse media.
We argue that this framework can facilitate accessibility of and confidence in scientific evidence for policymakers.
arXiv Detail & Related papers (2024-06-27T05:03:03Z) - I don't trust you (anymore)! -- The effect of students' LLM use on Lecturer-Student-Trust in Higher Education [0.0]
The emergence of Large Language Models (LLMs) in platforms like OpenAI's ChatGPT has led to their rapid adoption among university students.
This study addresses the research question: How does the use of LLMs by students impact Informational and Procedural Justice, influencing Team Trust and Expected Team Performance?
Our findings indicate that lecturers are less concerned about the fairness of LLM use per se but are more focused on the transparency of student utilization.
arXiv Detail & Related papers (2024-06-21T05:35:57Z) - CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models [60.59638232596912]
We introduce CLAMBER, a benchmark for evaluating large language models (LLMs) on ambiguous information needs.
Building upon the taxonomy, we construct 12K high-quality data points to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs.
Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries.
arXiv Detail & Related papers (2024-05-20T14:34:01Z) - Multimodal Large Language Models to Support Real-World Fact-Checking [80.41047725487645]
Multimodal large language models (MLLMs) carry the potential to support humans in processing vast amounts of information.
While MLLMs are already being used as a fact-checking tool, their abilities and limitations in this regard are understudied.
We propose a framework for systematically assessing the capacity of current multimodal models to facilitate real-world fact-checking.
arXiv Detail & Related papers (2024-03-06T11:32:41Z) - TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness [58.721012475577716]
Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications.
This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge.
arXiv Detail & Related papers (2024-02-19T21:12:14Z) - Discovery of the Hidden World with Large Language Models [95.58823685009727]
This paper presents Causal representatiOn AssistanT (COAT) that introduces large language models (LLMs) to bridge the gap.
LLMs are trained on massive observations of the world and have demonstrated great capability in extracting key information from unstructured data.
COAT also adopts CDs to find causal relations among the identified variables as well as to provide feedback to LLMs to iteratively refine the proposed factors.
arXiv Detail & Related papers (2024-02-06T12:18:54Z) - Best Practices for Text Annotation with Large Language Models [11.421942894219901]
Large Language Models (LLMs) have ushered in a new era of text annotation.
This paper proposes a comprehensive set of standards and best practices for their reliable, reproducible, and ethical use.
arXiv Detail & Related papers (2024-02-05T15:43:50Z) - The Calibration Gap between Model and Human Confidence in Large Language Models [14.539888672603743]
Large language models (LLMs) need to be well-calibrated in the sense that they can accurately assess and communicate how likely it is that their predictions are correct.
Recent work has focused on the quality of internal LLM confidence assessments.
This paper explores the disparity between external human confidence in an LLM's responses and the internal confidence of the model.
arXiv Detail & Related papers (2024-01-24T22:21:04Z) - Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty [53.336235704123915]
We investigate how LMs incorporate confidence in responses via natural language and how downstream users behave in response to LM-articulated uncertainties.
We find that LMs are reluctant to express uncertainties when answering questions even when they produce incorrect responses.
We test the risks of LM overconfidence by conducting human experiments and show that users rely heavily on LM generations.
Lastly, we investigate the preference-annotated datasets used in post-training alignment and find that humans are biased against texts expressing uncertainty.
arXiv Detail & Related papers (2024-01-12T18:03:30Z) - TrustLLM: Trustworthiness in Large Language Models [446.5640421311468]
This paper introduces TrustLLM, a comprehensive study of trustworthiness in large language models (LLMs).
We first propose a set of principles for trustworthy LLMs that span eight different dimensions.
Based on these principles, we establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics.
arXiv Detail & Related papers (2024-01-10T22:07:21Z) - Empirical evaluation of Uncertainty Quantification in Retrieval-Augmented Language Models for Science [0.0]
This study investigates how uncertainty scores vary when scientific knowledge is incorporated as pretraining and retrieval data.
We observe that an existing RALM finetuned with scientific knowledge as the retrieval data tends to be more confident in generating predictions.
We also found that RALMs are overconfident in their predictions, making inaccurate predictions more confidently than accurate ones.
arXiv Detail & Related papers (2023-11-15T20:42:11Z) - The Perils & Promises of Fact-checking with Large Language Models [55.869584426820715]
Large Language Models (LLMs) are increasingly trusted to write academic papers, lawsuits, and news articles.
We evaluate the use of LLM agents in fact-checking by having them phrase queries, retrieve contextual data, and make decisions.
Our results show the enhanced prowess of LLMs when equipped with contextual information.
While LLMs show promise in fact-checking, caution is essential due to inconsistent accuracy.
arXiv Detail & Related papers (2023-10-20T14:49:47Z) - Assessing the Reliability of Large Language Model Knowledge [78.38870272050106]
Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks.
How do we evaluate the capabilities of LLMs to consistently produce factually correct answers?
We propose MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability.
arXiv Detail & Related papers (2023-10-15T12:40:30Z) - Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity [61.54815512469125]
This survey addresses the crucial issue of factuality in Large Language Models (LLMs).
As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital.
arXiv Detail & Related papers (2023-10-11T14:18:03Z) - FELM: Benchmarking Factuality Evaluation of Large Language Models [40.78878196872095]
We introduce a benchmark for Factuality Evaluation of large Language Models, referred to as felm.
We collect responses generated from large language models and annotate factuality labels in a fine-grained manner.
Our findings reveal that while retrieval aids factuality evaluation, current LLMs are far from satisfactory to faithfully detect factual errors.
arXiv Detail & Related papers (2023-10-01T17:37:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated list (including all information) and is not responsible for any consequences of its use.