Related papers: LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users

Related papers

LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High [7.9042053398943075]
Presuppositions subtly introduce information as given, making them highly effective at embedding disputable or false information.<n>This raises concerns about whether LLMs, like humans, may fail to detect and correct misleading assumptions introduced as false presuppositions.
arXiv Detail & Related papers (2025-05-28T13:35:07Z)
MLLMs are Deeply Affected by Modality Bias [158.64371871084478]
Recent advances in Multimodal Large Language Models (MLLMs) have shown promising results in integrating diverse modalities such as texts and images.<n>MLLMs are heavily influenced by modality bias, often relying on language while under-utilizing other modalities like visual inputs.<n>This paper argues that MLLMs are deeply affected by modality bias, highlighting its manifestations across various tasks.
arXiv Detail & Related papers (2025-05-24T11:49:31Z)
Is Small Language Model the Silver Bullet to Low-Resource Languages Machine Translation? [18.975160361440597]
Low-resource languages (LRLs) lack sufficient linguistic resources and are underrepresented in benchmark datasets.<n>This study systematically evaluates state-of-the-art smaller Large Language Models in 200 languages.
arXiv Detail & Related papers (2025-03-31T13:56:03Z)
Fact-checking with Generative AI: A Systematic Cross-Topic Examination of LLMs Capacity to Detect Veracity of Political Information [0.0]
The purpose of this study is to assess how large language models (LLMs) can be used for fact-checking. We use AI auditing methodology that systematically evaluates performance of five LLMs. The results indicate that models are better at identifying false statements, especially on sensitive topics.
arXiv Detail & Related papers (2025-03-11T13:06:40Z)
Evaluating open-source Large Language Models for automated fact-checking [0.13499500088995461]
Large Language Models (LLMs) have emerged as potential tools for automated fact-checking. This study focuses on their ability to assess claims with different levels of contextual information.
arXiv Detail & Related papers (2025-03-07T16:45:33Z)
ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models [75.05436691700572]
We introduce ExpliCa, a new dataset for evaluating Large Language Models (LLMs) in explicit causal reasoning. We tested seven commercial and open-source LLMs on ExpliCa through prompting and perplexity-based metrics. Surprisingly, models tend to confound temporal relations with causal ones, and their performance is also strongly influenced by the linguistic order of the events.
arXiv Detail & Related papers (2025-02-21T14:23:14Z)
LIBRA: Measuring Bias of Large Language Model from a Local Context [9.612845616659776]
Large Language Models (LLMs) have significantly advanced natural language processing applications. Yet their widespread use raises concerns regarding inherent biases that may reduce utility or harm for particular social groups. This research addresses these limitations with a Local Integrated Bias Recognition and Assessment Framework (LIBRA) for measuring bias.
arXiv Detail & Related papers (2025-02-02T04:24:57Z)
Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment [56.87031484108484]
Large Language Models (LLMs) are increasingly recognized for their practical applications. Retrieval-Augmented Generation (RAG) tackles this challenge and has shown a significant impact on LLMs. By minimizing retrieval requests that yield neutral or harmful results, we can effectively reduce both time and computational costs.
arXiv Detail & Related papers (2024-11-09T15:12:28Z)
Testing and Evaluation of Large Language Models: Correctness, Non-Toxicity, and Fairness [30.632260870411177]
Large language models (LLMs) have rapidly penetrated into people's work and daily lives over the past few years. This thesis focuses on the correctness, non-toxicity, and fairness of LLMs from both software testing and natural language processing perspectives.
arXiv Detail & Related papers (2024-08-31T22:21:04Z)
Examining the Influence of Political Bias on Large Language Model Performance in Stance Classification [5.8229466650067065]
We investigate whether large language models (LLMs) exhibit a tendency to more accurately classify politically-charged stances. Our findings reveal a statistically significant difference in the performance of LLMs across various politically oriented stance classification tasks. LLMs have poorer stance classification accuracy when there is greater ambiguity in the target the statement is directed towards.
arXiv Detail & Related papers (2024-07-25T01:11:38Z)
Modulating Language Model Experiences through Frictions [56.17593192325438]
Over-consumption of language model outputs risks propagating unchecked errors in the short-term and damaging human capabilities for critical thinking in the long-term. We propose selective frictions for language model experiences, inspired by behavioral science interventions, to dampen misuse.
arXiv Detail & Related papers (2024-06-24T16:31:11Z)
The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition [74.04775677110179]
In-context Learning (ICL) has emerged as a powerful paradigm for performing natural language tasks with Large Language Models (LLM) We show that LLMs have strong yet inconsistent priors in emotion recognition that ossify their predictions. Our results suggest that caution is needed when using ICL with larger LLMs for affect-centered tasks outside their pre-training domain.
arXiv Detail & Related papers (2024-03-25T19:07:32Z)
Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
Are Large Language Models Good Fact Checkers: A Preliminary Study [26.023148371263012]
Large Language Models (LLMs) have drawn significant attention due to their outstanding reasoning capabilities and extensive knowledge repository. This study aims to comprehensively evaluate various LLMs in tackling specific fact-checking subtasks.
arXiv Detail & Related papers (2023-11-29T05:04:52Z)
Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs [8.526956860672698]
Large Language Models (LLMs) have gained immense attention due to their notable emergent capabilities. This study investigates the potential of LLMs as reliable assessors of factual consistency in summaries generated by text-generation models.
arXiv Detail & Related papers (2023-11-01T17:42:45Z)
Language Models Hallucinate, but May Excel at Fact Verification [89.0833981569957]
Large language models (LLMs) frequently "hallucinate," resulting in non-factual outputs. Even GPT-3.5 produces factual outputs less than 25% of the time. This underscores the importance of fact verifiers in order to measure and incentivize progress.
arXiv Detail & Related papers (2023-10-23T04:39:01Z)
Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools. Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions. Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task [10.597024796304016]
Large-scale language models (LLMs) has shown remarkable capability in various of Natural Language Processing (NLP) tasks. This report explores the how large language models perform on Chinese grammatical error correction tasks.
arXiv Detail & Related papers (2023-07-08T13:10:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.