Related papers: Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach

Towards Comprehensive Stage-wise Benchmarking of Large Language Models in Fact-Checking [64.97768177044355]
Large Language Models (LLMs) are increasingly deployed in real-world fact-checking systems. We present FactArena, a fully automated arena-style evaluation framework. Our analyses reveal significant discrepancies between static claim-verification accuracy and end-to-end fact-checking competence.
arXiv Detail & Related papers (2026-01-06T02:51:56Z)
Lost in Modality: Evaluating the Effectiveness of Text-Based Membership Inference Attacks on Large Multimodal Models [3.9448289587779404]
Logit-based membership inference attacks (MIAs) have become a widely adopted approach for assessing data exposure in large language models (LLMs). We present the first comprehensive evaluation of extending these text-based MIA methods to multimodal settings.
arXiv Detail & Related papers (2025-12-02T14:11:51Z)
Measuring Aleatoric and Epistemic Uncertainty in LLMs: Empirical Evaluation on ID and OOD QA Tasks [11.834264748246008]
Large Language Models (LLMs) have become increasingly pervasive, finding applications across many industries and disciplines. In this work, a comprehensive empirical study is conducted to examine the robustness and effectiveness of diverse Uncertainty Estimation measures.
arXiv Detail & Related papers (2025-11-05T04:26:44Z)
Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective [13.739343897204568]
Large Language Models (LLMs) often generate responses with inherent biases, undermining their reliability in real-world applications. Existing evaluation methods often overlook biases in long-form responses and the intrinsic variability of LLM outputs. We propose FiSco, a novel statistical framework to evaluate group-level fairness in LLMs by detecting subtle semantic differences in long-form responses across demographic groups.
arXiv Detail & Related papers (2025-06-23T18:31:22Z)
Do Large Language Models (Really) Need Statistical Foundations? [1.7741566627076264]
Large language models (LLMs) represent a new paradigm for processing unstructured data. This paper addresses whether the development and application of LLMs would genuinely benefit from statistics contributions.
arXiv Detail & Related papers (2025-05-25T13:44:47Z)
Distilling Transitional Pattern to Large Language Models for Multimodal Session-based Recommendation [67.84581846180458]
Session-based recommendation (SBR) predicts the next item based on anonymous sessions. Recent Multimodal SBR methods utilize simplistic pre-trained models for modality learning but have limitations in semantic richness. We propose a multimodal LLM-enhanced framework TPAD, which extends a distillation paradigm to decouple and align transitional patterns for promoting MSBR.
arXiv Detail & Related papers (2025-04-13T07:49:08Z)
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks. We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z)
Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal [21.342265570934995]
Existing methods have largely overlooked the importance of refusal responses as a means of enhancing the reliability of MLLMs. We present the Information Boundary-aware Learning Framework (InBoL), a novel approach that empowers MLLMs to refuse to answer user queries when encountering insufficient information. This framework introduces a comprehensive data generation pipeline and tailored training strategies to improve the model's ability to deliver appropriate refusal responses.
arXiv Detail & Related papers (2024-12-15T14:17:14Z)
FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data [64.50893177169996]
Fine-tuning Multimodal Large Language Models (MLLMs) with Federated Learning (FL) allows for expanding the training data scope by including private data sources. We introduce a benchmark for evaluating various downstream tasks in the federated fine-tuning of MLLMs within multimodal heterogeneous scenarios. We develop a general FedMLLM framework that integrates four representative FL methods alongside two modality-agnostic strategies.
arXiv Detail & Related papers (2024-11-22T04:09:23Z)
Understanding Chain-of-Thought in LLMs through Information Theory [16.78730663293352]
We formalize Chain-of-Thought (CoT) reasoning in Large Language Models (LLMs) through an information-theoretic lens. Specifically, our framework quantifies the 'information gain' at each reasoning step, enabling the identification of failure modes. We demonstrate the efficacy of our approach through extensive experiments on toy and GSM-8K data, where it significantly outperforms existing outcome-based methods.
arXiv Detail & Related papers (2024-11-18T19:14:36Z)
A Semiparametric Approach to Causal Inference [2.092897805817524]
In causal inference, an important problem is to quantify the effects of interventions or treatments. In this paper, we employ a semiparametric density ratio model (DRM) to characterize the counterfactual distributions. Our model offers flexibility by avoiding strict parametric assumptions on the counterfactual distributions.
arXiv Detail & Related papers (2024-11-01T18:03:38Z)
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks [77.59035801244278]
This paper investigates the role of the Large Language Model (LLM) backbone in Multimodal Large Language Models (MLLMs) evaluation. Our study encompasses four diverse MLLM benchmarks and eight state-of-the-art MLLMs. Key findings reveal that some benchmarks allow high performance even without visual inputs and up to 50% of error rates can be attributed to insufficient world knowledge in the LLM backbone.
arXiv Detail & Related papers (2024-10-16T07:49:13Z)
Detecting Training Data of Large Language Models via Expectation Maximization [62.28028046993391]
Membership inference attacks (MIAs) aim to determine whether a specific instance was part of a target model's training data. Applying MIAs to large language models (LLMs) presents unique challenges due to the massive scale of pre-training data and the ambiguous nature of membership. We introduce EM-MIA, a novel MIA method for LLMs that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm.
arXiv Detail & Related papers (2024-10-10T03:31:16Z)
EVINCE: Optimizing Multi-LLM Dialogues Using Conditional Statistics and Information Theory [2.5200794639628032]
EVINCE is a novel framework for optimizing multi-LLM dialogues. It addresses limitations in multi-agent debate (MAS) frameworks. EVINCE emerges as a structured and highly effective framework for multi-LLM collaboration.
arXiv Detail & Related papers (2024-08-26T18:48:51Z)
Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models [12.841405829775852]
We introduce the modality importance score (MIS) to identify bias in VidQA benchmarks and datasets. We also propose an innovative method using state-of-the-art MLLMs to estimate the modality importance. Our results indicate that current models do not effectively integrate information due to modality imbalance in existing datasets.
arXiv Detail & Related papers (2024-08-22T23:32:42Z)
TRACE: TRansformer-based Attribution using Contrastive Embeddings in LLMs [50.259001311894295]
We propose a novel TRansformer-based Attribution framework using Contrastive Embeddings called TRACE. We show that TRACE significantly improves the ability to attribute sources accurately, making it a valuable tool for enhancing the reliability and trustworthiness of large language models.
arXiv Detail & Related papers (2024-07-06T07:19:30Z)
Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode. We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models [58.58594658683919]
Large multimodal models (LMMs) have shown transformative potential across various research tasks. Our findings indicate LMMs possess advantages in zero-shot learning, interpretability, and handling uncurated 'in-the-wild' inputs. We propose a Chain-of-Thought augmented prompting approach, which effectively mitigates the off-target prediction issue.
arXiv Detail & Related papers (2024-05-24T16:26:56Z)
Beyond the Black Box: A Statistical Model for LLM Reasoning and Inference [0.9898607871253774]
This paper introduces a novel Bayesian learning model to explain the behavior of Large Language Models (LLMs). We develop a theoretical framework based on an ideal generative text model represented by a multinomial transition probability matrix with a prior, and examine how LLMs approximate this matrix.
arXiv Detail & Related papers (2024-02-05T16:42:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.