Understanding Large Language Model Behaviors through Interactive Counterfactual Generation and Analysis
- URL: http://arxiv.org/abs/2405.00708v2
- Date: Thu, 07 Aug 2025 14:48:25 GMT
- Title: Understanding Large Language Model Behaviors through Interactive Counterfactual Generation and Analysis
- Authors: Furui Cheng, Vilém Zouhar, Robin Shing Moon Chan, Daniel Fürst, Hendrik Strobelt, Mennatallah El-Assady
- Abstract summary: We present an interactive visualization system that enables exploration of large language models (LLMs) through counterfactual analysis. Our system features a novel algorithm that generates fluent and semantically meaningful counterfactuals. A user study with LLM practitioners and interviews with experts demonstrate the system's usability and effectiveness.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Understanding the behavior of large language models (LLMs) is crucial for ensuring their safe and reliable use. However, existing explainable AI (XAI) methods for LLMs primarily rely on word-level explanations, which are often computationally inefficient and misaligned with human reasoning processes. Moreover, these methods often treat explanation as a one-time output, overlooking its inherently interactive and iterative nature. In this paper, we present LLM Analyzer, an interactive visualization system that addresses these limitations by enabling intuitive and efficient exploration of LLM behaviors through counterfactual analysis. Our system features a novel algorithm that generates fluent and semantically meaningful counterfactuals via targeted removal and replacement operations at user-defined levels of granularity. These counterfactuals are used to compute feature attribution scores, which are then integrated with concrete examples in a table-based visualization, supporting dynamic analysis of model behavior. A user study with LLM practitioners and interviews with experts demonstrate the system's usability and effectiveness, emphasizing the importance of involving humans in the explanation process as active participants rather than passive recipients.
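As a rough illustration of the counterfactual-attribution idea described in the abstract, the sketch below perturbs user-defined segments by removal and replacement and scores each segment by the resulting drop in the model's output probability. This is a minimal sketch, not the paper's implementation: the `model_prob` scorer and the segment/replacement inputs are assumptions for illustration.

```python
# Minimal sketch of counterfactual-based feature attribution.
# model_prob is a caller-supplied black-box scorer (hypothetical) that
# returns the model's probability of a target output for an input text;
# segments are the user-defined units of granularity (words, phrases, ...).

def generate_counterfactuals(segments, replacements):
    """Yield (index, counterfactual_text) pairs produced by removing or
    replacing one segment at a time."""
    for i in range(len(segments)):
        # Removal: drop segment i entirely.
        yield i, " ".join(segments[:i] + segments[i + 1:])
        # Replacement: swap segment i for each candidate alternative.
        for alt in replacements.get(i, []):
            yield i, " ".join(segments[:i] + [alt] + segments[i + 1:])

def attribution_scores(segments, replacements, model_prob):
    """Score each segment by the mean drop in output probability across
    the counterfactuals that perturb it."""
    base = model_prob(" ".join(segments))
    totals = [0.0] * len(segments)
    counts = [0] * len(segments)
    for i, cf in generate_counterfactuals(segments, replacements):
        totals[i] += base - model_prob(cf)
        counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]
```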
Related papers
- Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning [12.054910727620154]
Eye-tracking data reveals valuable insights into users' cognitive states but is difficult to analyze due to its structured, non-linguistic nature. This paper presents a multimodal human-AI collaborative framework designed to enhance cognitive pattern extraction from eye-tracking signals.
arXiv Detail & Related papers (2025-07-24T09:49:53Z) - IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis [60.32962597618861]
IDA-Bench is a novel benchmark evaluating large language models in multi-round interactive scenarios. Agent performance is judged by comparing its final numerical output to the human-derived baseline. Even state-of-the-art coding agents (like Claude-3.7-thinking) succeed on only 50% of the tasks, highlighting limitations not evident in single-turn tests.
arXiv Detail & Related papers (2025-05-23T09:37:52Z) - How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks. We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z) - CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation [53.452699232071495]
CrossWordBench is a benchmark designed to evaluate the reasoning capabilities of Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) through the medium of crossword puzzles.
Our evaluation reveals that reasoning LLMs outperform non-reasoning models substantially by effectively leveraging crossing-letter constraints.
Our findings offer insights into the limitations of the reasoning capabilities of current LLMs and LVLMs, and provide an effective approach for creating multimodal constrained tasks for future evaluations.
arXiv Detail & Related papers (2025-03-30T20:03:36Z) - MoRE-LLM: Mixture of Rule Experts Guided by a Large Language Model [54.14155564592936]
We propose a Mixture of Rule Experts guided by a Large Language Model (MoRE-LLM). MoRE-LLM steers the discovery of local rule-based surrogates during training and their utilization for the classification task. The LLM is responsible for enhancing the domain knowledge alignment of the rules by correcting and contextualizing them.
arXiv Detail & Related papers (2025-03-26T11:09:21Z) - Applications of Large Language Model Reasoning in Feature Generation [0.0]
Large Language Models (LLMs) have revolutionized natural language processing through their state-of-the-art reasoning capabilities.
This paper explores the convergence of LLM reasoning techniques and feature generation for machine learning tasks.
The paper categorizes LLM-based feature generation methods across various domains including finance, healthcare, and text analytics.
arXiv Detail & Related papers (2025-03-15T04:18:01Z) - Training Large Recommendation Models via Graph-Language Token Alignment [53.3142545812349]
We propose a novel framework to train Large Recommendation models via Graph-Language Token Alignment (GLTA).
By aligning item and user nodes from the interaction graph with pretrained LLM tokens, GLTA effectively leverages the reasoning abilities of LLMs.
Furthermore, we introduce Graph-Language Logits Matching (GLLM) to optimize token alignment for end-to-end item prediction.
arXiv Detail & Related papers (2025-02-26T02:19:10Z) - Potential and Perils of Large Language Models as Judges of Unstructured Textual Data [0.631976908971572]
This research investigates the effectiveness of LLM-as-judge models in evaluating the thematic alignment of summaries generated by other LLMs.
Our findings reveal that while LLM-as-judge models offer a scalable solution comparable to human raters, humans may still excel at detecting subtle, context-specific nuances.
arXiv Detail & Related papers (2025-01-14T14:49:14Z) - Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmentation over Knowledge Graphs (Amar) framework.
This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings.
Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z) - LLM-assisted Explicit and Implicit Multi-interest Learning Framework for Sequential Recommendation [50.98046887582194]
We propose an explicit and implicit multi-interest learning framework (EIMF) to model user interests on two levels: behavior and semantics.
The proposed EIMF framework effectively and efficiently combines small models with LLM to improve the accuracy of multi-interest modeling.
arXiv Detail & Related papers (2024-11-14T13:00:23Z) - Likelihood as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that likelihoods serve as an effective gauge for language model performance.
We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance.
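A minimal sketch of the likelihood-gauge idea: rank candidate prompts by how likely the model finds the question under each one. The helper `question_log_likelihood` is assumed for illustration, not an API from the paper.

```python
# Hedged sketch of likelihood-guided prompt selection.
# question_log_likelihood(prompt, question) is a hypothetical helper that
# returns the LM's log-probability of the question tokens given the prompt.

def select_prompt(candidate_prompts, question, question_log_likelihood):
    """Return the candidate prompt under which the model assigns the
    question the highest likelihood."""
    return max(candidate_prompts,
               key=lambda p: question_log_likelihood(p, question))
```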
arXiv Detail & Related papers (2024-11-12T13:14:09Z) - Using LLMs for Explaining Sets of Counterfactual Examples to Final Users [0.0]
In automated decision-making scenarios, causal inference methods can analyze the underlying data-generation process.
Counterfactual examples explore hypothetical scenarios where a minimal number of factors are altered.
We propose a novel multi-step pipeline that uses counterfactuals to generate natural language explanations of actions that will lead to a change in outcome.
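One way such a pipeline could look, as a hedged sketch rather than the paper's actual pipeline: the prompt wording and the `llm_complete` completion helper below are assumptions.

```python
# Illustrative sketch of converting a set of counterfactuals into a
# natural-language explanation. llm_complete is a hypothetical
# text-completion helper; counterfactuals are dicts with the feature
# changed and its old/new values.

def explain_counterfactuals(instance, counterfactuals, llm_complete):
    changes = "\n".join(
        f"- change {c['feature']} from {c['old']} to {c['new']}"
        for c in counterfactuals
    )
    prompt = (
        f"A model produced an unfavorable outcome for this case: {instance}\n"
        f"The outcome flips under any of these minimal changes:\n{changes}\n"
        "Explain in plain language which actions would change the outcome."
    )
    return llm_complete(prompt)
```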
arXiv Detail & Related papers (2024-08-27T15:13:06Z) - Paired Completion: Flexible Quantification of Issue-framing at Scale with LLMs [0.41436032949434404]
We develop and rigorously evaluate new detection methods for issue framing and narrative analysis within large text datasets.
We show that issue framing can be reliably and efficiently detected in large corpora with only a few examples of either perspective on a given issue.
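The core intuition can be sketched as a likelihood comparison between paired completions; `log_likelihood` here is an assumed helper, not the paper's code.

```python
# Sketch of the paired-completion intuition: score a text against two
# contrasting framings of an issue by comparing conditional log-likelihoods.
# log_likelihood(context, text) is an assumed helper returning the LM's
# log-probability of `text` given `context`.

def framing_score(text, framing_a, framing_b, log_likelihood):
    """Positive values lean toward framing A, negative toward framing B."""
    return log_likelihood(framing_a, text) - log_likelihood(framing_b, text)
```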
arXiv Detail & Related papers (2024-08-19T07:14:15Z) - Leveraging Language Models for Emotion and Behavior Analysis in Education [4.219163079329444]
This paper proposes a novel method leveraging large language models (LLMs) and prompt engineering to analyze textual data from students.
Our approach utilizes tailored prompts to guide LLMs in detecting emotional and engagement states, providing a non-intrusive and scalable solution.
arXiv Detail & Related papers (2024-08-13T13:11:53Z) - From Feature Importance to Natural Language Explanations Using LLMs with RAG [4.204990010424084]
We introduce traceable question-answering, leveraging an external knowledge repository to inform the responses of Large Language Models (LLMs).
This knowledge repository comprises contextual details regarding the model's output, containing high-level features, feature importance, and alternative probabilities.
We integrate four key characteristics - social, causal, selective, and contrastive - drawn from social science research on human explanations into a single-shot prompt, guiding the response generation process.
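An illustrative template for such a single-shot prompt (the wording is an assumption for illustration, not the paper's exact prompt):

```python
# Illustrative single-shot prompt template folding the four characteristics
# into one request. Field names are placeholders, not the paper's schema.
PROMPT_TEMPLATE = """Context from the knowledge repository:
- Top features and importance scores: {feature_importances}
- Alternative outcomes and probabilities: {alternatives}

Explain the model's prediction to the user so that the answer is:
- social: plain, conversational language addressed to the user,
- causal: linking the influential features to the outcome,
- selective: covering only the few features that matter most,
- contrastive: noting why the model did not predict the main alternative.
"""
```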
arXiv Detail & Related papers (2024-07-30T17:27:20Z) - CIBench: Evaluating Your LLMs with a Code Interpreter Plugin [68.95137938214862]
We propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks.
The evaluation dataset is constructed using an LLM-human cooperative approach and simulates an authentic workflow by leveraging consecutive and interactive IPython sessions.
We conduct extensive experiments to analyze the ability of 24 LLMs on CIBench and provide valuable insights for future LLMs in code interpreter utilization.
arXiv Detail & Related papers (2024-07-15T07:43:55Z) - Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach [0.0]
Large Language Models (LLMs) sometimes produce inaccurate outputs, known as hallucinations.
This paper introduces a supervised learning approach employing only four numerical features derived from tokens and vocabulary probabilities obtained from other evaluators.
The method yields promising results, surpassing state-of-the-art outcomes in multiple tasks across three different benchmarks.
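A sketch of the kind of probability-derived features involved; the paper's exact four features may differ from the ones shown here.

```python
import math

def probability_features(token_probs):
    """Illustrative numerical features from per-token probabilities of a
    generation (assumed feature set, not the paper's exact four)."""
    n = len(token_probs)
    return {
        "min_prob": min(token_probs),
        "mean_prob": sum(token_probs) / n,
        "mean_logprob": sum(math.log(p) for p in token_probs) / n,
        "frac_low_prob": sum(p < 0.1 for p in token_probs) / n,
    }

# A standard supervised classifier (e.g., logistic regression) is then
# trained on such features against hallucination labels.
```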
arXiv Detail & Related papers (2024-05-30T03:00:47Z) - Rethinking ChatGPT's Success: Usability and Cognitive Behaviors Enabled by Auto-regressive LLMs' Prompting [5.344199202349884]
We analyze the structure of modalities within two types of Large Language Models and six task-specific channels during deployment.
We examine the stimulation of diverse cognitive behaviors in LLMs through the adoption of free-form text and verbal contexts.
arXiv Detail & Related papers (2024-05-17T00:19:41Z) - Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs).
We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
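For illustration, a generic maximum-likelihood LID estimator (Levina-Bickel) over activation vectors; the paper's exact estimation procedure may differ.

```python
import numpy as np

def lid_mle(activations, query, k=20):
    """Levina-Bickel maximum-likelihood local-intrinsic-dimension estimate
    of `query` among a set of activation vectors (shape: [n, d]).
    A generic estimator shown for illustration only."""
    dists = np.sort(np.linalg.norm(activations - query, axis=1))[:k]
    dists = dists[dists > 0]  # drop a zero distance if query is in the set
    # LID = -1 / mean(log(T_j / T_k)) over the k-1 nearest neighbors.
    return -1.0 / np.mean(np.log(dists[:-1] / dists[-1]))
```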
arXiv Detail & Related papers (2024-02-28T04:56:21Z) - C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
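A sketch of how correct and incorrect demonstrations might be combined into one prompt, in the spirit of c-ICL (the format is assumed, not taken from the paper).

```python
# Sketch of assembling a contrastive in-context prompt from both correct
# and incorrect demonstrations for an extraction task.

def build_contrastive_prompt(correct, incorrect, query):
    """`correct` and `incorrect` are lists of (input_text, extraction)
    pairs; `query` is the new input to label."""
    parts = [f"Input: {x}\nCorrect extraction: {y}" for x, y in correct]
    parts += [f"Input: {x}\nIncorrect extraction (avoid): {y}"
              for x, y in incorrect]
    parts.append(f"Input: {query}\nCorrect extraction:")
    return "\n\n".join(parts)
```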
arXiv Detail & Related papers (2024-02-17T11:28:08Z) - LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-Explanations [26.340786701393768]
Interpretability tools that offer explanations in the form of a dialogue have demonstrated their efficacy in enhancing users' understanding.
Current solutions for dialogue-based explanations, however, often require external tools and modules and are not easily transferable to tasks they were not designed for.
We present an easily accessible tool that allows users to chat with any state-of-the-art large language model (LLM) about its behavior.
arXiv Detail & Related papers (2024-01-23T09:11:07Z) - CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z) - RELIC: Investigating Large Language Model Responses using Self-Consistency [58.63436505595177]
Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations.
We propose an interactive system that helps users gain insight into the reliability of the generated text.
arXiv Detail & Related papers (2023-11-28T14:55:52Z) - A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis [128.0532113800092]
We present a mechanistic interpretation of Transformer-based LMs on arithmetic questions.
This provides insights into how information related to arithmetic is processed by LMs.
arXiv Detail & Related papers (2023-05-24T11:43:47Z) - Competence-Based Analysis of Language Models [21.43498764977656]
CALM (Competence-based Analysis of Language Models) is designed to investigate LLM competence in the context of specific tasks. We develop a new approach for performing causal probing interventions using gradient-based adversarial attacks. We carry out a case study of CALM using these interventions to analyze and compare LLM competence across a variety of lexical inference tasks.
arXiv Detail & Related papers (2023-03-01T08:53:36Z) - Semantic Interactive Learning for Text Classification: A Constructive Approach for Contextual Interactions [0.0]
We propose a novel interaction framework called Semantic Interactive Learning for the text domain.
We frame the problem of incorporating constructive and contextual feedback into the learner as a task to find an architecture that enables more semantic alignment between humans and machines.
We introduce a technique called SemanticPush that is effective for translating conceptual corrections of humans to non-extrapolating training examples.
arXiv Detail & Related papers (2022-09-07T08:13:45Z)