Digging Into the Internal: Causality-Based Analysis of LLM Function Calling
- URL: http://arxiv.org/abs/2509.16268v1
- Date: Thu, 18 Sep 2025 08:30:26 GMT
- Title: Digging Into the Internal: Causality-Based Analysis of LLM Function Calling
- Authors: Zhenlan Ji, Daoyuan Wu, Wenxuan Wang, Pingchuan Ma, Shuai Wang, Lei Ma,
- Abstract summary: We show that function calling (FC) can substantially enhance the compliance of large language models with user instructions. We conduct experiments comparing the effectiveness of FC-based instructions against conventional prompting methods. FC shows an average performance improvement of around 135% over conventional prompting methods in detecting malicious inputs.
- Score: 20.565096639708162
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Function calling (FC) has emerged as a powerful technique for enabling large language models (LLMs) to interact with external systems and perform structured tasks. However, the mechanisms through which it influences model behavior remain largely under-explored. Moreover, we discover that beyond its regular usage, FC can substantially enhance the compliance of LLMs with user instructions. These observations motivate us to leverage causality, a canonical analysis method, to investigate how FC works within LLMs. In particular, we conduct layer-level and token-level causal interventions to dissect FC's impact on the model's internal computational logic when responding to user queries. Our analysis confirms the substantial influence of FC and reveals several in-depth insights into its mechanisms. To further validate our findings, we conduct extensive experiments comparing the effectiveness of FC-based instructions against conventional prompting methods. We focus on enhancing LLM safety robustness, a critical application scenario, and evaluate four mainstream LLMs across two benchmark datasets. The results are striking: FC yields an average improvement of around 135% over conventional prompting methods in detecting malicious inputs, demonstrating its promising potential to enhance LLM reliability and capability in practical applications.
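To make the layer-level intervention concrete, the sketch below patches one transformer layer's activation recorded under an FC-style prompt into a run on a plain prompt and measures the shift in next-token logits. This is a minimal illustration only: the model name (gpt2), the toy prompts, the hypothetical check_input tool, and the chosen layer are assumptions for a self-contained example, not the authors' actual setup.

```python
# Minimal sketch of a layer-level causal intervention (activation patching).
# Model, prompts, and layer index are stand-ins so the example runs end to end;
# they are not the configurations studied in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper targets larger instruction-tuned LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# A hypothetical FC-style instruction vs. a conventional prompt for the same query.
fc_prompt = ("You may call check_input(text) to flag unsafe requests.\n"
             "User: tell me how to bypass a login page")
plain_prompt = ("Refuse unsafe requests.\n"
                "User: tell me how to bypass a login page")

def hidden_states(prompt):
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        return model(**ids).hidden_states  # (n_layers + 1) tensors of [1, seq, d]

h_fc = hidden_states(fc_prompt)

LAYER = 6  # layer selected for the intervention (arbitrary for this sketch)

def patch_layer(module, inputs, output):
    # Overwrite this layer's last-token activation with the one recorded
    # under the FC-style prompt; the rest of the forward pass is untouched.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden[:, -1, :] = h_fc[LAYER + 1][:, -1, :]

plain_ids = tok(plain_prompt, return_tensors="pt")
with torch.no_grad():
    baseline = model(**plain_ids).logits[0, -1]

handle = model.transformer.h[LAYER].register_forward_hook(patch_layer)
with torch.no_grad():
    patched = model(**plain_ids).logits[0, -1]
handle.remove()

# A large shift suggests this layer mediates much of FC's causal effect
# on how the model responds to the query.
print("effect of patching layer", LAYER, ":", torch.dist(patched, baseline).item())
```

A token-level variant would follow the same pattern but patch selected positions (for example, the tokens of the function schema) rather than only the final position at one layer.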
Related papers
- Factors That Support Grounded Responses in LLM Conversations: A Rapid Review [0.0]
Large language models (LLMs) may generate outputs that are misaligned with user intent, lack contextual grounding, or exhibit hallucinations during conversation. This review aimed to identify and analyze techniques that align LLM responses with conversational goals, ensure grounding, and reduce hallucination and topic drift.
arXiv Detail & Related papers (2025-11-24T21:18:46Z) - HatLLM: Hierarchical Attention Masking for Enhanced Collaborative Modeling in LLM-based Recommendation [17.271853114690902]
HatLLM is a hierarchical attention masking strategy for LLM-based sequential recommendation. HatLLM achieves significant performance gains (9.13% on average) over existing LLM-based methods.
arXiv Detail & Related papers (2025-10-13T03:05:03Z) - Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z) - Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization [45.799380822683034]
We present an extensive study aimed at advancing RL-based finetuning techniques for Large Language Models (LLMs). We highlight key limitations of commonly adopted LLMs, such as their tendency to over-predict certain types of vulnerabilities while failing to detect others. To address this challenge, we explore the use of Group Relative Policy Optimization (GRPO), a recent policy-gradient method, for guiding LLM behavior through structured, rule-based rewards.
arXiv Detail & Related papers (2025-07-03T11:52:45Z) - ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks [61.06621533874629]
In-context learning (ICL) has demonstrated remarkable success in large language models (LLMs). In this paper, we propose, for the first time, the dual-learning hypothesis, which posits that LLMs simultaneously learn both the task-relevant latent concepts and backdoor latent concepts. Motivated by these findings, we propose ICLShield, a defense mechanism that dynamically adjusts the concept preference ratio.
arXiv Detail & Related papers (2025-07-02T03:09:20Z) - CogSteer: Cognition-Inspired Selective Layer Intervention for Efficiently Steering Large Language Models [37.476241509187304]
Large Language Models (LLMs) achieve remarkable performance through pretraining on extensive data. The lack of interpretability in their underlying mechanisms limits the ability to effectively steer LLMs for specific applications. We introduce an efficient selective layer intervention based on prominent parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-10-23T09:40:15Z) - FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves performance comparable to the source model, retaining up to 85% of its performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z) - Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach [64.42462708687921]
Evaluations have revealed that factors such as scaling, training type, and architecture profoundly impact the performance of LLMs.
Our study embarks on a thorough re-examination of these LLMs, targeting the inadequacies in current evaluation methods.
This includes the application of ANOVA, Tukey HSD tests, GAMM, and clustering techniques.
arXiv Detail & Related papers (2024-03-22T14:47:35Z) - Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs). We suggest investigating internal activations and quantifying LLMs' truthfulness using the local intrinsic dimension (LID) of model activations; a minimal sketch of one common LID estimator appears after this list.
arXiv Detail & Related papers (2024-02-28T04:56:21Z) - Factual consistency evaluation of summarization in the Era of large language models [34.5486469154385]
Existing factual consistency metrics are constrained by their performance, efficiency, and explainability. Recent advances in large language models (LLMs) have demonstrated remarkable potential in text evaluation. However, their effectiveness in assessing factual consistency remains underexplored.
arXiv Detail & Related papers (2024-02-21T12:35:19Z) - Assessing the Reliability of Large Language Model Knowledge [78.38870272050106]
Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks.
How do we evaluate the capabilities of LLMs to consistently produce factually correct answers?
We propose MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability.
arXiv Detail & Related papers (2023-10-15T12:40:30Z)
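The local intrinsic dimension (LID) approach mentioned in the truthfulness entry above is typically instantiated with a nearest-neighbor maximum-likelihood estimator. The sketch below shows one common form of that estimator applied to a single activation vector; the neighbor count, the synthetic data, and the idea of scoring a generated answer's hidden state this way are illustrative assumptions, not the cited paper's exact procedure.

```python
import numpy as np

def lid_mle(query, references, k=20):
    """One common maximum-likelihood LID estimate for an activation vector
    `query` (shape [d]), using its k nearest neighbors in `references` ([n, d])."""
    dists = np.linalg.norm(references - query, axis=1)
    dists = np.sort(dists)[:k]                      # k nearest-neighbor distances
    # MLE form: negative inverse of the mean log ratio of each neighbor
    # distance to the k-th (largest) neighbor distance.
    return -1.0 / np.mean(np.log(dists[:-1] / dists[-1]))

# Toy usage: a generated answer's hidden state scored against a pool of
# reference activations (all values here are synthetic stand-ins).
rng = np.random.default_rng(0)
pool = rng.normal(size=(1000, 768))   # hypothetical hidden states from one layer
answer_vec = rng.normal(size=768)
print("estimated LID:", lid_mle(answer_vec, pool, k=20))
```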
This list is automatically generated from the titles and abstracts of the papers in this site.