Learning to Extract Context for Context-Aware LLM Inference
- URL: http://arxiv.org/abs/2512.11986v1
- Date: Fri, 12 Dec 2025 19:10:08 GMT
- Title: Learning to Extract Context for Context-Aware LLM Inference
- Authors: Minseon Kim, Lucas Caccia, Zhengyan Shi, Matheus Pereira, Marc-Alexandre Côté, Xingdi Yuan, Alessandro Sordoni
- Abstract summary: User prompts to large language models (LLMs) are often ambiguous or under-specified. Contextual cues shaped by user intentions, prior knowledge, and risk factors influence what constitutes an appropriate response. We propose a framework that extracts and leverages such contextual information from the user prompt itself.
- Score: 60.376872353918394
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: User prompts to large language models (LLMs) are often ambiguous or under-specified, and subtle contextual cues shaped by user intentions, prior knowledge, and risk factors strongly influence what constitutes an appropriate response. Misinterpreting intent or risks may lead to unsafe outputs, while overly cautious interpretations can cause unnecessary refusal of benign requests. In this paper, we question the conventional framework in which LLMs generate immediate responses to requests without considering broader contextual factors. User requests are situated within broader contexts such as intentions, knowledge, and prior experience, which strongly influence what constitutes an appropriate answer. We propose a framework that extracts and leverages such contextual information from the user prompt itself. Specifically, a reinforcement learning based context generator, designed in an autoencoder-like fashion, is trained to infer contextual signals grounded in the prompt and use them to guide response generation. This approach is particularly important for safety tasks, where ambiguous requests may bypass safeguards while benign but confusing requests can trigger unnecessary refusals. Experiments show that our method reduces harmful responses by an average of 5.6% on the SafetyInstruct dataset across multiple foundation models and improves the harmonic mean of attack success rate and compliance on benign prompts by 6.2% on XSTest and WildJailbreak. These results demonstrate the effectiveness of context extraction for safer and more reliable LLM inferences.
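The abstract reports improving "the harmonic mean of attack success rate and compliance on benign prompts." As a rough illustration of how such a combined safety/helpfulness score can be computed (the exact metric definition and the numbers below are assumptions, not taken from the paper):

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two rates in [0, 1]; defined as 0 if either rate is 0."""
    if a == 0 or b == 0:
        return 0.0
    return 2 * a * b / (a + b)

# Hypothetical numbers: a "defense" rate (1 - attack success rate) and a
# compliance rate on benign prompts. The harmonic mean penalizes a model
# that does well on one axis while failing the other.
defense = 1.0 - 0.20   # assumed attack success rate of 20%
compliance = 0.90
score = harmonic_mean(defense, compliance)
```

The harmonic mean is a natural choice here because it is dominated by the weaker of the two rates: a model that refuses everything (high defense, near-zero compliance) scores close to zero.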
Related papers
- Context Dependence and Reliability in Autoregressive Language Models [4.9988239650406765]
In critical applications, it is vital to identify which context elements actually influence the output. This work addresses the challenge of distinguishing essential context elements from merely correlated ones. We introduce RISE, a method that quantifies the unique influence of each input relative to the others, minimizing the impact of redundancies.
arXiv Detail & Related papers (2026-02-01T18:25:44Z)
- Reasoning About Intent for Ambiguous Requests [47.979705857002415]
We propose generating multiple interpretation-answer pairs in a single structured response to ambiguous requests. Our models are trained with reinforcement learning and customized reward functions, using multiple valid answers as supervision.
arXiv Detail & Related papers (2025-11-13T16:18:45Z)
- Context Engineering for Trustworthiness: Rescorla Wagner Steering Under Mixed and Inappropriate Contexts [55.70338710797578]
We introduce the Poisoned Context Testbed, pairing queries with real-world contexts containing both relevant and inappropriate content. Inspired by associative learning in animals, we adapt the Rescorla-Wagner (RW) model from neuroscience to quantify how competing contextual signals influence LLM outputs. We introduce RW-Steering, a two-stage finetuning-based approach that enables the model to internally identify and ignore inappropriate signals.
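For readers unfamiliar with the Rescorla-Wagner model referenced above, it is a classic associative-learning rule in which an association strength V is updated by a prediction error: V ← V + αβ(λ − V). The sketch below shows the standard textbook rule, not the paper's specific adaptation to LLM contexts:

```python
def rw_update(v: float, lam: float, alpha: float = 0.1, beta: float = 1.0) -> float:
    """One Rescorla-Wagner step: association strength v moves toward the
    asymptote lam in proportion to the prediction error (lam - v),
    scaled by the learning-rate parameters alpha and beta."""
    return v + alpha * beta * (lam - v)

# Repeated pairings drive the association toward the asymptote lam = 1.0.
v = 0.0
for _ in range(50):
    v = rw_update(v, lam=1.0)
```

The key property the paper draws on is that competing cues share a fixed amount of associative strength, so a stronger signal suppresses learning about a weaker, co-occurring one.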
arXiv Detail & Related papers (2025-09-02T00:40:34Z)
- Highlight & Summarize: RAG without the jailbreaks [13.121045036871607]
Malicious users can craft prompts that cause a large language model to generate undesirable content or perform a task entirely different from its intended purpose. We present and evaluate Highlight & Summarize (H&S), a new design pattern for retrieval-augmented generation (RAG) systems that prevents these attacks by design.
arXiv Detail & Related papers (2025-08-04T20:01:00Z)
- FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning [12.467239356591238]
FalseReject is a comprehensive resource containing 16k seemingly toxic queries accompanied by structured responses across 44 safety-related categories. We propose a graph-informed adversarial multi-agent interaction framework to generate diverse and complex prompts. We show that supervised finetuning with FalseReject substantially reduces unnecessary refusals without compromising overall safety or general language capabilities.
arXiv Detail & Related papers (2025-05-12T20:45:25Z)
- Safety is Not Only About Refusal: Reasoning-Enhanced Fine-tuning for Interpretable LLM Safety [41.32331563680919]
Large Language Models (LLMs) are vulnerable to jailbreak attacks that exploit weaknesses in traditional safety alignment. We propose Reasoning-enhanced Finetuning for interpretable LLM Safety (Rational), which trains models to engage in explicit safe reasoning before responding.
arXiv Detail & Related papers (2025-03-06T22:47:45Z)
- HiddenGuard: Fine-Grained Safe Generation with Specialized Representation Router [42.222681564769076]
We introduce HiddenGuard, a novel framework for fine-grained, safe generation in Large Language Models.
HiddenGuard incorporates Prism, which operates alongside the LLM to enable real-time, token-level detection and redaction of harmful content.
Our experiments demonstrate that HiddenGuard achieves over 90% in F1 score for detecting and redacting harmful content.
arXiv Detail & Related papers (2024-10-03T17:10:41Z)
- Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well established. This paper investigates how the uncertainty of LLM-generated responses relates to the information provided in the input prompt. We propose a prompt-response concept model that explains how LLMs generate responses and helps characterize the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z)
- On Prompt-Driven Safeguarding for Large Language Models [172.13943777203377]
We find that in the representation space, the input queries are typically moved by safety prompts in a "higher-refusal" direction.
Inspired by these findings, we propose a method for safety prompt optimization, namely DRO.
Treating a safety prompt as continuous, trainable embeddings, DRO learns to move the queries' representations along or opposite the refusal direction, depending on their harmfulness.
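As a toy illustration of the geometric idea behind DRO described above (this is a simplified sketch, not the paper's implementation; the vectors, step size, and harmfulness flag are hypothetical):

```python
# Shift a query's representation along a fixed "refusal direction":
# toward refusal for harmful queries, away from it for benign ones.
def shift_along_direction(query, direction, harmful: bool, step: float = 0.5):
    """Return `query` moved `step` units along the unit vector of `direction`,
    with the sign chosen by the harmfulness label."""
    norm = sum(d * d for d in direction) ** 0.5
    unit = [d / norm for d in direction]
    sign = 1.0 if harmful else -1.0
    return [q + sign * step * u for q, u in zip(query, unit)]

# Hypothetical 3-d embedding and refusal direction.
q = [0.2, -0.1, 0.4]
refusal_dir = [1.0, 0.0, 0.0]
shifted = shift_along_direction(q, refusal_dir, harmful=True)
```

In DRO itself the safety prompt is a set of continuous trainable embeddings and the shift is learned end-to-end; the sketch only captures the directional intuition.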
arXiv Detail & Related papers (2024-01-31T17:28:24Z)
- Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection [70.28425745910711]
Large Language Models (LLMs) have demonstrated exceptional proficiency in instruction-following.
This capability brings with it the risk of prompt injection attacks.
We evaluate the robustness of instruction-following LLMs against such attacks.
arXiv Detail & Related papers (2023-08-17T06:21:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.