Related papers: Shh, don't say that! Domain Certification in LLMs

Shh, don't say that! Domain Certification in LLMs

URL: http://arxiv.org/abs/2502.19320v2
Date: Thu, 06 Mar 2025 21:49:11 GMT
Title: Shh, don't say that! Domain Certification in LLMs
Authors: Cornelius Emde, Alasdair Paren, Preetham Arvind, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Bernard Ghanem, Philip H. S. Torr, Adel Bibi,
Abstract summary: Large language models (LLMs) are often deployed to perform constrained tasks, with narrow domains.<n>We introduce domain certification; a guarantee that accurately characterizes the out-of-domain behavior of language models.<n>We then propose a simple yet effective approach, which we call VALID that provides adversarial bounds as a certificate.
Score: 124.61851324874627
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) are often deployed to perform constrained tasks, with narrow domains. For example, customer support bots can be built on top of LLMs, relying on their broad language understanding and capabilities to enhance performance. However, these LLMs are adversarially susceptible, potentially generating outputs outside the intended domain. To formalize, assess, and mitigate this risk, we introduce domain certification; a guarantee that accurately characterizes the out-of-domain behavior of language models. We then propose a simple yet effective approach, which we call VALID that provides adversarial bounds as a certificate. Finally, we evaluate our method across a diverse set of datasets, demonstrating that it yields meaningful certificates, which bound the probability of out-of-domain samples tightly with minimum penalty to refusal behavior.

Related papers

RvLLM: LLM Runtime Verification with Domain Knowledge [8.15645390408007]
Large language models (LLMs) have emerged as a dominant AI paradigm due to their exceptional text understanding and generation capabilities.<n>Their tendency to generate inconsistent or erroneous outputs challenges their reliability, especially in high-stakes domains requiring accuracy and trustworthiness.<n>Existing research primarily focuses on detecting and mitigating model misbehavior in general-purpose scenarios, often overlooking the potential of integrating domain-specific knowledge.
arXiv Detail & Related papers (2025-05-24T08:21:44Z)
Exploring How LLMs Capture and Represent Domain-Specific Knowledge [16.84031546207366]
We study whether Large Language Models (LLMs) inherently capture domain-specific nuances in natural language. Our experiments probe the domain sensitivity of LLMs by examining their ability to distinguish queries from different domains. We reveal latent domain-related trajectories that indicate the model's internal recognition of query domains.
arXiv Detail & Related papers (2025-04-23T16:46:06Z)
Leveraging Domain Knowledge at Inference Time for LLM Translation: Retrieval versus Generation [36.41708236431343]
Large language models (LLMs) have been increasingly adopted for machine translation (MT) Our work studies domain-adapted MT with LLMs through a careful prompting setup. We find that demonstrations consistently outperform terminology, and retrieval consistently outperforms generation.
arXiv Detail & Related papers (2025-03-06T22:23:07Z)
Reinforcement Learning for Long-Horizon Interactive LLM Agents [56.9860859585028]
Interactive digital agents (IDAs) leverage APIs of stateful digital environments to perform tasks in response to user requests. We present a reinforcement learning (RL) approach that trains IDAs directly in their target environments. We derive LOOP, a data- and memory-efficient variant of proximal policy optimization.
arXiv Detail & Related papers (2025-02-03T18:35:42Z)
Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detection of large language models (LLMs)<n>We propose the Perplexity Attention Weighted Network (PAWN), which uses the last hidden states of the LLM and positions to weight the sum of a series of features based on metrics from the next-token distribution across the sequence length.<n>PAWN shows competitive and even better performance in-distribution than the strongest baselines with a fraction of their trainable parameters.
arXiv Detail & Related papers (2025-01-07T17:00:49Z)
BANER: Boundary-Aware LLMs for Few-Shot Named Entity Recognition [12.57768435856206]
We propose an approach called Boundary-Aware LLMs for Few-Shot Named Entity Recognition.<n>We introduce a boundary-aware contrastive learning strategy to enhance the LLM's ability to perceive entity boundaries for generalized entity spans.<n>We utilize LoRAHub to align information from the target domain to the source domain, thereby enhancing adaptive cross-domain classification capabilities.
arXiv Detail & Related papers (2024-12-03T07:51:14Z)
SVIP: Towards Verifiable Inference of Open-source Large Language Models [33.910670775972335]
Open-source Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language understanding and generation, leading to widespread adoption across various domains. Their increasing model sizes render local deployment impractical for individual users, pushing many to rely on computing service providers for inference through a blackbox API. This reliance introduces a new risk: a computing provider may stealthily substitute the requested LLM with a smaller, less capable model without consent from users, thereby delivering inferior outputs while benefiting from cost savings.
arXiv Detail & Related papers (2024-10-29T17:52:45Z)
Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift.<n>We devise a series of experiments to explain the performance gap empirically.
arXiv Detail & Related papers (2024-09-27T05:06:43Z)
Cross-Domain Policy Adaptation by Capturing Representation Mismatch [53.087413751430255]
It is vital to learn effective policies that can be transferred to different domains with dynamics discrepancies in reinforcement learning (RL) In this paper, we consider dynamics adaptation settings where there exists dynamics mismatch between the source domain and the target domain. We perform representation learning only in the target domain and measure the representation deviations on the transitions from the source domain.
arXiv Detail & Related papers (2024-05-24T09:06:12Z)
Domain Private Transformers for Multi-Domain Dialog Systems [2.7013801448234367]
This paper proposes domain privacy as a novel way to quantify how likely a conditional language model will leak across domains. Experiments on membership inference attacks show that our proposed method has comparable resiliency to methods adapted from recent literature on differentially private language models.
arXiv Detail & Related papers (2023-05-23T16:27:12Z)
VarMAE: Pre-training of Variational Masked Autoencoder for Domain-adaptive Language Understanding [5.1282202633907]
We propose a novel Transformer-based language model named VarMAE for domain-adaptive language understanding. Under the masked autoencoding objective, we design a context uncertainty learning module to encode the token's context into a smooth latent distribution. Experiments on science- and finance-domain NLU tasks demonstrate that VarMAE can be efficiently adapted to new domains with limited resources.
arXiv Detail & Related papers (2022-11-01T12:51:51Z)
KALA: Knowledge-Augmented Language Model Adaptation [65.92457495576141]
We propose a novel domain adaption framework for pre-trained language models (PLMs) Knowledge-Augmented Language model Adaptation (KALA) modulates the intermediate hidden representations of PLMs with domain knowledge. Results show that, despite being computationally efficient, our KALA largely outperforms adaptive pre-training.
arXiv Detail & Related papers (2022-04-22T08:11:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.