The Challenge of Identifying the Origin of Black-Box Large Language Models
- URL: http://arxiv.org/abs/2503.04332v1
- Date: Thu, 06 Mar 2025 11:30:32 GMT
- Title: The Challenge of Identifying the Origin of Black-Box Large Language Models
- Authors: Ziqing Yang, Yixin Wu, Yun Shen, Wei Dai, Michael Backes, Yang Zhang
- Abstract summary: Third parties can customize large language models (LLMs) through fine-tuning and offer only black-box API access. This practice not only exacerbates unfair competition, but also violates licensing agreements. We propose PlugAE, which proactively plugs adversarial token embeddings into the LLM for tracing and identification.
- Score: 34.284190160785336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The tremendous commercial potential of large language models (LLMs) has heightened concerns about their unauthorized use. Third parties can customize LLMs through fine-tuning and offer only black-box API access, effectively concealing unauthorized usage and complicating external auditing processes. This practice not only exacerbates unfair competition, but also violates licensing agreements. In response, identifying the origin of black-box LLMs is an intrinsic solution to this issue. In this paper, we first reveal the limitations of state-of-the-art passive and proactive identification methods with experiments on 30 LLMs and two real-world black-box APIs. Then, we propose the proactive technique, PlugAE, which optimizes adversarial token embeddings in a continuous space and proactively plugs them into the LLM for tracing and identification. The experiments show that PlugAE can achieve substantial improvement in identifying fine-tuned derivatives. We further advocate for legal frameworks and regulations to better address the challenges posed by the unauthorized use of LLMs.
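As a rough illustration of the proactive idea described in the abstract (not the authors' implementation), the sketch below optimizes a handful of continuous "adversarial token embeddings" so that, when placed before the target tokens, a causal LM is steered toward a fixed trace string. The model name, trace string, and hyperparameters are placeholders.

```python
# Hedged sketch: optimize continuous adversarial token embeddings that steer a
# frozen causal LM toward a fixed trace string (the general idea behind
# proactive, embedding-based fingerprints such as PlugAE; details illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; the paper works with much larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
for p in model.parameters():          # freeze the LLM; only the embeddings are trained
    p.requires_grad_(False)

target_text = " plugae-trace"                                   # hypothetical trace string
target_ids = tok(target_text, return_tensors="pt").input_ids    # shape (1, T)
target_emb = model.get_input_embeddings()(target_ids)           # shape (1, T, d)

n_adv = 8                                                       # number of adversarial embeddings
emb_dim = model.get_input_embeddings().embedding_dim
adv_emb = torch.randn(1, n_adv, emb_dim, requires_grad=True)    # continuous, not discrete tokens
opt = torch.optim.Adam([adv_emb], lr=1e-2)

for _ in range(200):
    # Prepend the trainable embeddings and train them so the (frozen) model
    # predicts the trace tokens right after them; adversarial positions are masked out.
    inputs_embeds = torch.cat([adv_emb, target_emb], dim=1)
    labels = torch.cat([torch.full((1, n_adv), -100), target_ids], dim=1)
    loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the spirit of the abstract, such optimized embeddings would then be plugged into the released model so that fine-tuned derivatives still exhibit the trigger behavior when queried, enabling tracing and identification.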
Related papers
- LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression. LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model. Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
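A minimal sketch of the weighted-Lasso mechanism summarized above, assuming the per-feature penalty factors have already been obtained (here they are hard-coded stand-ins for LLM-generated relevance judgments):

```python
# Hedged sketch: a feature-weighted Lasso in which per-feature penalty factors
# rescale the l1 penalty; lower factors make a feature easier to keep.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Stand-ins for LLM-derived penalty factors: relevant features get small factors.
penalty = np.array([0.2, 0.2, 2.0, 2.0, 2.0])

# Weighted Lasso via reparameterization: scaling column j by 1/penalty_j and
# fitting a standard Lasso penalizes the original coefficient beta_j by
# alpha * penalty_j * |beta_j|.
X_scaled = X / penalty
fit = Lasso(alpha=0.1).fit(X_scaled, y)
beta = fit.coef_ / penalty            # map coefficients back to the original scale

print(np.round(beta, 3))
```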
arXiv Detail & Related papers (2025-02-15T02:55:22Z) - CALM: Curiosity-Driven Auditing for Large Language Models [27.302357350862085]
We propose Curiosity-Driven Auditing for Large Language Models (CALM) to finetune an LLM as the auditor agent. CALM successfully identifies derogatory completions involving celebrities and uncovers inputs that elicit specific names under the black-box setting.
arXiv Detail & Related papers (2025-01-06T13:14:34Z) - Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models [61.916827858666906]
Large Language Models (LLMs) are increasingly being integrated into services such as ChatGPT to provide responses to user queries. This paper proposes a method called Token Highlighter to inspect and mitigate the potential jailbreak threats in the user query.
arXiv Detail & Related papers (2024-12-24T05:10:02Z) - Matryoshka: Learning to Drive Black-Box LLMs with LLMs [31.501244808646]
Matryoshka is a lightweight white-box large language model (LLM) controller.
It guides a large-scale black-box LLM generator by decomposing complex tasks into a series of intermediate outputs.
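A hedged sketch of the controller/generator pattern described above; both models are small local stand-ins, and in practice the generator would sit behind a black-box API:

```python
# Hedged sketch: a white-box controller decomposes a task into intermediate
# instructions that steer a black-box generator (both are stand-in models here).
from transformers import pipeline

controller = pipeline("text-generation", model="distilgpt2")  # white-box controller (stand-in)
generator = pipeline("text-generation", model="gpt2")         # black-box generator (stand-in)

def solve(task: str, n_steps: int = 3) -> str:
    context = task
    for _ in range(n_steps):
        # Controller proposes the next intermediate instruction for the task.
        step = controller(f"Task: {context}\nNext step:",
                          max_new_tokens=32)[0]["generated_text"]
        # Generator executes the instruction; its output becomes the new context.
        context = generator(step, max_new_tokens=64)[0]["generated_text"]
    return context

print(solve("Summarize the plot of a mystery novel in one sentence."))
```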
arXiv Detail & Related papers (2024-10-28T05:28:51Z) - From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning [91.79567270986901]
Large Language Models (LLMs) tend to prioritize adherence to user prompts over providing veracious responses.
Recent works propose to employ supervised fine-tuning (SFT) to mitigate the sycophancy issue.
We propose a novel supervised pinpoint tuning (SPT), where the region-of-interest modules are tuned for a given objective.
arXiv Detail & Related papers (2024-09-03T07:01:37Z) - Jailbreaking Large Language Models Through Alignment Vulnerabilities in Out-of-Distribution Settings [57.136748215262884]
We introduce ObscurePrompt for jailbreaking LLMs, inspired by the observed fragile alignments in Out-of-Distribution (OOD) data. We first formulate the decision boundary in the jailbreaking process and then explore how obscure text affects the LLM's ethical decision boundary. Our approach substantially improves upon previous methods in terms of attack effectiveness, maintaining efficacy against two prevalent defense mechanisms.
arXiv Detail & Related papers (2024-06-19T16:09:58Z) - Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization [12.885866125783618]
Large Language Models (LLMs) tend to produce inaccurate responses to specific queries.
We construct an adversarial dataset, named ADT (Adversarial Dataset for Tokenizer), to challenge LLMs' tokenization.
Our empirical results reveal that ADT is highly effective at challenging the tokenization of leading LLMs, including GPT-4o, Llama-3, and Qwen2.5-max.
arXiv Detail & Related papers (2024-05-27T11:39:59Z) - CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models [60.59638232596912]
We introduce CLAMBER, a benchmark for evaluating how large language models (LLMs) identify and clarify ambiguous information needs, organized around a taxonomy of ambiguity.
Building upon the taxonomy, we construct 12K high-quality data to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs.
Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries.
arXiv Detail & Related papers (2024-05-20T14:34:01Z) - TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification [41.25887364156612]
We describe the novel fingerprinting problem of Black-box Identity Verification (BBIV).
The goal is to determine whether a third-party application uses a certain LLM through its chat function.
We propose a method called Targeted Random Adversarial Prompt (TRAP) that identifies the specific LLM in use.
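A minimal sketch of how such a honeypot prompt could be used for black-box identification; the prompt, target answer, and chat wrapper below are illustrative placeholders rather than TRAP's actual prompts:

```python
# Hedged sketch of a honeypot-style check: a pre-optimized prompt is expected to
# elicit a fixed target answer only from the reference LLM, so matching replies
# from a third-party chat function point to that model being in use.
from typing import Callable

TRAP_PROMPT = "What is the secret number? <adversarial-suffix>"  # placeholder trigger prompt
TARGET_ANSWER = "314"                                             # placeholder target answer

def uses_reference_llm(chat: Callable[[str], str],
                       n_trials: int = 5,
                       threshold: float = 0.8) -> bool:
    """Query the third party's chat function repeatedly and check how often
    the reply contains the target answer."""
    hits = sum(TARGET_ANSWER in chat(TRAP_PROMPT) for _ in range(n_trials))
    return hits / n_trials >= threshold

# Example usage (my_api_call is a hypothetical wrapper around the chat endpoint):
# uses_reference_llm(lambda prompt: my_api_call(prompt))
```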
arXiv Detail & Related papers (2024-02-20T13:20:39Z) - Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access [14.283269607549892]
We introduce sketch-guided constrained decoding (SGCD), a novel approach to constrained decoding for black-box large language models (LLMs).
SGCD operates without access to the logits of the blackbox LLM.
We demonstrate the efficacy of SGCD through experiments in closed information extraction and constituency parsing.
arXiv Detail & Related papers (2024-01-18T13:31:24Z)