Invisible Traces: Using Hybrid Fingerprinting to identify underlying LLMs in GenAI Apps
- URL: http://arxiv.org/abs/2501.18712v4
- Date: Fri, 07 Feb 2025 14:14:07 GMT
- Title: Invisible Traces: Using Hybrid Fingerprinting to identify underlying LLMs in GenAI Apps
- Authors: Devansh Bhardwaj, Naman Mishra,
- Abstract summary: Fingerprinting of Large Language Models (LLMs) has become essential for ensuring the security and transparency of AI-integrated applications.
We introduce a novel fingerprinting framework designed to address these challenges by integrating static and dynamic fingerprinting techniques.
Our approach identifies architectural features and behavioral traits, enabling accurate and robust fingerprinting of LLMs in dynamic environments.
- Score: 0.0
- License:
- Abstract: Fingerprinting refers to the process of identifying underlying Machine Learning (ML) models of AI Systemts, such as Large Language Models (LLMs), by analyzing their unique characteristics or patterns, much like a human fingerprint. The fingerprinting of Large Language Models (LLMs) has become essential for ensuring the security and transparency of AI-integrated applications. While existing methods primarily rely on access to direct interactions with the application to infer model identity, they often fail in real-world scenarios involving multi-agent systems, frequent model updates, and restricted access to model internals. In this paper, we introduce a novel fingerprinting framework designed to address these challenges by integrating static and dynamic fingerprinting techniques. Our approach identifies architectural features and behavioral traits, enabling accurate and robust fingerprinting of LLMs in dynamic environments. We also highlight new threat scenarios where traditional fingerprinting methods are ineffective, bridging the gap between theoretical techniques and practical application. To validate our framework, we present an extensive evaluation setup that simulates real-world conditions and demonstrate the effectiveness of our methods in identifying and monitoring LLMs in Gen-AI applications. Our results highlight the framework's adaptability to diverse and evolving deployment contexts.
Related papers
- Creating an LLM-based AI-agent: A high-level methodology towards enhancing LLMs with APIs [0.0]
Large Language Models (LLMs) have revolutionized various aspects of engineering and science.
This thesis serves as a comprehensive guide that elucidates a multi-faceted approach for empowering LLMs with the capability to leverage Application Programming Interfaces (APIs)
We propose an on-device architecture that aims to exploit the functionality of carry-on devices by using small models from the Hugging Face community.
arXiv Detail & Related papers (2024-12-17T14:14:04Z) - GeMID: Generalizable Models for IoT Device Identification [4.029017464832905]
Device identification (DI) distinguishes IoT devices based on their traffic patterns.
Existing approaches to DI that build machine learning models often overlook the challenge of model generalizability across diverse network environments.
We propose a novel framework to address this limitation and evaluate the generalizability of DI models across datasets collected within different network environments.
arXiv Detail & Related papers (2024-11-05T17:09:43Z) - Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z) - Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z) - SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation [82.61572106180705]
This paper presents a unified approach using vision-language models (VLMs) to improve keypoint prediction across various garment categories.
We created a large-scale synthetic dataset using advanced simulation techniques, allowing scalable training without extensive real-world data.
Experimental results indicate that the VLM-based method significantly enhances keypoint detection accuracy and task success rates.
arXiv Detail & Related papers (2024-09-26T17:26:16Z) - LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z) - An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z) - MRA-GNN: Minutiae Relation-Aware Model over Graph Neural Network for
Fingerprint Embedding [4.5262471547727845]
We propose a novel paradigm for fingerprint embedding, called Minutiae Relation-Aware model over Graph Neural Network (MRA-GNN)
Our proposed approach incorporates a GNN-based framework in fingerprint embedding to encode the topology and correlation of fingerprints into descriptive features.
We equip MRA-GNN with a Topological relation Reasoning Module (TRM) and Correlation-Aware Module (CAM) to learn the fingerprint embedding from these graphs successfully.
arXiv Detail & Related papers (2023-07-31T05:54:06Z) - Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z) - Learning Robust Representations Of Generative Models Using Set-Based
Artificial Fingerprints [14.191129493685212]
Existing methods approximate the distance between the models via their sample distributions.
We consider unique traces (a.k.a. "artificial fingerprints") as representations of generative models.
We propose a new learning method based on set-encoding and contrastive training.
arXiv Detail & Related papers (2022-06-04T23:20:07Z) - Fingerprint recognition with embedded presentation attacks detection:
are we ready? [6.0168714922994075]
The diffusion of fingerprint verification systems for security applications makes it urgent to investigate the embedding of software-based presentation attack algorithms (PAD) into such systems.
Current research did not state much about their effectiveness when embedded in fingerprint verification systems.
This paper proposes a performance simulator based on the probabilistic modeling of the relationships among the Receiver Operating Characteristics (ROC) of the two individual systems when PAD and verification stages are implemented sequentially.
arXiv Detail & Related papers (2021-10-20T13:53:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.