Knowledge-Augmented Reasoning for EUAIA Compliance and Adversarial Robustness of LLMs
- URL: http://arxiv.org/abs/2410.09078v1
- Date: Fri, 4 Oct 2024 18:23:14 GMT
- Title: Knowledge-Augmented Reasoning for EUAIA Compliance and Adversarial Robustness of LLMs
- Authors: Tomas Bueno Momcilovic, Dian Balta, Beat Buesser, Giulio Zizzo, Mark Purcell
- Abstract summary: The EU AI Act (EUAIA) introduces requirements for AI systems which intersect with the processes required to establish adversarial robustness.
This paper presents a functional architecture that focuses on bridging the two properties.
We aim to support developers and auditors with a reasoning layer based on knowledge augmentation.
- Score: 1.368472250332885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The EU AI Act (EUAIA) introduces requirements for AI systems which intersect with the processes required to establish adversarial robustness. However, given the ambiguous language of the regulation and the dynamic nature of adversarial attacks, developers of systems with highly complex models such as LLMs may find their efforts duplicated without the assurance of having achieved either compliance or robustness. This paper presents a functional architecture that focuses on bridging the two properties by introducing components with clear reference to their source. Taking the detection layer recommended by the literature and the reporting layer required by the law, we aim to support developers and auditors with a reasoning layer based on knowledge augmentation (rules, assurance cases, contextual mappings). Our findings demonstrate a novel direction for ensuring that LLMs deployed in the EU are both compliant and adversarially robust, two properties that together underpin trustworthiness.
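To make the layered design concrete, here is a minimal sketch of how a knowledge-augmented reasoning layer might sit between a detection layer and a reporting layer. All class names, detection signals, and the rule-to-article mappings are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the three layers named in the abstract: a detection
# layer flags adversarial inputs, a reasoning layer applies knowledge-augmented
# rules, and a reporting layer records evidence for auditors.

@dataclass
class DetectionResult:
    prompt: str
    attack_suspected: bool
    signal: str  # e.g. which detector fired

@dataclass
class ReasoningLayer:
    # Knowledge augmentation as a toy rule table mapping detection signals to
    # assumed EU AI Act obligations; the paper's layer uses rules, assurance
    # cases, and contextual mappings.
    rules: dict = field(default_factory=lambda: {
        "prompt_injection": "Art. 15: robustness against manipulation",
        "jailbreak": "Art. 9: risk management for foreseeable misuse",
    })

    def assess(self, result: DetectionResult) -> dict:
        return {
            "prompt": result.prompt,
            "attack_suspected": result.attack_suspected,
            "mapped_obligation": self.rules.get(result.signal, "no mapped obligation"),
        }

def reporting_layer(assessment: dict) -> None:
    # Stand-in for the Act's reporting duties: just emit an audit line.
    print(f"[audit] {assessment}")

detection = DetectionResult("Ignore previous instructions...", True, "prompt_injection")
reporting_layer(ReasoningLayer().assess(detection))
```

The design point, as we read it, is that detection evidence and legal obligations should meet in one auditable place; this sketch compresses that into a single rule lookup.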
Related papers
- EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration [60.47645731801866]
Large language models (LLMs) are increasingly leveraged as foundational backbones in advanced recommender systems.
LLMs bring pre-trained linguistic semantics but must learn collaborative semantics from scratch via the LLM backbone.
We propose EAGER-LLM, a decoder-only generative recommendation framework that integrates endogenous and exogenous behavioral and semantic information in a non-intrusive manner.
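As a rough illustration of the integration idea (not EAGER-LLM's actual method), the sketch below fuses an exogenous semantic vector with an endogenous behavioral vector before ranking; the convex-combination fusion and all vectors are invented for this example.

```python
# Illustrative only: fusing exogenous (semantic) and endogenous (behavioral)
# item representations before scoring candidates for a user.

def fuse(semantic: list[float], behavioral: list[float], alpha: float = 0.5) -> list[float]:
    # Simple convex combination; the real framework integrates these signals
    # non-intrusively inside a decoder-only LLM backbone.
    return [alpha * s + (1 - alpha) * b for s, b in zip(semantic, behavioral)]

def score(user_vec: list[float], item_vec: list[float]) -> float:
    return sum(u * i for u, i in zip(user_vec, item_vec))  # dot product

user = [0.2, 0.8, 0.1]
items = {
    "item_a": fuse([0.9, 0.1, 0.0], [0.1, 0.9, 0.2]),
    "item_b": fuse([0.0, 0.3, 0.9], [0.4, 0.2, 0.1]),
}
ranked = sorted(items, key=lambda k: score(user, items[k]), reverse=True)
print(ranked)  # ['item_a', 'item_b']
```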
arXiv Detail & Related papers (2025-02-20T17:01:57Z)
- Aligning Large Language Models for Faithful Integrity Against Opposing Argument [71.33552795870544]
Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks.
They can be easily misled by unfaithful arguments during conversations, even when their original statements are correct.
We propose a novel framework, named Alignment for Faithful Integrity with Confidence Estimation.
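The gist, as we read the summary: a confidence estimate decides whether the model maintains its original statement when challenged. The threshold and decision rule below are assumptions for illustration, not the framework's actual procedure.

```python
# Hedged sketch: use a confidence estimate to decide whether to keep an
# original answer when a user pushes back with an opposing argument.

def respond_to_challenge(original_answer: str,
                         confidence: float,
                         challenge: str,
                         threshold: float = 0.7) -> str:
    if confidence >= threshold:
        # High confidence: maintain the original, faithful answer.
        return f"I stand by my answer: {original_answer}"
    # Low confidence: re-examine rather than blindly capitulate.
    return f"Let me re-check my answer '{original_answer}' against: {challenge}"

print(respond_to_challenge("Paris is the capital of France", 0.95,
                           "Actually, I think it's Lyon."))
```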
arXiv Detail & Related papers (2025-01-02T16:38:21Z)
- Towards Assurance of LLM Adversarial Robustness using Ontology-Driven Argumentation [1.368472250332885]
We introduce a novel approach for assurance of large language models (LLMs) based on formal argumentation.
We structure state-of-the-art attacks and defenses, facilitating the creation of a human-readable assurance case.
We provide implications for theory and practice by targeting engineers, data scientists, users, and auditors.
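A minimal sketch of what an argumentation-style assurance case might look like as a data structure, assuming claims supported by evidence and challenged by defeaters; the field names and the naive status rule are ours, not the paper's ontology.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    evidence: list[str] = field(default_factory=list)   # supports the claim
    defeaters: list[str] = field(default_factory=list)  # known attacks on it

    def status(self) -> str:
        # Naive rule: a claim stands if evidence outnumbers defeaters.
        # Real argumentation semantics are considerably richer.
        return "supported" if len(self.evidence) > len(self.defeaters) else "contested"

case = Claim(
    "The deployed LLM is robust to prompt injection",
    evidence=["input filter in place", "red-team suite passed"],
    defeaters=["novel multi-turn jailbreaks"],
)
print(case.text, "->", case.status())
```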
arXiv Detail & Related papers (2024-10-10T14:24:43Z)
- COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act [40.233017376716305]
The EU's Artificial Intelligence Act (AI Act) is a significant step towards responsible AI development.
However, the Act lacks a clear technical interpretation, making it difficult to assess models' compliance.
This work presents COMPL-AI, a comprehensive framework comprising the first technical interpretation of the Act together with an LLM benchmarking suite.
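To illustrate the benchmarking idea in the abstract sense, the sketch below maps hypothetical Act-derived requirements to benchmark scores and flags shortfalls; the requirement names, thresholds, and scores are all invented, not part of COMPL-AI.

```python
# Invented requirement thresholds and model scores, purely for illustration.
requirements = {
    "robustness": 0.80,        # minimum acceptable score (assumed)
    "fairness": 0.75,
    "interpretability": 0.60,
}

model_scores = {"robustness": 0.83, "fairness": 0.71, "interpretability": 0.66}

for name, threshold in requirements.items():
    score = model_scores[name]
    verdict = "meets" if score >= threshold else "falls short of"
    print(f"{name}: {score:.2f} {verdict} the {threshold:.2f} threshold")
```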
arXiv Detail & Related papers (2024-10-10T14:23:51Z)
- Towards Assuring EU AI Act Compliance and Adversarial Robustness of LLMs [1.368472250332885]
Large language models are prone to misuse and vulnerable to security threats.
The European Union's Artificial Intelligence Act seeks to enforce AI robustness in certain contexts.
arXiv Detail & Related papers (2024-10-04T18:38:49Z)
- Developing Assurance Cases for Adversarial Robustness and Regulatory Compliance in LLMs [1.368472250332885]
We present an approach to developing assurance cases for adversarial robustness and regulatory compliance in large language models (LLMs).
We propose a layered framework incorporating guardrails at various stages of deployment, aimed at mitigating these attacks and ensuring compliance with the EU AI Act.
We illustrate our method with two exemplary assurance cases, highlighting how different contexts demand tailored strategies to ensure robust and compliant AI systems.
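One plausible reading of "guardrails at various stages of deployment" is a chain of pre- and post-model checks, sketched below with toy rules; the guard names and blocking conditions are placeholders rather than the paper's concrete framework.

```python
from typing import Callable

def input_guard(prompt: str) -> str:
    # Toy pre-model check for a common injection phrase.
    if "ignore previous instructions" in prompt.lower():
        raise ValueError("blocked: suspected prompt injection")
    return prompt

def model_call(prompt: str) -> str:
    return f"(model answer to: {prompt})"  # stand-in for a real LLM call

def output_guard(answer: str) -> str:
    # Toy post-model check for policy-violating content.
    if "credit card number" in answer.lower():
        raise ValueError("blocked: policy-violating output")
    return answer

def pipeline(prompt: str, stages: list[Callable[[str], str]]) -> str:
    for stage in stages:
        prompt = stage(prompt)
    return prompt

print(pipeline("Summarise the EU AI Act.", [input_guard, model_call, output_guard]))
```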
arXiv Detail & Related papers (2024-10-04T18:14:29Z)
- TRACE: TRansformer-based Attribution using Contrastive Embeddings in LLMs [50.259001311894295]
We propose a novel TRansformer-based Attribution framework using Contrastive Embeddings called TRACE.
We show that TRACE significantly improves the ability to attribute sources accurately, making it a valuable tool for enhancing the reliability and trustworthiness of large language models.
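As a rough, assumption-laden sketch of embedding-based attribution: pick the source whose embedding is closest to the output's. The bag-of-words "embedding" below is a toy stand-in; TRACE itself learns contrastive embeddings.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy embedding: word counts. TRACE would use learned contrastive vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na, nb = sqrt(sum(v * v for v in a.values())), sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

sources = {
    "doc_1": "the eu ai act regulates high risk ai systems",
    "doc_2": "transformers use attention to process sequences",
}
output = "high risk ai systems are regulated under the eu ai act"

best = max(sources, key=lambda k: cosine(embed(sources[k]), embed(output)))
print("attributed to:", best)  # doc_1
```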
arXiv Detail & Related papers (2024-07-06T07:19:30Z)
- Data Poisoning for In-context Learning [49.77204165250528]
In-context learning (ICL) has been recognized for its innovative ability to adapt to new tasks.
This paper delves into the critical issue of ICL's susceptibility to data poisoning attacks.
We introduce ICLPoison, a specialized attack framework designed to exploit the learning mechanisms of ICL.
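A generic illustration of demonstration poisoning for ICL (not ICLPoison's actual optimization): flip the labels of a fraction of the in-context demonstrations before they enter the prompt.

```python
import random

def poison_demonstrations(demos: list[tuple[str, str]],
                          rate: float = 0.5,
                          seed: int = 0) -> list[tuple[str, str]]:
    # Flip each demonstration's label with probability `rate`.
    rng = random.Random(seed)
    flip = {"positive": "negative", "negative": "positive"}
    return [(x, flip[y]) if rng.random() < rate else (x, y) for x, y in demos]

clean = [("great movie", "positive"), ("terrible plot", "negative"),
         ("loved it", "positive"), ("waste of time", "negative")]
print(poison_demonstrations(clean))
```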
arXiv Detail & Related papers (2024-02-03T14:20:20Z)
- Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims to extract structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
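For readers unfamiliar with the task, here is a toy illustration of OIE output, using a trivial pattern matcher in place of an LLM prompted with demonstrations; the verb list and pattern are invented.

```python
import re

def naive_oie(sentence: str) -> list[tuple[str, str, str]]:
    # Matches "<subject> <verb from a tiny set> <object>" and returns a triple.
    m = re.match(r"(.+?)\s+(founded|acquired|heads)\s+(.+)", sentence)
    return [(m.group(1), m.group(2), m.group(3))] if m else []

print(naive_oie("Acme Corp acquired Widget Ltd"))
# [('Acme Corp', 'acquired', 'Widget Ltd')]
```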
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
- Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
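A minimal sketch of the first strategy, synonym replacement against a detector; the synonym table and the toy detector are stand-ins for the paper's setup.

```python
synonyms = {"significant": "notable", "utilize": "use", "demonstrate": "show"}

def synonym_attack(text: str) -> str:
    # Replace known words with synonyms to shift surface statistics.
    return " ".join(synonyms.get(w.lower(), w) for w in text.split())

def toy_detector(text: str) -> bool:
    # Pretend detector: flags text containing "LLM-ish" vocabulary.
    return any(w in text.lower() for w in synonyms)

generated = "We demonstrate a significant improvement"
attacked = synonym_attack(generated)
print(toy_detector(generated), "->", toy_detector(attacked))  # True -> False
```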
arXiv Detail & Related papers (2023-05-31T10:08:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.