Understanding and Enhancing Robustness of Concept-based Models
- URL: http://arxiv.org/abs/2211.16080v1
- Date: Tue, 29 Nov 2022 10:43:51 GMT
- Title: Understanding and Enhancing Robustness of Concept-based Models
- Authors: Sanchit Sinha, Mengdi Huai, Jianhui Sun, Aidong Zhang
- Abstract summary: We study the robustness of concept-based models to adversarial perturbations.
In this paper, we first propose and analyze different malicious attacks to evaluate the security vulnerability of concept-based models.
We then propose a potential general adversarial training-based defense mechanism to increase the robustness of these systems to the proposed malicious attacks.
- Score: 41.20004311158688
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rising use of deep neural networks for decision making in critical
applications like medical diagnosis and financial analysis has raised concerns
regarding their reliability and trustworthiness. As automated systems become
more mainstream, it is important that their decisions be transparent, reliable, and
understandable by humans to foster trust and confidence. To this end,
concept-based models such as Concept Bottleneck Models (CBMs) and
Self-Explaining Neural Networks (SENN) have been proposed, which constrain the
latent space of a model to represent high-level concepts easily understood by
domain experts in the field. Although concept-based models are a promising
approach to increasing both explainability and reliability, it has yet to be
shown whether they are robust and output consistent concepts under
systematic perturbations to their inputs. To better understand the behavior of
concept-based models on curated malicious samples, in this paper we study
their robustness to adversarial perturbations, i.e., imperceptible changes to
the input data that are crafted by an attacker to fool a well-learned
concept-based model. Specifically, we first propose and analyze different
malicious attacks to evaluate the security vulnerability of concept-based
models. Subsequently, we propose a potential general adversarial
training-based defense mechanism to increase the robustness of these systems to the
proposed malicious attacks. Extensive experiments on one synthetic and two
real-world datasets demonstrate the effectiveness of the proposed attacks and
the defense approach.
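The abstract does not include implementation details, but the attack-and-defense recipe it outlines can be illustrated concretely. Below is a minimal sketch, assuming a PyTorch concept bottleneck model whose forward pass returns (concept logits, task logits), a PGD-style L-infinity attack on the concept predictions, and inputs normalized to [0, 1]; the function names, the particular attack, and the loss weighting `lam` are illustrative assumptions, not the authors' actual method.

```python
# Hypothetical sketch (not the paper's code): a PGD-style attack that perturbs
# inputs to corrupt the concept predictions of a concept bottleneck model, and
# one adversarial-training step that retrains on such perturbed samples.
import torch
import torch.nn.functional as F


def pgd_concept_attack(model, x, concepts, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft an L-infinity perturbation that maximizes the concept-prediction loss.

    Assumes model(x) -> (concept_logits, task_logits) and inputs in [0, 1].
    """
    x = x.detach()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        concept_logits, _ = model(x_adv)
        loss = F.binary_cross_entropy_with_logits(concept_logits, concepts)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()          # ascend the concept loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)              # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()


def adversarial_training_step(model, optimizer, x, concepts, labels, lam=1.0):
    """One adversarial-training step: fit the model on attack-perturbed inputs."""
    x_adv = pgd_concept_attack(model, x, concepts)
    concept_logits, task_logits = model(x_adv)
    loss = F.binary_cross_entropy_with_logits(concept_logits, concepts) \
        + lam * F.cross_entropy(task_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the attack maximizes the concept-prediction loss rather than the task loss, reflecting the paper's focus on whether small perturbations make a model output inconsistent concepts; the defense simply retrains on such perturbed samples with a weighted sum of concept and task losses.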
Related papers
- A Framework for Strategic Discovery of Credible Neural Network Surrogate Models under Uncertainty [0.0]
This study presents the Occam Plausibility Algorithm for surrogate models (OPAL-surrogate)
OPAL-surrogate provides a systematic framework to uncover predictive neural network-based surrogate models.
It balances the trade-off between model complexity, accuracy, and prediction uncertainty.
arXiv Detail & Related papers (2024-03-13T18:45:51Z) - NeuralSentinel: Safeguarding Neural Network Reliability and
Trustworthiness [0.0]
We present NeuralSentinel (NS), a tool able to validate the reliability and trustworthiness of AI models.
NS helps non-expert staff increase their confidence in this new system by helping them understand the model's decisions.
This tool was deployed and used in a Hackathon event to evaluate the reliability of a skin cancer image detector.
arXiv Detail & Related papers (2024-02-12T09:24:34Z) - Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models? [52.238883592674696]
Ring-A-Bell is a model-agnostic red-teaming tool for T2I diffusion models.
It identifies problematic prompts for diffusion models with the corresponding generation of inappropriate content.
Our results show that Ring-A-Bell, by manipulating safe prompting benchmarks, can transform prompts that were originally regarded as safe to evade existing safety mechanisms.
arXiv Detail & Related papers (2023-10-16T02:11:20Z) - Boosting Adversarial Robustness using Feature Level Stochastic Smoothing [46.86097477465267]
Adversarial defenses have led to a significant improvement in the robustness of Deep Neural Networks.
In this work, we propose a generic method for introducing stochasticity in the network predictions.
We also utilize this smoothing to reject low-confidence predictions.
arXiv Detail & Related papers (2023-06-10T15:11:24Z) - Concept Embedding Models [27.968589555078328]
Concept bottleneck models promote trustworthiness by conditioning classification tasks on an intermediate level of human-like concepts.
Existing concept bottleneck models are unable to find optimal compromises between high task accuracy, robust concept-based explanations, and effective interventions on concepts.
We propose Concept Embedding Models, a novel family of concept bottleneck models which goes beyond the current accuracy-vs-interpretability trade-off by learning interpretable high-dimensional concept representations.
arXiv Detail & Related papers (2022-09-19T14:49:36Z) - Exploring the Trade-off between Plausibility, Change Intensity and
Adversarial Power in Counterfactual Explanations using Multi-objective
Optimization [73.89239820192894]
We argue that automated counterfactual generation should take several aspects of the produced adversarial instances into account.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z) - A Unified Contrastive Energy-based Model for Understanding the
Generative Ability of Adversarial Training [64.71254710803368]
Adversarial Training (AT) is an effective approach to enhance the robustness of deep neural networks.
We demystify this phenomenon by developing a unified probabilistic framework, called Contrastive Energy-based Models (CEM).
We propose a principled method to develop adversarial learning and sampling methods.
arXiv Detail & Related papers (2022-03-25T05:33:34Z) - A Comprehensive Evaluation Framework for Deep Model Robustness [44.20580847861682]
Deep neural networks (DNNs) have achieved remarkable performance across a wide range of applications.
However, they are vulnerable to adversarial examples, which motivates adversarial defense.
This paper presents a model evaluation framework containing a comprehensive, rigorous, and coherent set of evaluation metrics.
arXiv Detail & Related papers (2021-01-24T01:04:25Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual
Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z) - A general framework for defining and optimizing robustness [74.67016173858497]
We propose a rigorous and flexible framework for defining different types of robustness properties for classifiers.
Our concept is based on the postulate that the robustness of a classifier should be considered a property independent of its accuracy.
We develop a very general robustness framework that is applicable to any type of classification model.
arXiv Detail & Related papers (2020-06-19T13:24:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.