Probe-Based Interventions for Modifying Agent Behavior
- URL: http://arxiv.org/abs/2201.12938v1
- Date: Wed, 26 Jan 2022 19:14:00 GMT
- Title: Probe-Based Interventions for Modifying Agent Behavior
- Authors: Mycal Tucker, William Kuhl, Khizer Shahid, Seth Karten, Katia Sycara,
and Julie Shah
- Abstract summary: We develop a method for updating representations in pre-trained neural nets according to externally-specified properties.
In experiments, we show how our method may be used to improve human-agent team performance for a variety of neural networks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural nets are powerful function approximators, but the behavior of a given
neural net, once trained, cannot be easily modified. We wish, however, for
people to be able to influence neural agents' actions despite the agents never
training with humans, which we formalize as a human-assisted decision-making
problem. Inspired by prior art initially developed for model explainability, we
develop a method for updating representations in pre-trained neural nets
according to externally-specified properties. In experiments, we show how our
method may be used to improve human-agent team performance for a variety of
neural networks from image classifiers to agents in multi-agent reinforcement
learning settings.
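The recipe the abstract describes can be approximated compactly: fit a linear probe from a frozen network's hidden representations to an externally specified property, then, at decision time, nudge a representation along the probe's gradient until the probe reports the desired property, and let the frozen head act on the edited representation. The sketch below illustrates this general idea and is not the authors' exact method; every module, dimension, and property here is hypothetical.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # A frozen, "pre-trained" net split into an encoder and a decision head.
    encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())  # hypothetical encoder
    head = nn.Linear(32, 4)                                # hypothetical head
    for p in list(encoder.parameters()) + list(head.parameters()):
        p.requires_grad_(False)

    # 1) Train a linear probe to predict an externally specified property
    #    (here a synthetic binary label) from hidden representations.
    probe = nn.Linear(32, 1)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
    x = torch.randn(256, 16)
    prop = (x[:, 0] > 0).float().unsqueeze(1)  # stand-in "external property"
    with torch.no_grad():
        z = encoder(x)
    for _ in range(200):
        loss = nn.functional.binary_cross_entropy_with_logits(probe(z), prop)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # 2) Intervene: move a representation along the probe's gradient so the
    #    probe reports the requested property, then reuse the frozen head.
    def intervene(z0, target, steps=20, lr=0.5):
        z_new = z0.clone().requires_grad_(True)
        for _ in range(steps):
            p_loss = nn.functional.binary_cross_entropy_with_logits(probe(z_new), target)
            g, = torch.autograd.grad(p_loss, z_new)
            z_new = (z_new - lr * g).detach().requires_grad_(True)
        return z_new.detach()

    z0 = encoder(torch.randn(1, 16))
    z1 = intervene(z0, torch.ones(1, 1))  # a human requests property = 1
    print(head(z0), head(z1))             # head output before vs. after

The same pattern applies whether the frozen head is an image classifier or a policy network in a multi-agent setting; only the representation being edited changes.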
Related papers
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
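For context, the feature visualizations such attacks manipulate are usually produced by activation maximization: optimizing an input image to excite a chosen unit. Below is a minimal sketch of that baseline loop, not the paper's attack; the model and unit index are illustrative.

    import torch
    import torch.nn as nn

    # Hypothetical small "trained" net; in practice this is a real model.
    model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten())
    for p in model.parameters():
        p.requires_grad_(False)

    img = torch.zeros(1, 3, 32, 32, requires_grad=True)  # optimize the input
    opt = torch.optim.Adam([img], lr=0.1)
    unit = 3  # neuron whose preferred stimulus we visualize
    for _ in range(100):
        act = model(img)[0, unit]        # activation of the chosen unit
        loss = -act + 1e-3 * img.norm()  # maximize activation, lightly regularize
        opt.zero_grad()
        loss.backward()
        opt.step()
    # `img` now approximates the unit's preferred input.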
- Adversarial Attacks on the Interpretation of Neuron Activation Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z)
- Generative Adversarial Neuroevolution for Control Behaviour Imitation [3.04585143845864]
We propose to explore whether deep neuroevolution can be used for behaviour imitation in popular simulation environments.
We introduce a simple co-evolutionary adversarial generation framework, and evaluate its capabilities by evolving standard deep recurrent networks.
Across all tasks, we find the final elite actor agents capable of achieving scores as high as those obtained by the pre-trained agents.
arXiv Detail & Related papers (2023-04-03T16:33:22Z)
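A bare-bones version of neuroevolution for imitation scores candidates by how closely their actions match a pre-trained expert and mutates the best performers. The sketch below omits the paper's adversarial co-evolution and uses synthetic states and a stand-in linear expert; all shapes and hyperparameters are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    expert_w = rng.normal(size=(4, 2))  # stand-in pre-trained policy

    def act(w, states):
        return np.argmax(states @ w, axis=1)  # greedy linear policy

    def fitness(w, states):
        return np.mean(act(w, states) == act(expert_w, states))  # agreement

    pop = [rng.normal(size=(4, 2)) for _ in range(32)]
    for gen in range(50):
        states = rng.normal(size=(128, 4))
        scored = sorted(pop, key=lambda w: fitness(w, states), reverse=True)
        elite = scored[:8]
        # Refill the population with Gaussian mutations of the elite.
        pop = elite + [e + 0.1 * rng.normal(size=e.shape)
                       for e in elite for _ in range(3)]
    print("action agreement:", fitness(scored[0], rng.normal(size=(1000, 4))))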
- NCTV: Neural Clamping Toolkit and Visualization for Neural Network Calibration [66.22668336495175]
Neural networks that are poorly calibrated will not gain trust from humans.
We introduce the Neural Clamping Toolkit, the first open-source framework designed to help developers employ state-of-the-art model-agnostic calibration methods.
arXiv Detail & Related papers (2022-11-29T15:03:05Z)
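The simplest post-hoc calibration method in this family is temperature scaling, which, to a first approximation, Neural Clamping extends with a learnable input perturbation. A minimal sketch with synthetic validation logits and labels:

    import torch

    torch.manual_seed(0)
    logits = torch.randn(512, 10) * 3        # stand-in (overconfident) logits
    labels = torch.randint(0, 10, (512,))

    log_T = torch.zeros(1, requires_grad=True)  # optimize log T so T > 0
    opt = torch.optim.LBFGS([log_T], lr=0.1, max_iter=50)

    def closure():
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_T.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    print("fitted temperature:", log_T.exp().item())
    # At test time, softmax(logits / T) gives the calibrated confidences.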
- Reinforcement Learning in an Adaptable Chess Environment for Detecting Human-understandable Concepts [0.0]
We show a method for probing which concepts self-learning agents internalise in the course of their training.
For demonstration, we use a chess-playing agent in a fast, lightweight environment developed specifically to be suitable for research groups.
arXiv Detail & Related papers (2022-11-10T11:48:10Z)
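Probing of this kind typically reduces to fitting a linear classifier from stored agent activations to concept labels annotated by an oracle such as a rules engine. The sketch below uses synthetic activations and a stand-in concept, so the numbers only demonstrate the recipe:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    acts = rng.normal(size=(2000, 64))                   # stored hidden activations
    concept = (acts[:, :4].sum(axis=1) > 0).astype(int)  # stand-in concept labels
    # e.g. "is the king in check", labeled per position by a chess engine

    probe = LogisticRegression(max_iter=1000).fit(acts[:1500], concept[:1500])
    print("held-out probe accuracy:", probe.score(acts[1500:], concept[1500:]))
    # Accuracy well above chance suggests the concept is linearly decodable
    # from the agent's representation; chance-level accuracy suggests not.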
- Neuronal Learning Analysis using Cycle-Consistent Adversarial Networks [4.874780144224057]
We use a variant of deep generative models, CycleGAN, to learn the unknown mapping between pre- and post-learning neural activities.
We develop an end-to-end pipeline for preprocessing calcium fluorescence signals and for training and evaluating models on them, together with a procedure for interpreting the resulting deep learning models.
arXiv Detail & Related papers (2021-11-25T13:24:19Z)
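The core of the CycleGAN objective as applied here: learn G (pre to post) and F (post to pre) so that F(G(x)) reconstructs x and G(F(y)) reconstructs y, which is what allows the mapping to be learned from unpaired recordings. A minimal sketch of the cycle term, with the adversarial losses omitted and all shapes illustrative:

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, 100))
    F = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, 100))

    x = torch.randn(32, 100)  # pre-learning activity vectors
    y = torch.randn(32, 100)  # post-learning activity vectors

    cycle_loss = (nn.functional.l1_loss(F(G(x)), x) +
                  nn.functional.l1_loss(G(F(y)), y))
    cycle_loss.backward()
    # Full objective = adversarial terms + lambda * cycle_loss.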
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable, resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
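One simple reading of neuron diversity, though the paper's exact measure may differ, is pairwise similarity between neurons' activation patterns over a batch: the lower the average similarity, the less redundant the layer. A sketch:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(20, 16), nn.Tanh())  # small hidden layer
    h = net(torch.randn(512, 20))            # (batch, neurons) activations
    a = nn.functional.normalize(h.T, dim=1)  # one vector per neuron
    sim = a @ a.T                            # pairwise cosine similarities
    off_diag = sim[~torch.eye(16, dtype=torch.bool)]
    print("diversity:", (1 - off_diag.mean()).item())  # higher = more diverse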
- Backprop-Free Reinforcement Learning with Active Neural Generative Coding [84.11376568625353]
We propose a computational framework for learning action-driven generative models without backpropagation of errors (backprop) in dynamic environments.
We develop an intelligent agent that operates even with sparse rewards, drawing inspiration from the cognitive theory of planning as inference.
The robust performance of our agent offers promising evidence that a backprop-free approach for neural inference and learning can drive goal-directed behavior.
arXiv Detail & Related papers (2021-07-10T19:02:27Z)
- Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification [53.735029033681435]
Transfer learning is a powerful methodology for adapting pre-trained deep neural networks on image recognition tasks to new domains.
In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models.
arXiv Detail & Related papers (2020-07-11T22:48:42Z)
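The transfer recipe being compared is ordinary fine-tuning; the only change is starting from an adversarially trained checkpoint rather than a standard one. In the sketch below, adv_robust.pt is a hypothetical checkpoint path, and only the new head is trained (one common variant):

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    model = resnet18(num_classes=1000)
    model.load_state_dict(torch.load("adv_robust.pt"))  # adversarially trained weights
    model.fc = nn.Linear(model.fc.in_features, 10)      # new head for the target task

    # Freeze the transferred backbone; train only the new head.
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith("fc.")
    opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                          lr=1e-2, momentum=0.9)
    # ...then run a standard supervised fine-tuning loop on the target data.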
- Training spiking neural networks using reinforcement learning [0.0]
We propose biologically-plausible alternatives to backpropagation to facilitate the training of spiking neural networks.
We focus on investigating the candidacy of reinforcement learning rules in solving the spatial and temporal credit assignment problems.
We compare and contrast the two approaches by applying them to traditional RL domains such as gridworld, cartpole and mountain car.
arXiv Detail & Related papers (2020-05-12T17:40:36Z)
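Reinforcement-style rules for spiking networks are often three-factor updates: a global reward signal gates an eligibility trace of recent pre/post spike coincidences, sidestepping backpropagation. A toy sketch of one such rule, not the paper's exact formulation:

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=10)  # input -> neuron weights
    v, threshold, lr, gamma = 0.0, 1.0, 0.05, 0.9
    trace = np.zeros_like(w)

    for t in range(1000):
        pre = (rng.random(10) < 0.2).astype(float)  # Poisson-ish input spikes
        v = 0.9 * v + w @ pre                       # leaky membrane integration
        post = float(v > threshold)
        if post:
            v = 0.0                                 # reset after a spike
        trace = gamma * trace + pre * post          # eligibility trace
        reward = 1.0 if post and pre[0] else -0.1   # stand-in task reward
        w += lr * reward * trace                    # three-factor update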
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.