Probe-Based Interventions for Modifying Agent Behavior
- URL: http://arxiv.org/abs/2201.12938v1
- Date: Wed, 26 Jan 2022 19:14:00 GMT
- Title: Probe-Based Interventions for Modifying Agent Behavior
- Authors: Mycal Tucker, William Kuhl, Khizer Shahid, Seth Karten, Katia Sycara,
and Julie Shah
- Abstract summary: We develop a method for updating representations in pre-trained neural nets according to externally-specified properties.
In experiments, we show how our method may be used to improve human-agent team performance for a variety of neural networks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural nets are powerful function approximators, but the behavior of a given
neural net, once trained, cannot be easily modified. We wish, however, for
people to be able to influence neural agents' actions despite the agents never
training with humans, which we formalize as a human-assisted decision-making
problem. Inspired by prior art initially developed for model explainability, we
develop a method for updating representations in pre-trained neural nets
according to externally-specified properties. In experiments, we show how our
method may be used to improve human-agent team performance for a variety of
neural networks from image classifiers to agents in multi-agent reinforcement
learning settings.
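The recipe the abstract describes can be approximated compactly: fit a linear probe from a frozen network's hidden representations to an externally specified property, then, at decision time, nudge a representation along the probe's gradient until the probe reports the desired property, and let the frozen head act on the edited representation. The sketch below illustrates this general idea and is not the authors' exact method; every module, dimension, and property here is hypothetical.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # A frozen, "pre-trained" net split into an encoder and a decision head.
    encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())  # hypothetical encoder
    head = nn.Linear(32, 4)                                # hypothetical head
    for p in list(encoder.parameters()) + list(head.parameters()):
        p.requires_grad_(False)

    # 1) Train a linear probe to predict an externally specified property
    #    (here a synthetic binary label) from hidden representations.
    probe = nn.Linear(32, 1)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
    x = torch.randn(256, 16)
    prop = (x[:, 0] > 0).float().unsqueeze(1)  # stand-in "external property"
    with torch.no_grad():
        z = encoder(x)
    for _ in range(200):
        loss = nn.functional.binary_cross_entropy_with_logits(probe(z), prop)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # 2) Intervene: move a representation along the probe's gradient so the
    #    probe reports the requested property, then reuse the frozen head.
    def intervene(z0, target, steps=20, lr=0.5):
        z_new = z0.clone().requires_grad_(True)
        for _ in range(steps):
            p_loss = nn.functional.binary_cross_entropy_with_logits(probe(z_new), target)
            g, = torch.autograd.grad(p_loss, z_new)
            z_new = (z_new - lr * g).detach().requires_grad_(True)
        return z_new.detach()

    z0 = encoder(torch.randn(1, 16))
    z1 = intervene(z0, torch.ones(1, 1))  # a human requests property = 1
    print(head(z0), head(z1))             # head output before vs. after

The same pattern applies whether the frozen head is an image classifier or a policy network in a multi-agent setting; only the representation being edited changes.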
Related papers
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
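For context, the feature visualizations such attacks manipulate are usually produced by activation maximization: optimizing an input image to excite a chosen unit. Below is a minimal sketch of that baseline loop, not the paper's attack; the model and unit index are illustrative.

    import torch
    import torch.nn as nn

    # Hypothetical small "trained" net; in practice this is a real model.
    model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten())
    for p in model.parameters():
        p.requires_grad_(False)

    img = torch.zeros(1, 3, 32, 32, requires_grad=True)  # optimize the input
    opt = torch.optim.Adam([img], lr=0.1)
    unit = 3  # neuron whose preferred stimulus we visualize
    for _ in range(100):
        act = model(img)[0, unit]        # activation of the chosen unit
        loss = -act + 1e-3 * img.norm()  # maximize activation, lightly regularize
        opt.zero_grad()
        loss.backward()
        opt.step()
    # `img` now approximates the unit's preferred input.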
- Adversarial Attacks on the Interpretation of Neuron Activation Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z)
- Generative Adversarial Neuroevolution for Control Behaviour Imitation [3.04585143845864]
We propose to explore whether deep neuroevolution can be used for behaviour imitation in popular simulation environments.
We introduce a simple co-evolutionary adversarial generation framework, and evaluate its capabilities by evolving standard deep recurrent networks.
Across all tasks, we find the final elite actor agents capable of achieving scores as high as those obtained by the pre-trained agents.
arXiv Detail & Related papers (2023-04-03T16:33:22Z)
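A bare-bones version of neuroevolution for imitation scores candidates by how closely their actions match a pre-trained expert and mutates the best performers. The sketch below omits the paper's adversarial co-evolution and uses synthetic states and a stand-in linear expert; all shapes and hyperparameters are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    expert_w = rng.normal(size=(4, 2))  # stand-in pre-trained policy

    def act(w, states):
        return np.argmax(states @ w, axis=1)  # greedy linear policy

    def fitness(w, states):
        return np.mean(act(w, states) == act(expert_w, states))  # agreement

    pop = [rng.normal(size=(4, 2)) for _ in range(32)]
    for gen in range(50):
        states = rng.normal(size=(128, 4))
        scored = sorted(pop, key=lambda w: fitness(w, states), reverse=True)
        elite = scored[:8]
        # Refill the population with Gaussian mutations of the elite.
        pop = elite + [e + 0.1 * rng.normal(size=e.shape)
                       for e in elite for _ in range(3)]
    print("action agreement:", fitness(scored[0], rng.normal(size=(1000, 4))))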
- NCTV: Neural Clamping Toolkit and Visualization for Neural Network Calibration [66.22668336495175]
Neural networks that are poorly calibrated will not gain trust from humans.
We introduce the Neural Clamping Toolkit, the first open-source framework designed to help developers employ state-of-the-art model-agnostic calibration methods.
arXiv Detail & Related papers (2022-11-29T15:03:05Z)
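The simplest post-hoc calibration method in this family is temperature scaling, which, to a first approximation, Neural Clamping extends with a learnable input perturbation. A minimal sketch with synthetic validation logits and labels:

    import torch

    torch.manual_seed(0)
    logits = torch.randn(512, 10) * 3        # stand-in (overconfident) logits
    labels = torch.randint(0, 10, (512,))

    log_T = torch.zeros(1, requires_grad=True)  # optimize log T so T > 0
    opt = torch.optim.LBFGS([log_T], lr=0.1, max_iter=50)

    def closure():
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_T.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    print("fitted temperature:", log_T.exp().item())
    # At test time, softmax(logits / T) gives the calibrated confidences.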
- Reinforcement Learning in an Adaptable Chess Environment for Detecting Human-understandable Concepts [0.0]
We show a method for probing which concepts self-learning agents internalise in the course of their training.
For demonstration, we use a chess-playing agent in a fast, lightweight environment developed specifically to be suitable for research groups.
arXiv Detail & Related papers (2022-11-10T11:48:10Z)
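Probing of this kind typically reduces to fitting a linear classifier from stored agent activations to concept labels annotated by an oracle such as a rules engine. The sketch below uses synthetic activations and a stand-in concept, so the numbers only demonstrate the recipe:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    acts = rng.normal(size=(2000, 64))                   # stored hidden activations
    concept = (acts[:, :4].sum(axis=1) > 0).astype(int)  # stand-in concept labels
    # e.g. "is the king in check", labeled per position by a chess engine

    probe = LogisticRegression(max_iter=1000).fit(acts[:1500], concept[:1500])
    print("held-out probe accuracy:", probe.score(acts[1500:], concept[1500:]))
    # Accuracy well above chance suggests the concept is linearly decodable
    # from the agent's representation; chance-level accuracy suggests not.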
- Neuronal Learning Analysis using Cycle-Consistent Adversarial Networks [4.874780144224057]
We use a variant of deep generative models, CycleGAN, to learn the unknown mapping between pre- and post-learning neural activities.
We develop an end-to-end pipeline for preprocessing calcium fluorescence signals and for training and evaluating models on them, together with a procedure for interpreting the resulting deep learning models.
arXiv Detail & Related papers (2021-11-25T13:24:19Z)
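The core of the CycleGAN objective as applied here: learn G (pre to post) and F (post to pre) so that F(G(x)) reconstructs x and G(F(y)) reconstructs y, which is what allows the mapping to be learned from unpaired recordings. A minimal sketch of the cycle term, with the adversarial losses omitted and all shapes illustrative:

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, 100))
    F = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, 100))

    x = torch.randn(32, 100)  # pre-learning activity vectors
    y = torch.randn(32, 100)  # post-learning activity vectors

    cycle_loss = (nn.functional.l1_loss(F(G(x)), x) +
                  nn.functional.l1_loss(G(F(y)), y))
    cycle_loss.backward()
    # Full objective = adversarial terms + lambda * cycle_loss.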
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable, resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
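One simple reading of neuron diversity, though the paper's exact measure may differ, is pairwise similarity between neurons' activation patterns over a batch: the lower the average similarity, the less redundant the layer. A sketch:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(20, 16), nn.Tanh())  # small hidden layer
    h = net(torch.randn(512, 20))            # (batch, neurons) activations
    a = nn.functional.normalize(h.T, dim=1)  # one vector per neuron
    sim = a @ a.T                            # pairwise cosine similarities
    off_diag = sim[~torch.eye(16, dtype=torch.bool)]
    print("diversity:", (1 - off_diag.mean()).item())  # higher = more diverse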
- Backprop-Free Reinforcement Learning with Active Neural Generative Coding [84.11376568625353]
We propose a computational framework for learning action-driven generative models without backpropagation of errors (backprop) in dynamic environments.
We develop an intelligent agent that operates even with sparse rewards, drawing inspiration from the cognitive theory of planning as inference.
The robust performance of our agent offers promising evidence that a backprop-free approach for neural inference and learning can drive goal-directed behavior.
arXiv Detail & Related papers (2021-07-10T19:02:27Z)
- Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification [53.735029033681435]
Transfer learning is a powerful methodology for adapting pre-trained deep neural networks on image recognition tasks to new domains.
In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models.
arXiv Detail & Related papers (2020-07-11T22:48:42Z)
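The transfer recipe being compared is ordinary fine-tuning; the only change is starting from an adversarially trained checkpoint rather than a standard one. In the sketch below, adv_robust.pt is a hypothetical checkpoint path, and only the new head is trained (one common variant):

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    model = resnet18(num_classes=1000)
    model.load_state_dict(torch.load("adv_robust.pt"))  # adversarially trained weights
    model.fc = nn.Linear(model.fc.in_features, 10)      # new head for the target task

    # Freeze the transferred backbone; train only the new head.
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith("fc.")
    opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                          lr=1e-2, momentum=0.9)
    # ...then run a standard supervised fine-tuning loop on the target data.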
- Training spiking neural networks using reinforcement learning [0.0]
We propose biologically-plausible alternatives to backpropagation to facilitate the training of spiking neural networks.
We focus on investigating the candidacy of reinforcement learning rules in solving the spatial and temporal credit assignment problems.
We compare and contrast the two approaches by applying them to traditional RL domains such as gridworld, cartpole and mountain car.
arXiv Detail & Related papers (2020-05-12T17:40:36Z)
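Reinforcement-style rules for spiking networks are often three-factor updates: a global reward signal gates an eligibility trace of recent pre/post spike coincidences, sidestepping backpropagation. A toy sketch of one such rule, not the paper's exact formulation:

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=10)  # input -> neuron weights
    v, threshold, lr, gamma = 0.0, 1.0, 0.05, 0.9
    trace = np.zeros_like(w)

    for t in range(1000):
        pre = (rng.random(10) < 0.2).astype(float)  # Poisson-ish input spikes
        v = 0.9 * v + w @ pre                       # leaky membrane integration
        post = float(v > threshold)
        if post:
            v = 0.0                                 # reset after a spike
        trace = gamma * trace + pre * post          # eligibility trace
        reward = 1.0 if post and pre[0] else -0.1   # stand-in task reward
        w += lr * reward * trace                    # three-factor update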
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.