Inferring Capabilities from Task Performance with Bayesian Triangulation
- URL: http://arxiv.org/abs/2309.11975v1
- Date: Thu, 21 Sep 2023 11:19:26 GMT
- Title: Inferring Capabilities from Task Performance with Bayesian Triangulation
- Authors: John Burden, Konstantinos Voudouris, Ryan Burnell, Danaja Rutar, Lucy Cheke, José Hernández-Orallo
- Abstract summary: We describe a method to infer the cognitive profile of a system from diverse experimental data.
These features must be triangulated in complex ways to be able to infer capabilities from non-populational data.
We showcase the potential for capability-oriented evaluation.
- Score: 11.418934051317411
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As machine learning models become more general, we need to characterise them
in richer, more meaningful ways. We describe a method to infer the cognitive
profile of a system from diverse experimental data. To do so, we introduce
measurement layouts that model how task-instance features interact with system
capabilities to affect performance. These features must be triangulated in
complex ways to be able to infer capabilities from non-populational data -- a
challenge for traditional psychometric and inferential tools. Using the
Bayesian probabilistic programming library PyMC, we infer different cognitive
profiles for agents in two scenarios: 68 actual contestants in the AnimalAI
Olympics and 30 synthetic agents for O-PIAAGETS, an object permanence battery.
We showcase the potential for capability-oriented evaluation.
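The core idea in the abstract, inferring a latent capability from how success varies with task-instance features, can be illustrated with a minimal sketch. This is not the authors' measurement layout and does not use PyMC: it is a plain-Python grid approximation under the assumption that the probability of success is a logistic function of (ability minus instance difficulty). The function names, the single difficulty feature, and the data are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic link: maps a real-valued margin to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def posterior_over_ability(results, grid):
    """Grid-approximate posterior over one latent ability.

    results: list of (difficulty, success) pairs for a single agent.
    grid:    candidate ability values, given a uniform prior.
    Assumes P(success) = sigmoid(ability - difficulty); this link is an
    illustrative assumption, not the model used in the paper.
    """
    log_post = []
    for ability in grid:
        log_lik = 0.0
        for difficulty, success in results:
            p = sigmoid(ability - difficulty)
            log_lik += math.log(p if success else 1.0 - p)
        log_post.append(log_lik)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical results: the agent succeeds on easy instances, fails on hard ones.
results = [(0.5, True), (1.0, True), (1.5, True),
           (2.0, True), (2.5, False), (3.0, False)]
grid = [i * 0.1 for i in range(51)]  # candidate abilities from 0.0 to 5.0
posterior = posterior_over_ability(results, grid)
posterior_mean = sum(a * w for a, w in zip(grid, posterior))
print(f"posterior mean ability: {posterior_mean:.2f}")
```

Because the estimate is a full posterior rather than an aggregate score, it also conveys how uncertain the inferred ability is given only a handful of instances. A probabilistic programming library such as PyMC would replace the grid with sampling and allow several capabilities and features to be modelled jointly.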
Related papers
- Gaussian Mixture Models for Affordance Learning using Bayesian Networks [50.18477618198277]
Affordances are fundamental descriptors of relationships between actions, objects and effects.
This paper approaches the problem of an embodied agent exploring the world and learning these affordances autonomously from its sensory experiences.
arXiv Detail & Related papers (2024-02-08T22:05:45Z)
- Autonomous Capability Assessment of Sequential Decision-Making Systems in Stochastic Settings (Extended Version) [27.825419721676766]
It is essential for users to understand what their AI systems can and can't do in order to use them safely.
This paper presents a new approach for modeling the capabilities of black-box AI systems that can plan and act.
arXiv Detail & Related papers (2023-06-07T22:05:48Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- A Machine Learning Framework for Event Identification via Modal Analysis of PMU Data [17.105110901241094]
We propose to identify events by extracting features based on modal dynamics.
We combine such traditional physics-based feature extraction methods with machine learning to distinguish different event types.
Our results indicate that the proposed framework is promising for identifying the two types of events.
arXiv Detail & Related papers (2022-02-14T16:19:40Z)
- Realistic simulation of users for IT systems in cyber ranges [63.20765930558542]
We instrument each machine by means of an external agent to generate user activity.
This agent combines deterministic and deep learning based methods to adapt to different environments.
We also propose conditional text generation models to facilitate the creation of conversations and documents.
arXiv Detail & Related papers (2021-11-23T10:53:29Z)
- A User-Guided Bayesian Framework for Ensemble Feature Selection in Life Science Applications (UBayFS) [0.0]
We propose UBayFS, an ensemble feature selection technique, embedded in a Bayesian statistical framework.
Our approach enhances the feature selection process by considering two sources of information: data and domain knowledge.
A comparison with standard feature selectors underlines that UBayFS achieves competitive performance, while providing additional flexibility to incorporate domain knowledge.
arXiv Detail & Related papers (2021-04-30T06:51:33Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantize interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
- StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics [4.237343083490243]
In machine learning (ML), ensemble methods such as bagging, boosting, and stacking are widely-established approaches.
StackGenVis is a visual analytics system for stacked generalization.
arXiv Detail & Related papers (2020-05-04T15:43:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.