Related papers: Inferring Capabilities from Task Performance with Bayesian Triangulation

Inferring Capabilities from Task Performance with Bayesian Triangulation

URL: http://arxiv.org/abs/2309.11975v1
Date: Thu, 21 Sep 2023 11:19:26 GMT
Title: Inferring Capabilities from Task Performance with Bayesian Triangulation
Authors: John Burden, Konstantinos Voudouris, Ryan Burnell, Danaja Rutar, Lucy Cheke, Jos\'e Hern\'andez-Orallo
Abstract summary: We describe a method to infer the cognitive profile of a system from diverse experimental data. These features must be triangulated in complex ways to be able to infer capabilities from non-populational data. We showcase the potential for capability-oriented evaluation.
Score: 11.418934051317411
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As machine learning models become more general, we need to characterise them in richer, more meaningful ways. We describe a method to infer the cognitive profile of a system from diverse experimental data. To do so, we introduce measurement layouts that model how task-instance features interact with system capabilities to affect performance. These features must be triangulated in complex ways to be able to infer capabilities from non-populational data -- a challenge for traditional psychometric and inferential tools. Using the Bayesian probabilistic programming library PyMC, we infer different cognitive profiles for agents in two scenarios: 68 actual contestants in the AnimalAI Olympics and 30 synthetic agents for O-PIAAGETS, an object permanence battery. We showcase the potential for capability-oriented evaluation.

Related papers

Stochastic Encodings for Active Feature Acquisition [100.47043816019888]
Active Feature Acquisition is an instance-wise, sequential decision making problem.<n>The aim is to dynamically select which feature to measure based on current observations, independently for each test instance.<n>Common approaches either use Reinforcement Learning, which experiences training difficulties, or greedily maximize the conditional mutual information of the label and unobserved features, which makes myopic.<n>We introduce a latent variable model, trained in a supervised manner. Acquisitions are made by reasoning about the features across many possible unobserved realizations in a latent space.
arXiv Detail & Related papers (2025-08-03T23:48:46Z)
Meta-Statistical Learning: Supervised Learning of Statistical Inference [59.463430294611626]
This work demonstrates that the tools and principles driving the success of large language models (LLMs) can be repurposed to tackle distribution-level tasks. We propose meta-statistical learning, a framework inspired by multi-instance learning that reformulates statistical inference tasks as supervised learning problems.
arXiv Detail & Related papers (2025-02-17T18:04:39Z)
Can foundation models actively gather information in interactive environments to test hypotheses? [56.651636971591536]
We introduce a framework in which a model must determine the factors influencing a hidden reward function. We investigate whether approaches such as self- throughput and increased inference time improve information gathering efficiency.
arXiv Detail & Related papers (2024-12-09T12:27:21Z)
Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation [3.8570045844185237]
We present Stem-JEPA, a novel Joint-Embedding Predictive Architecture (JEPA) trained on a multi-track dataset. Our model comprises two networks: an encoder and a predictor, which are jointly trained to predict the embeddings of compatible stems. We evaluate our model's performance on a retrieval task on the MUSDB18 dataset, testing its ability to find the missing stem from a mix.
arXiv Detail & Related papers (2024-08-05T14:34:40Z)
BEACON: A Bayesian Optimization Strategy for Novelty Search in Expensive Black-Box Systems [1.204357447396532]
Novelty search (NS) refers to a class of exploration algorithms that automatically uncover diverse system behaviors through simulations or experiments. We propose a sample-efficient NS method inspired by Bayesian optimization principles. We show that BEACON comprehensively outperforms existing baselines by finding substantially larger sets of diverse behaviors under limited sampling budgets.
arXiv Detail & Related papers (2024-06-05T20:23:52Z)
Is my Data in your AI Model? Membership Inference Test with Application to Face Images [18.402616111394842]
This article introduces the Membership Inference Test (MINT), a novel approach that aims to empirically assess if given data was used during the training of AI/ML models. We propose two MINT architectures designed to learn the distinct activation patterns that emerge when an Audited Model is exposed to data used during its training process. Experiments are carried out using six publicly available databases, comprising over 22 million face images in total.
arXiv Detail & Related papers (2024-02-14T15:09:01Z)
Gaussian Mixture Models for Affordance Learning using Bayesian Networks [50.18477618198277]
Affordances are fundamental descriptors of relationships between actions, objects and effects. This paper approaches the problem of an embodied agent exploring the world and learning these affordances autonomously from its sensory experiences.
arXiv Detail & Related papers (2024-02-08T22:05:45Z)
Meta-models for transfer learning in source localisation [3.8922067105369154]
This work looks to capture the interdependencies between acoustic emission (AE) experiments (as meta-models) We utilise a Bayesian multilevel approach where a higher level meta-model captures the inter-task relationships. Key contribution is how knowledge of the experimental campaign can be encoded between tasks as well as within tasks.
arXiv Detail & Related papers (2023-05-15T14:02:35Z)
Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation. GradABM can quickly simulate million-size populations in few seconds on commodity hardware, integrate with deep neural networks and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models. We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
Realistic simulation of users for IT systems in cyber ranges [63.20765930558542]
We instrument each machine by means of an external agent to generate user activity. This agent combines both deterministic and deep learning based methods to adapt to different environment. We also propose conditional text generation models to facilitate the creation of conversations and documents.
arXiv Detail & Related papers (2021-11-23T10:53:29Z)
A User-Guided Bayesian Framework for Ensemble Feature Selection in Life Science Applications (UBayFS) [0.0]
We propose UBayFS, an ensemble feature selection technique, embedded in a Bayesian statistical framework. Our approach enhances the feature selection process by considering two sources of information: data and domain knowledge. A comparison with standard feature selectors underlines that UBayFS achieves competitive performance, while providing additional flexibility to incorporate domain knowledge.
arXiv Detail & Related papers (2021-04-30T06:51:33Z)
Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantize interestingness of traffic scenes. Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics [4.237343083490243]
In machine learning (ML), ensemble methods such as bagging, boosting, and stacking are widely-established approaches. StackGenVis is a visual analytics system for stacked generalization.
arXiv Detail & Related papers (2020-05-04T15:43:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.