How to select an objective function using information theory
- URL: http://arxiv.org/abs/2212.06566v4
- Date: Mon, 3 Jun 2024 20:28:06 GMT
- Title: How to select an objective function using information theory
- Authors: Timothy O. Hodson, Thomas M. Over, Tyler J. Smith, Lucy M. Marshall
- Abstract summary: In machine learning or scientific computing, model performance is measured with an objective function.
Under the information-theoretic paradigm, the ultimate objective is to maximize information (and minimize uncertainty) as opposed to any specific utility.
We argue that this paradigm is well-suited to models that have many uses and no definite utility, like the large Earth system models used to understand the effects of climate change.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In machine learning or scientific computing, model performance is measured with an objective function. But why choose one objective over another? Information theory gives one answer: To maximize the information in the model, select the objective function that represents the error in the fewest bits. To evaluate different objectives, transform them into likelihood functions. As likelihoods, their relative magnitude represents how strongly we should prefer one objective versus another, and the log of that relation represents the difference in their bit-length, as well as the difference in their uncertainty. In other words, prefer whichever objective minimizes the uncertainty. Under the information-theoretic paradigm, the ultimate objective is to maximize information (and minimize uncertainty), as opposed to any specific utility. We argue that this paradigm is well-suited to models that have many uses and no definite utility, like the large Earth system models used to understand the effects of climate change.
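To make the recipe concrete, recall the standard correspondences the paper builds on: a mean-squared-error objective corresponds to a Gaussian error model, and a mean-absolute-error objective to a Laplace error model. The sketch below is a minimal illustration, not the authors' code: it scores each error model by the average negative log-likelihood of the residuals in bits, with scale parameters set to their maximum-likelihood estimates, assuming zero-mean residuals; prefer whichever model has the shorter bit-length.

```python
import numpy as np

def nll_bits(residuals, dist="gaussian"):
    """Average negative log-likelihood of the residuals, in bits per sample.

    Scale parameters are set to their maximum-likelihood estimates, so the
    result is the average description length of the error under each
    candidate error model (assumes zero-mean residuals).
    """
    r = np.asarray(residuals, dtype=float)
    if dist == "gaussian":   # the error model implied by an MSE objective
        sigma2 = np.mean(r ** 2)
        nll = 0.5 * np.log(2 * np.pi * sigma2) + 0.5   # nats per sample
    elif dist == "laplace":  # the error model implied by an MAE objective
        b = np.mean(np.abs(r))
        nll = np.log(2 * b) + 1.0                      # nats per sample
    else:
        raise ValueError(f"unknown error model: {dist}")
    return nll / np.log(2)  # convert nats to bits

# Illustrative comparison on heavy-tailed synthetic residuals.
residuals = np.random.default_rng(0).standard_t(df=3, size=1000)
for dist in ("gaussian", "laplace"):
    print(dist, round(nll_bits(residuals, dist), 3))
```

On heavy-tailed residuals such as the Student-t draws above, the Laplace model typically wins, i.e., MAE describes the error in fewer bits than MSE.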
Related papers
- Knowing the Facts but Choosing the Shortcut: Understanding How Large Language Models Compare Entities [22.27798386360767]
Large Language Models (LLMs) are increasingly used for knowledge-based reasoning tasks, yet understanding when they rely on genuine knowledge versus superficial heuristics remains challenging. We investigate this question through entity comparison tasks, asking models to compare entities along numerical attributes. We identify three biases that strongly influence model predictions: entity popularity, mention order, and semantic co-occurrence. We find that larger models selectively rely on numerical knowledge when it is more reliable, while smaller models show no such discrimination.
arXiv Detail & Related papers (2025-10-19T12:55:30Z) - Can foundation models actively gather information in interactive environments to test hypotheses? [56.651636971591536]
We introduce a framework in which a model must determine the factors influencing a hidden reward function.
We investigate whether approaches such as self-correction and increased inference time improve information-gathering efficiency.
arXiv Detail & Related papers (2024-12-09T12:27:21Z) - Confidence-Based Model Selection: When to Take Shortcuts for
Subpopulation Shifts [119.22672589020394]
We propose COnfidence-baSed MOdel Selection (CosMoS), where model confidence can effectively guide model selection.
We evaluate CosMoS on four datasets with spurious correlations, each with multiple test sets with varying levels of data distribution shift.
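As a rough illustration of the general idea rather than the CosMoS algorithm itself, confidence-based selection can be sketched as: score each candidate model by its mean top-1 predictive confidence on unlabeled target-distribution inputs and pick the highest scorer. The `predict_proba`-style callables below are hypothetical.

```python
import numpy as np

def select_by_confidence(models, x_target):
    """Return the index of the model with the highest mean top-1 confidence
    on unlabeled target-distribution inputs.

    `models` is a list of callables mapping inputs to class-probability
    arrays of shape (n_samples, n_classes). This is a generic sketch of
    confidence-based selection, not the CosMoS algorithm itself.
    """
    scores = [float(np.mean(np.max(m(x_target), axis=1))) for m in models]
    return int(np.argmax(scores))
```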
arXiv Detail & Related papers (2023-06-19T18:48:15Z) - Learning Choice Functions with Gaussian Processes [0.225596179391365]
In consumer theory, ranking available objects by means of preference relations yields the most common description of individual choices.
We propose a choice model that allows an individual to express a set-valued choice.
arXiv Detail & Related papers (2023-02-01T12:46:43Z) - Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding [93.82542533426766]
We propose a Suspected Object Transformation mechanism (SOT) to encourage the target object selection among the suspected ones.
SOT can be seamlessly integrated into existing CNN and Transformer-based one-stage visual grounders.
Extensive experiments demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2022-03-10T06:41:07Z) - Feature Selection by a Mechanism Design [0.0]
We study the feature-selection problem as a cooperative game in which the candidate features are the players and the payoff function is a performance measurement.
In theory, an irrelevant feature is equivalent to a dummy player in the game, which contributes nothing to all modeling situations.
In our mechanism design, the end goal is to match the expected model performance exactly with the expected sum of the features' individual marginal effects.
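The dummy-player test lends itself to a small sketch: score each feature by its average marginal contribution across feature subsets (a Banzhaf-style uniform average; the paper's mechanism may weight coalitions differently), and drop features whose contribution is near zero. The `performance` callable and the exhaustive enumeration are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from itertools import combinations

def marginal_effects(features, performance):
    """Score each feature by its average marginal contribution.

    `performance(subset)` returns model performance when trained on that
    frozenset of features. A feature whose contributions are ~zero across
    all subsets acts as a 'dummy player' and can be dropped. This uniform,
    exhaustive average is a sketch and is exponential in len(features).
    """
    effects = {}
    for f in features:
        others = [g for g in features if g != f]
        deltas = []
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                s = frozenset(subset)
                deltas.append(performance(s | {f}) - performance(s))
        effects[f] = float(np.mean(deltas))
    return effects
```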
arXiv Detail & Related papers (2021-10-05T23:53:14Z) - Instance-Level Relative Saliency Ranking with Graph Reasoning [126.09138829920627]
We present a novel unified model to segment salient instances and infer relative saliency rank order.
A novel loss function is also proposed to effectively train the saliency ranking branch.
Experimental results demonstrate that our proposed model is more effective than previous methods.
arXiv Detail & Related papers (2021-07-08T13:10:42Z) - InfoNCE is variational inference in a recognition parameterised model [32.45282187405337]
We show that the InfoNCE objective is equivalent to the ELBO in a new class of probabilistic generative model.
In particular, we show that in the infinite sample limit, and for a particular choice of prior, the actual InfoNCE objective is equal to the ELBO.
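For reference, the standard InfoNCE objective (in van den Oord et al.'s contrastive form; this paper's own notation may differ) is shown below. The paper's claim is that, in the infinite-sample limit and for a particular choice of prior, this quantity coincides with the ELBO of a recognition-parameterised generative model.

```latex
% Standard InfoNCE objective: similarity score f, positive pair (x, y^+),
% and N candidates y_1, ..., y_N that include the positive.
\mathcal{L}_{\mathrm{InfoNCE}}
  = -\,\mathbb{E}\!\left[
      \log \frac{e^{f(x,\, y^{+})}}{\sum_{i=1}^{N} e^{f(x,\, y_i)}}
    \right]
```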
arXiv Detail & Related papers (2021-07-06T09:24:57Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Understanding the origin of information-seeking exploration in probabilistic objectives for control [62.997667081978825]
An exploration-exploitation trade-off is central to the description of adaptive behaviour.
One approach to resolving this trade-off has been to equip agents with, or propose that they possess, an intrinsic 'exploratory drive'.
We show that this combination of utility-maximizing and information-seeking behaviour arises from the minimization of an entirely different class of objectives.
arXiv Detail & Related papers (2021-03-11T18:42:39Z) - Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z) - Deep Learning for Individual Heterogeneity: An Automatic Inference Framework [2.6813717321945107]
We develop methodology for estimation and inference using machine learning to enrich economic models.
We show how to design the network architecture to match the structure of the economic model.
We obtain inference based on a novel influence function calculation.
arXiv Detail & Related papers (2020-10-28T01:41:47Z) - Adversarial Infidelity Learning for Model Interpretation [43.37354056251584]
We propose a Model-agnostic Effective Efficient Direct (MEED) IFS framework for model interpretation.
Our framework mitigates concerns about sanity, shortcuts, model identifiability, and information transmission.
Our AIL mechanism can help learn the desired conditional distribution between selected features and targets.
arXiv Detail & Related papers (2020-06-09T16:27:17Z) - Auditing and Debugging Deep Learning Models via Decision Boundaries: Individual-level and Group-level Analysis [0.0]
We use flip points to explain, audit, and debug deep learning models.
A flip point is any point that lies on the boundary between two output classes.
We demonstrate our methods by investigating several models trained on standard datasets used in social applications of machine learning.
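A flip point on a given segment can be located generically by bisection, as sketched below. Note that the paper computes flip points via optimization (e.g., the closest flip point to a given input), whereas this simple line search only finds a boundary point between two fixed endpoints; the `predict` callable is a hypothetical stand-in for the trained model.

```python
import numpy as np

def flip_point_on_segment(predict, x0, x1, tol=1e-6):
    """Bisect the segment between x0 and x1 for a point where the model's
    predicted class flips.

    `predict(x)` returns a class label; requires predict(x0) != predict(x1).
    Finds a boundary point on the segment, not necessarily the flip point
    closest to x0 (which would require an optimization instead).
    """
    c0 = predict(x0)
    assert predict(x1) != c0, "endpoints must be classified differently"
    lo, hi = np.asarray(x0, dtype=float), np.asarray(x1, dtype=float)
    while np.linalg.norm(hi - lo) > tol:
        mid = (lo + hi) / 2
        if predict(mid) == c0:
            lo = mid  # still on x0's side of the boundary
        else:
            hi = mid  # crossed the boundary
    return (lo + hi) / 2
```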
arXiv Detail & Related papers (2020-01-03T01:45:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it lists and is not responsible for any consequences of its use.