Thought Flow Nets: From Single Predictions to Trains of Model Thought
- URL: http://arxiv.org/abs/2107.12220v1
- Date: Mon, 26 Jul 2021 13:56:37 GMT
- Title: Thought Flow Nets: From Single Predictions to Trains of Model Thought
- Authors: Hendrik Schuff, Heike Adel, Ngoc Thang Vu
- Abstract summary: When humans solve complex problems, they rarely come up with a decision right away.
Instead, they start with an intuitive decision, reflect upon it, spot mistakes, resolve contradictions and jump between different hypotheses.
- Score: 39.619001911390804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When humans solve complex problems, they rarely come up with a decision
right away. Instead, they start with an intuitive decision, reflect upon it,
spot mistakes, resolve contradictions and jump between different hypotheses.
Thus, they create a sequence of ideas and follow a train of thought that
ultimately reaches a conclusive decision. Contrary to this, today's neural
classification models are mostly trained to map an input to one single and
fixed output. In this paper, we investigate how we can give models the
opportunity of a second, third and $k$-th thought. We take inspiration from
Hegel's dialectics and propose a method that turns an existing classifier's
class prediction (such as the image class forest) into a sequence of
predictions (such as forest $\rightarrow$ tree $\rightarrow$ mushroom).
Concretely, we propose a correction module that is trained to estimate the
model's correctness as well as an iterative prediction update based on the
prediction's gradient. Our approach results in a dynamic system over class
probability distributions: the thought flow. We evaluate our
method on diverse datasets and tasks from computer vision and natural language
processing. We observe surprisingly complex but intuitive behavior and
demonstrate that our method (i) can correct misclassifications, (ii)
strengthens model performance, (iii) is robust to high levels of adversarial
attacks, (iv) can increase accuracy by up to 4% in a label-distribution-shift
setting and (v) provides a tool for model interpretability that uncovers model
knowledge which otherwise remains invisible in a single distribution
prediction.
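To make the described mechanism concrete, the following is a minimal sketch of such a gradient-driven prediction update in PyTorch. The module and parameter names (correctness_head, steps, step_size) and the exact form of the update are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def thought_flow(logits, correctness_head, steps=5, step_size=0.1):
    """Hedged sketch: starting from a classifier's logits, repeatedly push the
    class distribution along the gradient of a learned correctness estimate.
    The hyperparameters and the update rule are illustrative assumptions."""
    z = logits.detach().clone().requires_grad_(True)
    flow = [F.softmax(z, dim=-1).detach()]            # first thought: the original prediction
    for _ in range(steps):
        probs = F.softmax(z, dim=-1)
        correctness = correctness_head(probs)         # scalar estimate of prediction correctness
        grad = torch.autograd.grad(correctness.sum(), z)[0]
        z = (z + step_size * grad).detach().requires_grad_(True)
        flow.append(F.softmax(z, dim=-1).detach())    # second, third, ..., k-th thought
    return flow                                       # a sequence of class distributions
```

Here correctness_head stands for any trained module that maps a class distribution to a scalar correctness estimate (for example a small feed-forward network); the returned sequence of distributions corresponds to the thought flow described above.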
Related papers
- Reconciling Model Multiplicity for Downstream Decision Making [24.335927243672952]
We show that even when two predictive models approximately agree on their individual predictions almost everywhere, it is still possible for their induced best-response actions to differ on a substantial portion of the population.
We propose a framework that calibrates the predictive models with regard to both the downstream decision-making problem and the individual probability prediction.
arXiv Detail & Related papers (2024-05-30T03:36:46Z)
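A toy numeric illustration of the point above, with invented numbers: two models whose predicted probabilities never differ by more than 0.02 can still induce opposite threshold-based actions for every individual.

```python
import numpy as np

# Invented numbers: two models that approximately agree on predicted probabilities ...
p_a = np.array([0.49, 0.51, 0.495, 0.505])
p_b = np.array([0.51, 0.49, 0.505, 0.495])

# ... but a downstream rule "act iff p >= 0.5" induces different actions everywhere.
act_a = p_a >= 0.5
act_b = p_b >= 0.5

print(np.abs(p_a - p_b).max())   # 0.02 -> near-agreement on predictions
print((act_a != act_b).mean())   # 1.0  -> the induced actions differ for every individual
```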
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
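For orientation, a minimal sketch of the classification-with-rejection paradigm itself, using a plain confidence threshold. This is the generic baseline behind the paradigm, not the density-ratio rejector the paper proposes.

```python
import numpy as np

def predict_with_rejection(probs, threshold=0.8):
    """Return the predicted class, or -1 ("reject") whenever the top-class
    probability falls below a confidence threshold.  Generic baseline only;
    the paper derives its rejector from learned density ratios instead."""
    probs = np.asarray(probs)
    labels = probs.argmax(axis=-1)
    confidence = probs.max(axis=-1)
    return np.where(confidence >= threshold, labels, -1)

# The middle prediction is too uncertain and is rejected.
print(predict_with_rejection([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]))  # [ 0 -1  1]
```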
- Awareness of uncertainty in classification using a multivariate model and multi-views [1.3048920509133808]
The proposed model regularizes uncertain predictions and is trained to produce both the predictions and their uncertainty estimates.
Given the multi-view predictions together with their uncertainties and confidences, several methods are proposed to compute the final predictions.
The proposed methodology was tested on the CIFAR-10 dataset with clean and noisy labels.
arXiv Detail & Related papers (2024-04-16T06:40:51Z)
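One simple way to fuse multi-view predictions with their uncertainties, shown only to make the idea concrete. The paper proposes several aggregation methods; this inverse-uncertainty weighting is an illustrative assumption, not necessarily one of them.

```python
import numpy as np

def fuse_views(view_probs, view_uncertainties, eps=1e-8):
    """Weight each view's class distribution by the inverse of its uncertainty
    and renormalize.  Illustrative aggregation rule only."""
    probs = np.asarray(view_probs)                        # shape: (views, classes)
    weights = 1.0 / (np.asarray(view_uncertainties) + eps)
    weights = weights / weights.sum()
    fused = weights @ probs                               # confidence-weighted mixture
    return fused / fused.sum()

# A confident view dominates an uncertain one.
print(fuse_views([[0.7, 0.3], [0.4, 0.6]], [0.1, 0.9]))   # roughly [0.67, 0.33]
```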
- ScatterUQ: Interactive Uncertainty Visualizations for Multiclass Deep Learning Problems [0.0]
ScatterUQ is an interactive system that provides targeted visualizations to allow users to better understand model performance in context-driven uncertainty settings.
We demonstrate the effectiveness of ScatterUQ in explaining model uncertainty for multiclass image classification with a distance-aware neural network trained on Fashion-MNIST.
Our results indicate that the ScatterUQ system should scale to arbitrary multiclass datasets.
arXiv Detail & Related papers (2023-08-08T21:17:03Z)
- An Interpretable Loan Credit Evaluation Method Based on Rule Representation Learner [8.08640000394814]
We design an intrinsically interpretable model based on RRL (Rule Representation Learner) for the Lending Club dataset.
During training, we adopt tricks from previous research to effectively train binary weights.
Our model is used to test the correctness of the explanations generated by the post-hoc method.
arXiv Detail & Related papers (2023-04-03T05:55:04Z)
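One standard trick for training binary weights is the straight-through estimator, sketched below as a hedged illustration. Whether the Lending Club model above uses exactly this variant is not stated in the summary, so treat it as an assumption.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Straight-through estimator: binarize weights in the forward pass and
    pass gradients through unchanged in the backward pass.  A common trick for
    training binary weights; the paper's exact recipe may differ."""
    @staticmethod
    def forward(ctx, w):
        return (w >= 0).float()      # hard 0/1 weights in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output           # identity gradient ("straight through")

w = torch.randn(4, requires_grad=True)
BinarizeSTE.apply(w).sum().backward()
print(w.grad)                        # gradients still reach the real-valued weights
```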
- Rationalizing Predictions by Adversarial Information Calibration [65.19407304154177]
We train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction.
We use an adversarial technique to calibrate the information extracted by the two models such that the difference between them is an indicator of the missed or over-selected features.
arXiv Detail & Related papers (2023-01-15T03:13:09Z)
- Masked prediction tasks: a parameter identifiability view [49.533046139235466]
We focus on the widely used self-supervised learning method of predicting masked tokens.
We show that there is a rich landscape of possibilities, out of which some prediction tasks yield identifiability, while others do not.
arXiv Detail & Related papers (2022-02-18T17:09:32Z)
- Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
arXiv Detail & Related papers (2020-10-27T17:54:12Z)
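For orientation, the standard recursion satisfied by the normalized discounted state-occupancy distribution that an infinite-horizon predictive model of this kind targets; the notation below is ours, and the paper's exact training objective may differ.

$$ \mu_\gamma(s_e \mid s_t, a_t) = (1-\gamma)\, p(s_e \mid s_t, a_t) + \gamma\, \mathbb{E}_{s_{t+1} \sim p(\cdot \mid s_t, a_t),\, a_{t+1} \sim \pi(\cdot \mid s_{t+1})} \big[ \mu_\gamma(s_e \mid s_{t+1}, a_{t+1}) \big] $$

Because the model appears inside its own bootstrapped target, approximation errors can accumulate at training time as well as at prediction time, which is presumably the tradeoff referred to above.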
- How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking [70.92463223410225]
DiffMask learns to mask out subsets of the input while maintaining differentiability.
The decision to include or disregard an input token is made by a simple model based on intermediate hidden layers.
This lets us not only plot attribution heatmaps but also analyze how decisions are formed across network layers.
arXiv Detail & Related papers (2020-04-30T17:36:14Z)
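A minimal sketch of the general differentiable-masking recipe behind the DiffMask entry above. The gate parameterization (plain sigmoid gates) and all module names are chosen here for illustration; DiffMask's actual parameterization and objective differ in detail.

```python
import torch
import torch.nn.functional as F

def differentiable_masking_loss(embeddings, hidden_states, gate_probe,
                                model_head, original_logits, baseline,
                                sparsity_weight=0.1):
    """Hedged sketch: predict a soft keep-probability for every input token from
    intermediate hidden states, blend each token embedding with a baseline
    vector accordingly, and require the masked input to preserve the original
    prediction while keeping as few tokens as possible."""
    gates = torch.sigmoid(gate_probe(hidden_states))        # (batch, tokens, 1)
    masked = gates * embeddings + (1.0 - gates) * baseline  # gated input embeddings
    masked_logits = model_head(masked)
    keep_prediction = F.kl_div(F.log_softmax(masked_logits, dim=-1),
                               F.softmax(original_logits, dim=-1),
                               reduction="batchmean")       # stay close to the original prediction
    sparsity = gates.mean()                                 # encourage most gates to close
    return keep_prediction + sparsity_weight * sparsity
```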
This list is automatically generated from the titles and abstracts of the papers on this site.