Thought Flow Nets: From Single Predictions to Trains of Model Thought
- URL: http://arxiv.org/abs/2107.12220v1
- Date: Mon, 26 Jul 2021 13:56:37 GMT
- Title: Thought Flow Nets: From Single Predictions to Trains of Model Thought
- Authors: Hendrik Schuff, Heike Adel, Ngoc Thang Vu
- Abstract summary: When humans solve complex problems, they rarely come up with a decision right away.
Instead, they start with an intuitive decision, reflect upon it, spot mistakes, resolve contradictions and jump between different hypotheses.
- Score: 39.619001911390804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When humans solve complex problems, they rarely come up with a decision
right away. Instead, they start with an intuitive decision, reflect upon it,
spot mistakes, resolve contradictions and jump between different hypotheses.
Thus, they create a sequence of ideas and follow a train of thought that
ultimately reaches a conclusive decision. Contrary to this, today's neural
classification models are mostly trained to map an input to one single and
fixed output. In this paper, we investigate how we can give models the
opportunity of a second, third and $k$-th thought. We take inspiration from
Hegel's dialectics and propose a method that turns an existing classifier's
class prediction (such as the image class forest) into a sequence of
predictions (such as forest $\rightarrow$ tree $\rightarrow$ mushroom).
Concretely, we propose a correction module that is trained to estimate the
model's correctness as well as an iterative prediction update based on the
prediction's gradient. Our approach results in a dynamic system over class
probability distributions: the thought flow. We evaluate our
method on diverse datasets and tasks from computer vision and natural language
processing. We observe surprisingly complex but intuitive behavior and
demonstrate that our method (i) can correct misclassifications, (ii)
strengthens model performance, (iii) is robust to high levels of adversarial
attacks, (iv) can increase accuracy by up to 4% in a label-distribution-shift
setting and (v) provides a tool for model interpretability that uncovers model
knowledge which otherwise remains invisible in a single distribution
prediction.
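To make the described mechanism concrete, the following is a minimal sketch of such a gradient-driven prediction update in PyTorch. The module and parameter names (correctness_head, steps, step_size) and the exact form of the update are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def thought_flow(logits, correctness_head, steps=5, step_size=0.1):
    """Hedged sketch: starting from a classifier's logits, repeatedly push the
    class distribution along the gradient of a learned correctness estimate.
    The hyperparameters and the update rule are illustrative assumptions."""
    z = logits.detach().clone().requires_grad_(True)
    flow = [F.softmax(z, dim=-1).detach()]            # first thought: the original prediction
    for _ in range(steps):
        probs = F.softmax(z, dim=-1)
        correctness = correctness_head(probs)         # scalar estimate of prediction correctness
        grad = torch.autograd.grad(correctness.sum(), z)[0]
        z = (z + step_size * grad).detach().requires_grad_(True)
        flow.append(F.softmax(z, dim=-1).detach())    # second, third, ..., k-th thought
    return flow                                       # a sequence of class distributions
```

Here correctness_head stands for any trained module that maps a class distribution to a scalar correctness estimate (for example a small feed-forward network); the returned sequence of distributions corresponds to the thought flow described above.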
Related papers
- Reconciling Model Multiplicity for Downstream Decision Making [24.335927243672952]
We show that even when two predictive models approximately agree on their individual predictions almost everywhere, it is still possible for their induced best-response actions to differ on a substantial portion of the population.
We propose a framework that calibrates the predictive models with regard to both the downstream decision-making problem and the individual probability prediction.
arXiv Detail & Related papers (2024-05-30T03:36:46Z)
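A toy numeric illustration of the point above, with invented numbers: two models whose predicted probabilities never differ by more than 0.02 can still induce opposite threshold-based actions for every individual.

```python
import numpy as np

# Invented numbers: two models that approximately agree on predicted probabilities ...
p_a = np.array([0.49, 0.51, 0.495, 0.505])
p_b = np.array([0.51, 0.49, 0.505, 0.495])

# ... but a downstream rule "act iff p >= 0.5" induces different actions everywhere.
act_a = p_a >= 0.5
act_b = p_b >= 0.5

print(np.abs(p_a - p_b).max())   # 0.02 -> near-agreement on predictions
print((act_a != act_b).mean())   # 1.0  -> the induced actions differ for every individual
```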
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
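For orientation, a minimal sketch of the classification-with-rejection paradigm itself, using a plain confidence threshold. This is the generic baseline behind the paradigm, not the density-ratio rejector the paper proposes.

```python
import numpy as np

def predict_with_rejection(probs, threshold=0.8):
    """Return the predicted class, or -1 ("reject") whenever the top-class
    probability falls below a confidence threshold.  Generic baseline only;
    the paper derives its rejector from learned density ratios instead."""
    probs = np.asarray(probs)
    labels = probs.argmax(axis=-1)
    confidence = probs.max(axis=-1)
    return np.where(confidence >= threshold, labels, -1)

# The middle prediction is too uncertain and is rejected.
print(predict_with_rejection([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]))  # [ 0 -1  1]
```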
- Awareness of uncertainty in classification using a multivariate model and multi-views [1.3048920509133808]
The proposed model regularizes uncertain predictions and is trained to produce both the predictions and their uncertainty estimates.
Given the multi-view predictions together with their uncertainties and confidences, several methods are proposed to compute the final predictions.
The proposed methodology was tested on the CIFAR-10 dataset with clean and noisy labels.
arXiv Detail & Related papers (2024-04-16T06:40:51Z)
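One simple way to fuse multi-view predictions with their uncertainties, shown only to make the idea concrete. The paper proposes several aggregation methods; this inverse-uncertainty weighting is an illustrative assumption, not necessarily one of them.

```python
import numpy as np

def fuse_views(view_probs, view_uncertainties, eps=1e-8):
    """Weight each view's class distribution by the inverse of its uncertainty
    and renormalize.  Illustrative aggregation rule only."""
    probs = np.asarray(view_probs)                        # shape: (views, classes)
    weights = 1.0 / (np.asarray(view_uncertainties) + eps)
    weights = weights / weights.sum()
    fused = weights @ probs                               # confidence-weighted mixture
    return fused / fused.sum()

# A confident view dominates an uncertain one.
print(fuse_views([[0.7, 0.3], [0.4, 0.6]], [0.1, 0.9]))   # roughly [0.67, 0.33]
```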
- ScatterUQ: Interactive Uncertainty Visualizations for Multiclass Deep Learning Problems [0.0]
ScatterUQ is an interactive system that provides targeted visualizations to allow users to better understand model performance in context-driven uncertainty settings.
We demonstrate the effectiveness of ScatterUQ in explaining model uncertainty for multiclass image classification with a distance-aware neural network trained on Fashion-MNIST.
Our results indicate that the ScatterUQ system should scale to arbitrary multiclass datasets.
arXiv Detail & Related papers (2023-08-08T21:17:03Z)
- An Interpretable Loan Credit Evaluation Method Based on Rule Representation Learner [8.08640000394814]
We design an intrinsically interpretable model based on RRL (Rule Representation Learner) for the Lending Club dataset.
During training, we adopt tricks from previous research to effectively train binary weights.
Our model is used to test the correctness of the explanations generated by the post-hoc method.
arXiv Detail & Related papers (2023-04-03T05:55:04Z)
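One standard trick for training binary weights is the straight-through estimator, sketched below as a hedged illustration. Whether the Lending Club model above uses exactly this variant is not stated in the summary, so treat it as an assumption.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Straight-through estimator: binarize weights in the forward pass and
    pass gradients through unchanged in the backward pass.  A common trick for
    training binary weights; the paper's exact recipe may differ."""
    @staticmethod
    def forward(ctx, w):
        return (w >= 0).float()      # hard 0/1 weights in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output           # identity gradient ("straight through")

w = torch.randn(4, requires_grad=True)
BinarizeSTE.apply(w).sum().backward()
print(w.grad)                        # gradients still reach the real-valued weights
```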
- Rationalizing Predictions by Adversarial Information Calibration [65.19407304154177]
We train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction.
We use an adversarial technique to calibrate the information extracted by the two models such that the difference between them is an indicator of the missed or over-selected features.
arXiv Detail & Related papers (2023-01-15T03:13:09Z)
- Masked prediction tasks: a parameter identifiability view [49.533046139235466]
We focus on the widely used self-supervised learning method of predicting masked tokens.
We show that there is a rich landscape of possibilities, out of which some prediction tasks yield identifiability, while others do not.
arXiv Detail & Related papers (2022-02-18T17:09:32Z)
- Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
arXiv Detail & Related papers (2020-10-27T17:54:12Z)
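For orientation, the standard recursion satisfied by the normalized discounted state-occupancy distribution that an infinite-horizon predictive model of this kind targets; the notation below is ours, and the paper's exact training objective may differ.

$$ \mu_\gamma(s_e \mid s_t, a_t) = (1-\gamma)\, p(s_e \mid s_t, a_t) + \gamma\, \mathbb{E}_{s_{t+1} \sim p(\cdot \mid s_t, a_t),\, a_{t+1} \sim \pi(\cdot \mid s_{t+1})} \big[ \mu_\gamma(s_e \mid s_{t+1}, a_{t+1}) \big] $$

Because the model appears inside its own bootstrapped target, approximation errors can accumulate at training time as well as at prediction time, which is presumably the tradeoff referred to above.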
- How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking [70.92463223410225]
DiffMask learns to mask out subsets of the input while maintaining differentiability.
The decision to include or disregard an input token is made by a simple model based on intermediate hidden layers.
This lets us not only plot attribution heatmaps but also analyze how decisions are formed across network layers.
arXiv Detail & Related papers (2020-04-30T17:36:14Z)
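A minimal sketch of the general differentiable-masking recipe behind the DiffMask entry above. The gate parameterization (plain sigmoid gates) and all module names are chosen here for illustration; DiffMask's actual parameterization and objective differ in detail.

```python
import torch
import torch.nn.functional as F

def differentiable_masking_loss(embeddings, hidden_states, gate_probe,
                                model_head, original_logits, baseline,
                                sparsity_weight=0.1):
    """Hedged sketch: predict a soft keep-probability for every input token from
    intermediate hidden states, blend each token embedding with a baseline
    vector accordingly, and require the masked input to preserve the original
    prediction while keeping as few tokens as possible."""
    gates = torch.sigmoid(gate_probe(hidden_states))        # (batch, tokens, 1)
    masked = gates * embeddings + (1.0 - gates) * baseline  # gated input embeddings
    masked_logits = model_head(masked)
    keep_prediction = F.kl_div(F.log_softmax(masked_logits, dim=-1),
                               F.softmax(original_logits, dim=-1),
                               reduction="batchmean")       # stay close to the original prediction
    sparsity = gates.mean()                                 # encourage most gates to close
    return keep_prediction + sparsity_weight * sparsity
```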
This list is automatically generated from the titles and abstracts of the papers on this site.