Self-Attention Between Datapoints: Going Beyond Individual Input-Output
Pairs in Deep Learning
- URL: http://arxiv.org/abs/2106.02584v1
- Date: Fri, 4 Jun 2021 16:30:49 GMT
- Title: Self-Attention Between Datapoints: Going Beyond Individual Input-Output
Pairs in Deep Learning
- Authors: Jannik Kossen, Neil Band, Clare Lyle, Aidan N. Gomez, Tom Rainforth,
Yarin Gal
- Abstract summary: We introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time.
Our approach uses self-attention to reason about relationships between datapoints explicitly.
Unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction.
- Score: 36.047444794544425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We challenge a common assumption underlying most supervised deep learning:
that a model makes a prediction depending only on its parameters and the
features of a single input. To this end, we introduce a general-purpose deep
learning architecture that takes as input the entire dataset instead of
processing one datapoint at a time. Our approach uses self-attention to reason
about relationships between datapoints explicitly, which can be seen as
realizing non-parametric models using parametric attention mechanisms. However,
unlike conventional non-parametric models, we let the model learn end-to-end
from the data how to make use of other datapoints for prediction. Empirically,
our models solve cross-datapoint lookup and complex reasoning tasks unsolvable
by traditional deep learning models. We show highly competitive results on
tabular data, early results on CIFAR-10, and give insight into how the model
makes use of the interactions between points.
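The central idea in the abstract, attention applied across datapoints rather than within one input, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `attention_between_datapoints`, the single-head formulation, the random projection matrices, and all sizes are assumptions chosen only for illustration. It shows that when the rows of the input matrix are datapoints, the updated representation of each datapoint depends on every other datapoint in the dataset.

```python
import numpy as np


def softmax(scores, axis=-1):
    # Subtract the maximum along the axis for numerical stability.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)


def attention_between_datapoints(X, W_q, W_k, W_v):
    """Single-head scaled dot-product attention where rows of X are datapoints.

    X holds the whole dataset with shape (n, d), one datapoint per row.
    Returns the updated representations (n, d_v) and the (n, n) attention weights.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])
    # Row i of `weights` says how much datapoint i attends to each datapoint.
    weights = softmax(scores, axis=1)
    return weights @ V, weights


# Hypothetical sizes and random (untrained) projections, for illustration only.
rng = np.random.default_rng(0)
n, d, d_k = 6, 4, 3
X = rng.normal(size=(n, d))
W_q = rng.normal(size=(d, d_k))
W_k = rng.normal(size=(d, d_k))
W_v = rng.normal(size=(d, d))

H, A = attention_between_datapoints(X, W_q, W_k, W_v)
print(H.shape, A.shape)
```

In a trained model the projection matrices would be learned end-to-end, which is how the model can learn from the data how to use other datapoints for prediction instead of relying on a fixed similarity rule.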
Related papers
- A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data [9.57464542357693]
This paper demonstrates that model-centric evaluations are biased, as real-world modeling pipelines often require dataset-specific preprocessing and feature engineering.
We select 10 relevant datasets from Kaggle competitions and implement expert-level preprocessing pipelines for each dataset.
After dataset-specific feature engineering, model rankings change considerably, performance differences decrease, and the importance of model selection reduces.
arXiv Detail & Related papers (2024-07-02T09:54:39Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- EAMDrift: An interpretable self retrain model for time series [0.0]
We present EAMDrift, a novel method that combines forecasts from multiple individual predictors by weighting each prediction according to a performance metric.
EAMDrift is designed to automatically adapt to out-of-distribution patterns in data and identify the most appropriate models to use at each moment.
Our study on real-world datasets shows that EAMDrift outperforms individual baseline models by 20% and achieves comparable accuracy results to non-interpretable ensemble models.
arXiv Detail & Related papers (2023-05-31T13:25:26Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Deep Explainable Learning with Graph Based Data Assessing and Rule Reasoning [4.369058206183195]
We propose an end-to-end deep explainable learning approach that combines the noise-handling advantages of deep models with the interpretability of expert rules.
The proposed method is tested in an industry production system, showing comparable prediction accuracy, much higher generalization stability and better interpretability.
arXiv Detail & Related papers (2022-11-09T05:58:56Z)
- VertiBayes: Learning Bayesian network parameters from vertically partitioned data with missing values [2.9707233220536313]
Federated learning makes it possible to train a machine learning model on decentralized data.
We propose a novel method called VertiBayes to train Bayesian networks on vertically partitioned data.
We experimentally show our approach produces models comparable to those learnt using traditional algorithms.
arXiv Detail & Related papers (2022-10-31T11:13:35Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
We are instead given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates [26.527311287924995]
We show that in a controlled setup, influence tuning can help deconfound the model from spurious patterns in data.
arXiv Detail & Related papers (2021-10-07T06:59:46Z)
- Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles [66.15398165275926]
We propose a method that can automatically detect and ignore dataset-specific patterns, which we call dataset biases.
Our method trains a lower capacity model in an ensemble with a higher capacity model.
We show improvement in all settings, including a 10-point gain on the visual question answering dataset.
arXiv Detail & Related papers (2020-11-07T22:20:03Z)
- Data from Model: Extracting Data from Non-robust and Robust Models [83.60161052867534]
This work explores the reverse process of generating data from a model, attempting to reveal the relationship between the data and the model.
We repeat the process of Data to Model (DtM) and Data from Model (DfM) in sequence and explore the loss of feature mapping information.
Our results show that the accuracy drop is limited even after multiple sequences of DtM and DfM, especially for robust models.
arXiv Detail & Related papers (2020-07-13T05:27:48Z)