Understanding Text Classification Data and Models Using Aggregated Input
Salience
- URL: http://arxiv.org/abs/2211.05485v2
- Date: Fri, 11 Nov 2022 07:53:29 GMT
- Title: Understanding Text Classification Data and Models Using Aggregated Input
Salience
- Authors: Sebastian Ebert, Alice Shoshana Jakobovits, Katja Filippova
- Abstract summary: In some cases, an input salience method, which highlights the most important parts of the input, may reveal problematic reasoning.
In this paper we aim to address these issues and go from understanding single examples to understanding entire datasets and models.
Using this methodology we address multiple distinct but common model developer needs by showing how problematic data and model behavior can be identified.
- Score: 2.105564340986074
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Realizing when a model is right for a wrong reason is not trivial and
requires a significant effort by model developers. In some cases, an input
salience method, which highlights the most important parts of the input, may
reveal problematic reasoning. But scrutinizing highlights over many data
instances is tedious and often infeasible. Furthermore, analyzing examples in
isolation does not reveal general patterns in the data or in the model's
behavior. In this paper we aim to address these issues and go from
understanding single examples to understanding entire datasets and models. The
methodology we propose is based on aggregated salience maps. Using this
methodology we address multiple distinct but common model developer needs by
showing how problematic data and model behavior can be identified -- a
necessary first step for improving the model.
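As an illustration of the aggregation idea (a minimal sketch, not the authors' pipeline): compute a per-token salience score for each example with some salience method, then aggregate scores per token type across the whole dataset to surface tokens the model relies on globally. The `salience_fn` below is a hypothetical callable standing in for any input salience method (e.g. gradient-times-input).

```python
from collections import defaultdict

def aggregate_salience(dataset, salience_fn, top_k=20):
    """Aggregate per-example token salience into dataset-level statistics.

    dataset     : iterable of token lists (one list per example)
    salience_fn : hypothetical callable mapping a token list to one
                  salience score per token (e.g. gradient x input)
    """
    total = defaultdict(float)   # summed salience per token type
    count = defaultdict(int)     # number of occurrences per token type

    for tokens in dataset:
        scores = salience_fn(tokens)
        for tok, score in zip(tokens, scores):
            total[tok] += score
            count[tok] += 1

    # Mean salience per token type, sorted; consistently high values flag
    # tokens the model leans on across the dataset (potential shortcuts).
    mean = {tok: total[tok] / count[tok] for tok in total}
    return sorted(mean.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```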
Related papers
- Adapting Large Language Models for Content Moderation: Pitfalls in Data
Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we show how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
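A minimal sketch of merging in parameter space, assuming plain (optionally weighted) averaging of per-parameter weights across fine-tuned checkpoints; the paper's actual merging rule is more sophisticated than this.

```python
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Average several models' parameters into one state dict.

    state_dicts : list of torch state dicts with identical keys and shapes
    weights     : optional per-model mixing weights (default: uniform)
    """
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage: load the merged weights into a model with the same architecture,
# e.g. model.load_state_dict(merge_state_dicts([sd_a, sd_b, sd_c]))
```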
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Instead, you are given access to a set of expert models and their predictions, along with some limited information about the dataset used to train each of them.
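A minimal, hypothetical sketch of instance-wise combination: weight each expert's prediction by how close the test point is to a summary of that expert's training data. Using a per-expert feature mean as that summary is an assumption made purely for illustration.

```python
import numpy as np

def instancewise_ensemble(x, expert_preds, train_means, temperature=1.0):
    """Combine expert predictions for a single test point x.

    x            : (n_features,) test instance
    expert_preds : (n_experts, n_classes) predicted class probabilities
    train_means  : (n_experts, n_features) mean feature vector of each
                   expert's training data (the "limited information")
    """
    dists = np.linalg.norm(train_means - x, axis=1)   # per-expert distance
    weights = np.exp(-dists / temperature)
    weights /= weights.sum()                          # softmax-style weights
    return weights @ expert_preds                     # weighted average of experts
```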
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- Sharing pattern submodels for prediction with missing values [12.981974894538668]
Missing values are unavoidable in many applications of machine learning and present challenges both during training and at test time.
We propose an alternative approach, called sharing pattern submodels, which i) makes predictions robust to missing values at test time, ii) maintains or improves the predictive power of pattern submodels, and iii) has a short description, enabling improved interpretability.
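A minimal sketch of the pattern-submodel idea: fit one model per observed missingness pattern and route each test row to the submodel matching its pattern. The linear models and the assumption that every test pattern was seen in training are illustrative only, and the sharing of information between submodels that the paper contributes is omitted here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

class PatternSubmodels:
    """Fit one submodel per missingness pattern; route test rows by pattern."""

    def fit(self, X, y):
        patterns = np.isnan(X)
        self.models = {}
        for pat in np.unique(patterns, axis=0):
            rows = (patterns == pat).all(axis=1)          # rows with this pattern
            self.models[tuple(pat)] = LinearRegression().fit(X[rows][:, ~pat], y[rows])
        return self

    def predict(self, X):
        out = np.empty(len(X))
        for i, x in enumerate(X):
            pat = tuple(np.isnan(x))
            model = self.models[pat]                      # assumes pattern seen in training
            out[i] = model.predict(x[~np.isnan(x)].reshape(1, -1))[0]
        return out
```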
arXiv Detail & Related papers (2022-06-22T15:09:40Z)
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
- Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning [36.047444794544425]
We introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time.
Our approach uses self-attention to reason about relationships between datapoints explicitly.
Unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction.
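A minimal sketch of attention across datapoints, using a standard multi-head attention layer in which the "sequence" dimension is the set of datapoints rather than tokens; this is an illustrative simplification, not the paper's full architecture.

```python
import torch
import torch.nn as nn

class AttentionBetweenDatapoints(nn.Module):
    """Let every datapoint attend to every other datapoint in the input set."""

    def __init__(self, n_features, d_model=64, n_heads=4, n_classes=2):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, X):                       # X: (n_datapoints, n_features)
        h = self.embed(X).unsqueeze(0)          # (1, n_datapoints, d_model)
        h, _ = self.attn(h, h, h)               # datapoints attend to each other
        return self.head(h.squeeze(0))          # one prediction per datapoint

# Usage: predictions for each row are conditioned on the whole input set.
model = AttentionBetweenDatapoints(n_features=10)
logits = model(torch.randn(32, 10))
```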
arXiv Detail & Related papers (2021-06-04T16:30:49Z)
- Field-wise Learning for Multi-field Categorical Data [27.100048708707593]
We propose a new method for learning with multi-field categorical data.
This allows the models to be fitted to each category and thus to better capture the underlying differences in the data.
Experimental results on two large-scale datasets show the superior performance of our model.
arXiv Detail & Related papers (2020-12-01T01:10:14Z)
- Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles [66.15398165275926]
We propose a method that can automatically detect and ignore dataset-specific patterns, which we call dataset biases.
Our method trains a lower-capacity model in an ensemble with a higher-capacity model.
We show improvement in all settings, including a 10-point gain on the visual question answering dataset.
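A minimal sketch of the ensembling idea, assuming the two models are combined as a product of experts during training so that the low-capacity model absorbs the dataset-specific patterns; the training recipe here is illustrative, not the paper's exact method.

```python
import torch
import torch.nn.functional as F

def mixed_capacity_loss(low_logits, high_logits, labels):
    """Train a low- and a high-capacity model jointly as a product of experts.

    The low-capacity model tends to latch onto simple, dataset-specific
    patterns; at test time only the high-capacity model would be used.
    """
    combined = F.log_softmax(low_logits, dim=-1) + F.log_softmax(high_logits, dim=-1)
    return F.cross_entropy(combined, labels)    # NLL of the normalized product

# Usage (shapes only): logits produced by the two models for the same batch.
low = torch.randn(8, 3, requires_grad=True)
high = torch.randn(8, 3, requires_grad=True)
labels = torch.randint(0, 3, (8,))
loss = mixed_capacity_loss(low, high, labels)
loss.backward()
```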
arXiv Detail & Related papers (2020-11-07T22:20:03Z)
- Evaluating the Disentanglement of Deep Generative Models through Manifold Topology [66.06153115971732]
We present a method for quantifying disentanglement that only uses the generative model.
We empirically evaluate several state-of-the-art models across multiple datasets.
arXiv Detail & Related papers (2020-06-05T20:54:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.