Azimuth: Systematic Error Analysis for Text Classification
- URL: http://arxiv.org/abs/2212.08216v2
- Date: Mon, 19 Dec 2022 04:01:57 GMT
- Title: Azimuth: Systematic Error Analysis for Text Classification
- Authors: Gabrielle Gauthier-Melançon, Orlando Marquez Ayala, Lindsay Brin,
Chris Tyler, Frédéric Branchaud-Charron, Joseph Marinier, Karine Grande,
Di Le
- Abstract summary: Azimuth is an open-source tool to perform error analysis for text classification.
We propose an approach comprising dataset analysis and model quality assessment.
- Score: 3.1679600401346706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Azimuth, an open-source and easy-to-use tool to perform error
analysis for text classification. Compared to other stages of the ML
development cycle, such as model training and hyper-parameter tuning, the
process and tooling for the error analysis stage are less mature. However, this
stage is critical for the development of reliable and trustworthy AI systems.
To make error analysis more systematic, we propose an approach comprising
dataset analysis and model quality assessment, which Azimuth facilitates. We
aim to help AI practitioners discover and address areas where the model does
not generalize by leveraging and integrating a range of ML techniques, such as
saliency maps, similarity, uncertainty, and behavioral analyses, all in one
tool. Our code and documentation are available at
github.com/servicenow/azimuth.
Related papers
- Experiments with truth using Machine Learning: Spectral analysis and explainable classification of synthetic, false, and genuine information [0.0]
This paper analyzes synthetic, false, and genuine information in the form of text from spectral analysis, visualization, and explainability perspectives.
Various embedding techniques on multiple datasets are used to represent information.
Classification is done using multiple machine learning algorithms.
arXiv Detail & Related papers (2024-07-07T18:31:09Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models [18.026567399243]
Large Language Models (LLMs) offer a promising alternative to static analysis.
In this paper, we take a deep dive into the open space of LLM-assisted static analysis.
We develop LLift, a fully automated framework that interfaces with both a static analysis tool and an LLM.
arXiv Detail & Related papers (2023-08-01T02:57:43Z)
- Metric Tools for Sensitivity Analysis with Applications to Neural Networks [0.0]
Explainable Artificial Intelligence (XAI) aims to provide interpretations for predictions made by Machine Learning models.
In this paper, a theoretical framework is proposed to study sensitivities of ML models using metric techniques.
A complete family of new quantitative metrics called $\alpha$-curves is extracted.
arXiv Detail & Related papers (2023-05-03T18:10:21Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Spatial machine-learning model diagnostics: a model-agnostic distance-based approach [91.62936410696409]
This contribution proposes spatial prediction error profiles (SPEPs) and spatial variable importance profiles (SVIPs) as novel model-agnostic assessment and interpretation tools.
The SPEPs and SVIPs of geostatistical methods, linear models, random forest, and hybrid algorithms show striking differences and also relevant similarities.
The novel diagnostic tools enrich the toolkit of spatial data science, and may improve ML model interpretation, selection, and design.
arXiv Detail & Related papers (2021-11-13T01:50:36Z)
- Unrolling SGD: Understanding Factors Influencing Machine Unlearning [17.6607904333012]
Machine unlearning is the process through which a deployed machine learning model forgets about one of its training data points.
We first taxonomize approaches and metrics of approximate unlearning.
We identify verification error, i.e., the L2 difference between the weights of an approximately unlearned and a naively retrained model.
arXiv Detail & Related papers (2021-09-27T23:46:59Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
- Machine Learning to Tackle the Challenges of Transient and Soft Errors in Complex Circuits [0.16311150636417257]
Machine learning models are used to predict accurate per-instance Functional De-Rating data for the full list of circuit instances.
The presented methodology is applied on a practical example and various machine learning models are evaluated and compared.
arXiv Detail & Related papers (2020-02-18T18:38:54Z)