Metric Learning Improves the Ability of Combinatorial Coverage Metrics
  to Anticipate Classification Error
- URL: http://arxiv.org/abs/2302.14616v1
- Date: Tue, 28 Feb 2023 14:55:57 GMT
- Title: Metric Learning Improves the Ability of Combinatorial Coverage Metrics
  to Anticipate Classification Error
- Authors: Tyler Cody, Laura Freeman
- Abstract summary: Many machine learning methods are sensitive to test or operational data that is dissimilar to training data.
Metric learning is a technique for learning latent spaces where data from different classes is further apart.
In a study of 6 open-source datasets, we find that metric learning increased the difference between set-difference coverage metrics calculated on correctly and incorrectly classified data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Machine learning models are increasingly used in practice. However, many
machine learning methods are sensitive to test or operational data that is
dissimilar to training data. Out-of-distribution (OOD) data is known to
increase the probability of error, and research into metrics that identify what
dissimilarities in data affect model performance is ongoing. Recently,
combinatorial coverage metrics have been explored in the literature as an
alternative to distribution-based metrics. Results show that coverage metrics
can correlate with classification error. However, other results show that the
utility of coverage metrics is highly dataset-dependent. In this paper, we show
that this dataset-dependence can be alleviated with metric learning, a machine
learning technique for learning latent spaces where data from different classes
is further apart. In a study of 6 open-source datasets, we find that metric
learning increased the difference between set-difference coverage metrics
(SDCCMs) calculated on correctly and incorrectly classified data, thereby
demonstrating that metric learning improves the ability of SDCCMs to anticipate
classification error. Paired t-tests validate the statistical significance of
our findings. Overall, we conclude that metric learning improves the ability of
coverage metrics to anticipate classifier error and identify when OOD data is
likely to degrade model performance.
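
The abstract does not spell out how a set-difference coverage metric is computed, so the following is only a minimal sketch under stated assumptions. It assumes the definition commonly used in the combinatorial coverage literature: the fraction of t-way feature-value interactions present in a target set that never appear in a source set. The function names `interactions` and `sdcc`, the toy rows, and the choice of t = 2 are hypothetical and are not taken from the paper. In the paper's setting the rows would presumably be discretized latent embeddings produced by metric learning; that binning step is omitted here.

```python
from itertools import combinations


def interactions(rows, t):
    """Collect every t-way interaction seen in rows.

    An interaction is a tuple of (feature_index, value) pairs.
    Rows are assumed to hold already-discretized feature values.
    """
    found = set()
    for row in rows:
        for idx in combinations(range(len(row)), t):
            found.add(tuple((i, row[i]) for i in idx))
    return found


def sdcc(target_rows, source_rows, t=2):
    """Assumed set-difference coverage: |T_t minus S_t| / |T_t|.

    Returns 0.0 when the target contains no interactions.
    """
    target = interactions(target_rows, t)
    if not target:
        return 0.0
    source = interactions(source_rows, t)
    return len(target - source) / len(target)


# Hypothetical toy data: training rows, plus test rows split by
# whether a classifier labeled them correctly.
train = [(0, 0, 1), (0, 1, 1), (1, 0, 0)]
correct = [(0, 0, 1), (0, 1, 1)]
incorrect = [(1, 1, 0), (1, 0, 0)]

# The quantity of interest in the abstract is the gap between the
# metric on incorrectly and correctly classified data.
gap = sdcc(incorrect, train) - sdcc(correct, train)
print(f"SDCC(correct)={sdcc(correct, train):.2f} "
      f"SDCC(incorrect)={sdcc(incorrect, train):.2f} gap={gap:.2f}")
```

A larger gap would mean the metric separates misclassified inputs from correctly classified ones more clearly, which is the effect the abstract attributes to metric learning.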
 
      
Related papers
        - Global Ground Metric Learning with Applications to scRNA data [5.70896453969985]
 We propose a novel approach for learning metrics for arbitrary distributions over a shared metric space.
Our method provides a distance between individual points like a global metric, but requires only class labels on a distribution-level for training.
We demonstrate the effectiveness and interpretability of our approach using patient-level scRNA-seq data spanning multiple diseases.
 arXiv  Detail & Related papers  (2025-06-18T11:53:13Z)
- Learning Unified Distance Metric Across Diverse Data Distributions with Parameter-Efficient Transfer Learning [36.349282242221065]
 A common practice in metric learning is to train and test an embedding model for each dataset.
This dataset-specific approach fails to simulate real-world scenarios that involve multiple heterogeneous distributions of data.
We explore a new metric learning paradigm, called Unified Metric Learning (UML), which learns a unified distance metric.
 arXiv  Detail & Related papers  (2023-09-16T10:34:01Z)
- Machine Learning Based Missing Values Imputation in Categorical Datasets [2.5611256859404983]
 This research looked into the use of machine learning algorithms to fill in the gaps in categorical datasets.
The emphasis was on ensemble models constructed using the Error Correction Output Codes framework.
Deep learning for missing data imputation has obstacles despite these encouraging results, including the requirement for large amounts of labeled data.
 arXiv  Detail & Related papers  (2023-06-10T03:29:48Z)
- Rapid Adaptation in Online Continual Learning: Are We Evaluating It
  Right? [135.71855998537347]
 We revisit the common practice of evaluating adaptation of Online Continual Learning (OCL) algorithms through the metric of online accuracy.
We show that this metric is unreliable, as even vacuous blind classifiers can achieve unrealistically high online accuracy.
Existing OCL algorithms can also achieve high online accuracy, but perform poorly in retaining useful information.
 arXiv  Detail & Related papers  (2023-05-16T08:29:33Z)
- A classification performance evaluation measure considering data
  separability [6.751026374812737]
 We propose a new separability measure--the rate of separability (RS)--based on the data coding rate.
We demonstrate the positive correlation between the proposed measure and recognition accuracy in a multi-task scenario constructed from a real dataset.
 arXiv  Detail & Related papers  (2022-11-10T09:18:26Z)
- Understanding Factual Errors in Summarization: Errors, Summarizers,
  Datasets, Error Detectors [105.12462629663757]
 In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model.
We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
 arXiv  Detail & Related papers  (2022-05-25T15:26:48Z)
- Investigating Data Variance in Evaluations of Automatic Machine
  Translation Metrics [58.50754318846996]
 In this paper, we show that the performances of metrics are sensitive to data.
The ranking of metrics varies when the evaluation is conducted on different datasets.
 arXiv  Detail & Related papers  (2022-03-29T18:58:28Z)
- Data-Centric Machine Learning in the Legal Domain [0.2624902795082451]
 This paper explores how changes in a data set influence the measured performance of a model.
Using three publicly available data sets from the legal domain, we investigate how changes to their size, the train/test splits, and the human labelling accuracy impact the performance.
The observed effects are surprisingly pronounced, especially when the per-class performance is considered.
 arXiv  Detail & Related papers  (2022-01-17T23:05:14Z)
- Finding Significant Features for Few-Shot Learning using Dimensionality
  Reduction [0.0]
 This module helps to improve the accuracy performance by allowing the similarity function, given by the metric learning method, to have more discriminative features for the classification.
Our method outperforms the metric learning baselines in the miniImageNet dataset by around 2% in accuracy performance.
 arXiv  Detail & Related papers  (2021-07-06T16:36:57Z)
- ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for
  Semi-supervised Continual Learning [52.831894583501395]
 Continual learning assumes the incoming data are fully labeled, which might not be applicable in real applications.
We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN).
We show ORDisCo achieves significant performance improvement on various semi-supervised learning benchmark datasets for SSCL.
 arXiv  Detail & Related papers  (2021-01-02T09:04:14Z)
- Provably Robust Metric Learning [98.50580215125142]
 We show that existing metric learning algorithms can result in metrics that are less robust than the Euclidean distance.
We propose a novel metric learning algorithm to find a Mahalanobis distance that is robust against adversarial perturbations.
 Experimental results show that the proposed metric learning algorithm improves both certified robust errors and empirical robust errors.
 arXiv  Detail & Related papers  (2020-06-12T09:17:08Z)
- Data Separability for Neural Network Classifiers and the Development of
  a Separability Index [17.49709034278995]
 We created the Distance-based Separability Index (DSI) to measure the separability of datasets.
We show that the DSI can indicate whether data belonging to different classes have similar distributions.
We also discussed possible applications of the DSI in the fields of data science, machine learning, and deep learning.
 arXiv  Detail & Related papers  (2020-05-27T01:49:19Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
 We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
 arXiv  Detail & Related papers  (2020-02-11T21:08:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     