Explaining the Performance of Multi-label Classification Methods with
Data Set Properties
- URL: http://arxiv.org/abs/2106.15411v1
- Date: Mon, 28 Jun 2021 11:00:05 GMT
- Title: Explaining the Performance of Multi-label Classification Methods with
Data Set Properties
- Authors: Jasmin Bogatinovski, Ljupčo Todorovski, Sašo Džeroski,
Dragi Kocev
- Abstract summary: We present a comprehensive meta-learning study of data sets and methods for multi-label classification (MLC).
Here, we analyze 40 MLC data sets by using 50 meta features describing different properties of the data.
The most prominent meta features that describe the space of MLC data sets are the ones assessing different aspects of the label space.
- Score: 1.1278903078792917
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Meta learning generalizes the empirical experience with different learning
tasks and holds promise for providing important empirical insight into the
behaviour of machine learning algorithms. In this paper, we present a
comprehensive meta-learning study of data sets and methods for multi-label
classification (MLC). MLC is a practically relevant machine learning task where
each example is labelled with multiple labels simultaneously. Here, we analyze
40 MLC data sets by using 50 meta features describing different properties of
the data. The main findings of this study are as follows. First, the most
prominent meta features that describe the space of MLC data sets are the ones
assessing different aspects of the label space. Second, the meta models show
that the most important meta features describe the label space, and the meta
features describing the relationships among the labels tend to occur somewhat
more often than those describing the distributions between and within the
individual labels. Third, optimizing the hyperparameters can improve the
predictive performance; however, the extent of the improvement does not always
justify the resource utilization.
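To make the notion of label-space meta features concrete, the sketch below computes a few standard MLC statistics (label cardinality, label density, and the number of distinct label sets) from a binary label matrix. These are common descriptors of the kind the abstract refers to, shown purely as an illustration; the study itself uses a set of 50 meta features, and the function name and selection here are not taken from the paper.

```python
import numpy as np

def label_space_meta_features(Y):
    """Compute simple label-space meta features for a multi-label data set.

    Y: (n_examples, n_labels) binary label-indicator matrix.
    Returns a dict of standard MLC statistics; an illustrative subset,
    not the paper's full set of 50 meta features.
    """
    Y = np.asarray(Y)
    n, q = Y.shape
    labels_per_example = Y.sum(axis=1)
    cardinality = labels_per_example.mean()    # average number of labels per example
    density = cardinality / q                  # cardinality normalized by label count
    distinct = len({tuple(row) for row in Y})  # number of unique label combinations
    return {
        "n_examples": n,
        "n_labels": q,
        "label_cardinality": cardinality,
        "label_density": density,
        "distinct_label_sets": distinct,
    }

# Toy data set: 4 examples, 3 labels
Y = [[1, 0, 1],
     [0, 1, 0],
     [1, 1, 1],
     [1, 0, 1]]
feats = label_space_meta_features(Y)
print(feats)  # cardinality 2.0, density ~0.667, 3 distinct label sets
```

Meta features like these characterize each data set as a point in a "meta" feature space, over which meta models can then relate data set properties to method performance.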
Related papers
- Logit Separability-Driven Samples and Multiple Class-Related Words Selection for Advancing In-Context Learning [0.0]
We introduce logit separability, a criterion to assess the clarity of both samples and class-related words at the logit level.
We find that incorporating multiple class-related words for each sample, rather than relying on a single class name, improves performance by offering a broader range of label information.
We propose LICL, a logit separability-based method that jointly organizes samples and integrates multiple class-related words into each sample-label pair.
arXiv Detail & Related papers (2024-06-16T12:11:46Z) - infoVerse: A Universal Framework for Dataset Characterization with
Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z) - Exploring Structured Semantic Prior for Multi Label Recognition with
Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z) - PLATINUM: Semi-Supervised Model Agnostic Meta-Learning using Submodular
Mutual Information [3.1845305066053347]
Few-shot classification (FSC) requires training models using a few (typically one to five) data points per class.
We propose PLATINUM, a novel semi-supervised model agnostic meta-learning framework that uses the submodular mutual information (SMI) functions to boost the performance of FSC.
arXiv Detail & Related papers (2022-01-30T22:07:17Z) - Simple multi-dataset detection [83.9604523643406]
We present a simple method for training a unified detector on multiple large-scale datasets.
We show how to automatically integrate dataset-specific outputs into a common semantic taxonomy.
Our approach does not require manual taxonomy reconciliation.
arXiv Detail & Related papers (2021-02-25T18:55:58Z) - MATCH: Metadata-Aware Text Classification in A Large Hierarchy [60.59183151617578]
MATCH is an end-to-end framework that leverages both metadata and hierarchy information.
We propose different ways to regularize the parameters and output probability of each child label by its parents.
Experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH.
arXiv Detail & Related papers (2021-02-15T05:23:08Z) - Multi-label Few/Zero-shot Learning with Knowledge Aggregated from
Multiple Label Graphs [8.44680447457879]
We present a simple multi-graph aggregation model that fuses knowledge from multiple label graphs encoding different semantic label relationships.
We show that methods equipped with the multi-graph knowledge aggregation achieve significant performance improvement across almost all the measures on few/zero-shot labels.
arXiv Detail & Related papers (2020-10-15T01:15:43Z) - Few-shot Learning for Multi-label Intent Detection [59.66787898744991]
State-of-the-art work estimates label-instance relevance scores and uses a threshold to select multiple associated intent labels.
Experiments on two datasets show that the proposed model significantly outperforms strong baselines in both one-shot and five-shot settings.
arXiv Detail & Related papers (2020-10-11T14:42:18Z) - Minimally Supervised Categorization of Text with Metadata [40.13841133991089]
We propose MetaCat, a minimally supervised framework to categorize text with metadata.
We develop a generative process describing the relationships between words, documents, labels, and metadata.
Based on the same generative process, we synthesize training samples to address the bottleneck of label scarcity.
arXiv Detail & Related papers (2020-05-01T21:42:32Z) - Incremental Meta-Learning via Indirect Discriminant Alignment [118.61152684795178]
We develop a notion of incremental learning during the meta-training phase of meta-learning.
Our approach performs favorably at test time as compared to training a model with the full meta-training set.
arXiv Detail & Related papers (2020-02-11T01:39:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.