Feature Ranking for Semi-supervised Learning
- URL: http://arxiv.org/abs/2008.03937v1
- Date: Mon, 10 Aug 2020 07:50:50 GMT
- Title: Feature Ranking for Semi-supervised Learning
- Authors: Matej Petković, Sašo Džeroski, Dragi Kocev
- Abstract summary: We propose semi-supervised learning of feature ranking.
To the best of our knowledge, this is the first work that treats the task of feature ranking within the semi-supervised structured output prediction context.
The evaluation across 38 benchmark datasets reveals the following: Random Forests perform best for the classification-like tasks, while Extra-PCTs perform best for the regression-like tasks.
- Score: 3.1380888953704984
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The data made available for analysis are becoming more and more complex along
several directions: high dimensionality, number of examples and the amount of
labels per example. This poses a variety of challenges for existing machine
learning methods: coping with datasets that contain a large number of examples
described in a high-dimensional space, where not all examples have labels. For
example, when investigating the toxicity of chemical compounds, many compounds
are available that can be described with information-rich high-dimensional
representations, but not all of them have information on their toxicity. To
address these challenges, we propose
semi-supervised learning of feature ranking. The feature rankings are learned
in the context of classification and regression as well as in the context of
structured output prediction (multi-label classification, hierarchical
multi-label classification and multi-target regression). To the best of our
knowledge, this is the first work that treats the task of feature ranking
within the semi-supervised structured output prediction context. More
specifically, we propose two approaches that are based on tree ensembles and
the Relief family of algorithms. The extensive evaluation across 38 benchmark
datasets reveals the following: Random Forests perform best for the
classification-like tasks, while Extra-PCTs perform best for the
regression-like tasks; Random Forests are the most efficient method in terms
of induction time across all tasks; and semi-supervised feature rankings
outperform their supervised counterparts on a majority of the datasets from
the different tasks.
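The Relief family mentioned in the abstract can be illustrated with a minimal supervised sketch: for each instance, a feature's weight is decreased by its difference to the nearest same-class neighbour (hit) and increased by its difference to the nearest other-class neighbour (miss). This is a basic illustration of the classic Relief idea only; the paper's semi-supervised and structured-output extensions are not reproduced here, and `relief_ranking` is a hypothetical helper name, not code from the paper.

```python
import numpy as np

def relief_ranking(X, y, n_iter=None, rng=None):
    """Basic Relief feature weighting for a (binary) classification task.

    For each sampled instance, each feature's weight is decreased by its
    scaled difference to the nearest hit (same class) and increased by its
    scaled difference to the nearest miss (other class). Higher weight
    means the feature better separates the classes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    n_iter = n if n_iter is None else n_iter
    rng = np.random.default_rng(rng)

    # Scale per-feature differences by the feature's value range.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0

    w = np.zeros(d)
    for i in rng.choice(n, size=n_iter, replace=False):
        diffs = np.abs(X - X[i]) / span      # per-feature scaled distances
        dist = diffs.sum(axis=1)             # Manhattan distance per instance
        dist[i] = np.inf                     # exclude the instance itself
        same = y == y[i]
        same[i] = False
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += (diffs[miss] - diffs[hit]) / n_iter
    return w
```

On a toy dataset where one feature determines the class and the rest are noise, the informative feature receives the largest weight, which is the ranking signal the paper's methods build on.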
Related papers
- Generating Hierarchical Structures for Improved Time Series
Classification Using Stochastic Splitting Functions [0.0]
This study introduces a novel hierarchical divisive clustering approach with stochastic splitting functions (SSFs) to enhance classification performance in multi-class datasets through hierarchical classification (HC).
The method has the unique capability of generating a hierarchy without requiring explicit information, making it suitable for datasets lacking prior knowledge of hierarchy.
arXiv Detail & Related papers (2023-09-21T10:34:50Z) - Association Graph Learning for Multi-Task Classification with Category
Shifts [68.58829338426712]
We focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously.
We learn an association graph to transfer knowledge among tasks for missing classes.
Our method consistently performs better than representative baselines.
arXiv Detail & Related papers (2022-10-10T12:37:41Z) - Semi-supervised Predictive Clustering Trees for (Hierarchical) Multi-label Classification [2.706328351174805]
We propose a hierarchical multi-label classification method based on semi-supervised learning of predictive clustering trees.
We also extend the method towards ensemble learning and propose a method based on the random forest approach.
arXiv Detail & Related papers (2022-07-19T12:49:00Z) - Simple multi-dataset detection [83.9604523643406]
We present a simple method for training a unified detector on multiple large-scale datasets.
We show how to automatically integrate dataset-specific outputs into a common semantic taxonomy.
Our approach does not require manual taxonomy reconciliation.
arXiv Detail & Related papers (2021-02-25T18:55:58Z) - Abstractive Query Focused Summarization with Query-Free Resources [60.468323530248945]
In this work, we consider the problem of leveraging only generic summarization resources to build an abstractive QFS system.
We propose Marge, a Masked ROUGE Regression framework composed of a novel unified representation for summaries and queries.
Despite learning from minimal supervision, our system achieves state-of-the-art results in the distantly supervised setting.
arXiv Detail & Related papers (2020-12-29T14:39:35Z) - Deep tree-ensembles for multi-output prediction [0.0]
We propose a novel deep tree-ensemble (DTE) model, where every layer enriches the original feature set with a representation learning component based on tree-embeddings.
We specifically focus on two structured output prediction tasks, namely multi-label classification and multi-target regression.
arXiv Detail & Related papers (2020-11-03T16:25:54Z) - Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z) - Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical
Understanding of Outdoor Scene [76.4183572058063]
We present a richly-annotated 3D point cloud dataset for multiple outdoor scene understanding tasks.
The dataset has been point-wisely annotated with both hierarchical and instance-based labels.
We formulate a hierarchical learning problem for 3D point cloud segmentation and propose a measurement evaluating consistency across various hierarchies.
arXiv Detail & Related papers (2020-08-11T19:10:32Z) - Efficient strategies for hierarchical text classification: External
knowledge and auxiliary tasks [3.5557219875516655]
We perform a sequence of inference steps to predict the category of a document from top to bottom of a given class taxonomy.
With our efficient approaches, we outperform previous studies, using a drastically reduced number of parameters, in two well-known English datasets.
arXiv Detail & Related papers (2020-05-05T20:22:18Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.