Mining Feature Relationships in Data
- URL: http://arxiv.org/abs/2102.01355v1
- Date: Tue, 2 Feb 2021 07:06:16 GMT
- Title: Mining Feature Relationships in Data
- Authors: Andrew Lensen
- Abstract summary: Feature relationship mining (FRM) uses a genetic programming approach to automatically discover symbolic relationships between continuous or categorical features in data.
Our proposed approach is the first such symbolic approach with the goal of explicitly discovering relationships between features.
Empirical testing on a variety of real-world datasets shows the proposed method is able to find high-quality, simple feature relationships.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: When faced with a new dataset, most practitioners begin by performing
exploratory data analysis to discover interesting patterns and characteristics
within data. Techniques such as association rule mining are commonly applied to
uncover relationships between features (attributes) of the data. However,
association rules are primarily designed for use on binary or categorical data,
due to their use of rule-based machine learning. A large proportion of
real-world data is continuous in nature, and discretisation of such data leads
to inaccurate and less informative association rules. In this paper, we propose
an alternative approach called feature relationship mining (FRM), which uses a
genetic programming approach to automatically discover symbolic relationships
between continuous or categorical features in data. To the best of our
knowledge, our proposed approach is the first such symbolic approach with the
goal of explicitly discovering relationships between features. Empirical
testing on a variety of real-world datasets shows the proposed method is able
to find high-quality, simple feature relationships which can be easily
interpreted and which provide clear and non-trivial insight into data.
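The abstract does not give the representation, operators, or fitness function used by FRM, so the following is only a minimal sketch of the general idea, not the paper's method. It evolves small symbolic expressions over the other features and scores them by absolute Pearson correlation with a chosen target feature, minus a size penalty to favour simple relationships. The function set, the mutation-only loop (a simplification of full genetic programming, which would normally also use crossover), the penalty weight, and all names such as `mine_relationship` are assumptions made for illustration.

```python
import math
import random

# Hypothetical function set; the abstract does not list the operators used.
OPS = {"add": lambda a, b: a + b,
       "sub": lambda a, b: a - b,
       "mul": lambda a, b: a * b}

def random_tree(features, depth, rng):
    """Grow a random expression tree whose leaves are feature indices."""
    if depth == 0 or rng.random() < 0.3:
        return ("feat", rng.choice(features))
    op = rng.choice(sorted(OPS))
    return (op, random_tree(features, depth - 1, rng),
            random_tree(features, depth - 1, rng))

def evaluate(tree, row):
    """Compute the value of an expression tree on one data row."""
    if tree[0] == "feat":
        return row[tree[1]]
    return OPS[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))

def size(tree):
    """Count the nodes in a tree (used as a simplicity penalty)."""
    return 1 if tree[0] == "feat" else 1 + size(tree[1]) + size(tree[2])

def to_string(tree):
    if tree[0] == "feat":
        return f"x{tree[1]}"
    return f"{tree[0]}({to_string(tree[1])}, {to_string(tree[2])})"

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for constant or degenerate inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = math.sqrt(sxx) * math.sqrt(syy) if sxx > 0 and syy > 0 else 0.0
    if denom == 0 or not math.isfinite(denom):
        return 0.0
    r = sxy / denom
    return r if math.isfinite(r) else 0.0

def fitness(tree, data, target, penalty=0.01):
    """Assumed fitness: |correlation with the target| minus a size penalty."""
    preds = [evaluate(tree, row) for row in data]
    ys = [row[target] for row in data]
    return abs(pearson(preds, ys)) - penalty * size(tree)

def mutate(tree, features, rng):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if tree[0] == "feat" or rng.random() < 0.3:
        return random_tree(features, 2, rng)
    if rng.random() < 0.5:
        return (tree[0], mutate(tree[1], features, rng), tree[2])
    return (tree[0], tree[1], mutate(tree[2], features, rng))

def mine_relationship(data, target, generations=30, pop_size=60, seed=0):
    """Evolve an expression over the non-target features that tracks the target."""
    rng = random.Random(seed)
    features = [i for i in range(len(data[0])) if i != target]
    pop = [random_tree(features, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, data, target), reverse=True)
        elite = pop[: pop_size // 4]  # truncation selection with elitism
        children = [mutate(rng.choice(elite), features, rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=lambda t: fitness(t, data, target))

# Synthetic demo data: column 2 is the product of columns 0 and 1; column 3 is noise.
rng = random.Random(42)
data = []
for _ in range(200):
    a, b, noise = rng.random(), rng.random(), rng.random()
    data.append([a, b, a * b, noise])

best = mine_relationship(data, target=2)
print(to_string(best), round(fitness(best, data, 2), 3))
```

Because the features here are continuous, no discretisation step is needed before searching, which is the contrast with association rule mining that the abstract draws. Correlation is only one possible quality measure; a different measure or function set would change which relationships are found.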
Related papers
- Benchmarking the Fidelity and Utility of Synthetic Relational Data [1.024113475677323]
We review related work on relational data synthesis, common benchmarking datasets, and approaches to measuring the fidelity and utility of synthetic data.
We combine the best practices and a novel robust detection approach into a benchmarking tool and use it to compare six methods.
For utility, we typically observe moderate correlation between real and synthetic data for both model predictive performance and feature importance.
arXiv Detail & Related papers (2024-10-04T13:23:45Z)
- Semantic-Enhanced Relational Metric Learning for Recommender Systems [27.330164862413184]
Recently, metric learning methods have received great attention in the recommendation community, inspired by the translation mechanism in knowledge graphs.
We propose a joint Semantic-Enhanced Metric Learning framework to tackle the problem in recommender systems.
Specifically, the semantic signal is first extracted from the target reviews containing abundant features and personalized user preferences.
A novel regression model is then designed by leveraging the extracted semantic signal to improve the discriminative ability of the original relation-based training process.
arXiv Detail & Related papers (2024-06-07T11:54:50Z)
- Jointprop: Joint Semi-supervised Learning for Entity and Relation Extraction with Heterogeneous Graph-based Propagation [13.418617500641401]
We propose Jointprop, a Heterogeneous Graph-based Propagation framework for joint semi-supervised entity and relation extraction.
We construct a unified span-based heterogeneous graph from entity and relation candidates and propagate class labels based on confidence scores.
We show that our framework outperforms the state-of-the-art semi-supervised approaches on NER and RE tasks.
arXiv Detail & Related papers (2023-05-25T09:07:04Z)
- GenSyn: A Multi-stage Framework for Generating Synthetic Microdata using Macro Data Sources [21.32471030724983]
Individual-level data (microdata) that characterizes a population is essential for studying many real-world problems.
In this study, we examine synthetic data generation as a tool to extrapolate difficult-to-obtain high-resolution data.
arXiv Detail & Related papers (2022-12-08T01:22:12Z)
- Can I see an Example? Active Learning the Long Tail of Attributes and Relations [64.50739983632006]
We introduce a novel incremental active learning framework that asks for attributes and relations in visual scenes.
While conventional active learning methods ask for labels of specific examples, we flip this framing to allow agents to ask for examples from specific categories.
Using this framing, we introduce an active sampling method that asks for examples from the tail of the data distribution and show that it outperforms classical active learning methods on Visual Genome.
arXiv Detail & Related papers (2022-03-11T19:28:19Z)
- Realistic Counterfactual Explanations by Learned Relations [0.0]
We propose a novel approach to realistic counterfactual explanations that preserve relationships between data attributes.
The model directly learns the relationships by a variational auto-encoder without domain knowledge and then learns to disturb the latent space accordingly.
arXiv Detail & Related papers (2022-02-15T12:33:51Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Learning Relation Prototype from Unlabeled Texts for Long-tail Relation Extraction [84.64435075778988]
We propose a general approach to learn relation prototypes from unlabeled texts.
We learn relation prototypes as an implicit factor between entities.
We conduct experiments on two publicly available datasets: New York Times and Google Distant Supervision.
arXiv Detail & Related papers (2020-11-27T06:21:12Z)
- Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks [61.950353376870154]
Joint-event-extraction is a sequence-to-sequence labeling task with a tag set composed of tags of triggers and entities.
We propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of triggers or entities.
Our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
arXiv Detail & Related papers (2020-10-13T11:51:17Z)
- Type-augmented Relation Prediction in Knowledge Graphs [65.88395564516115]
We propose a type-augmented relation prediction (TaRP) method, where we apply both the type information and instance-level information for relation prediction.
Our proposed TaRP method achieves significantly better performance than state-of-the-art methods on four benchmark datasets.
arXiv Detail & Related papers (2020-09-16T21:14:18Z)
- Discovering Nonlinear Relations with Minimum Predictive Information Regularization [67.7764810514585]
We introduce a novel minimum predictive information regularization method to infer directional relations from time series.
Our method substantially outperforms other methods for learning nonlinear relations in synthetic datasets.
arXiv Detail & Related papers (2020-01-07T04:28:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.