Related papers: Mining Feature Relationships in Data

URL: http://arxiv.org/abs/2102.01355v1
Date: Tue, 2 Feb 2021 07:06:16 GMT
Title: Mining Feature Relationships in Data
Authors: Andrew Lensen
Abstract summary: Feature relationship mining (FRM) uses a genetic programming approach to automatically discover symbolic relationships between continuous or categorical features in data. Our proposed approach is the first such symbolic approach with the goal of explicitly discovering relationships between features. Empirical testing on a variety of real-world datasets shows the proposed method is able to find high-quality, simple feature relationships.
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: When faced with a new dataset, most practitioners begin by performing exploratory data analysis to discover interesting patterns and characteristics within data. Techniques such as association rule mining are commonly applied to uncover relationships between features (attributes) of the data. However, association rules are primarily designed for use on binary or categorical data, due to their use of rule-based machine learning. A large proportion of real-world data is continuous in nature, and discretisation of such data leads to inaccurate and less informative association rules. In this paper, we propose an alternative approach called feature relationship mining (FRM), which uses a genetic programming approach to automatically discover symbolic relationships between continuous or categorical features in data. To the best of our knowledge, our proposed approach is the first such symbolic approach with the goal of explicitly discovering relationships between features. Empirical testing on a variety of real-world datasets shows the proposed method is able to find high-quality, simple feature relationships which can be easily interpreted and which provide clear and non-trivial insight into data.
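The abstract describes FRM only at a high level. The sketch below makes the core idea concrete: tree-based genetic programming that searches for a symbolic expression over some features that tracks another (target) feature, with no discretisation step. It is a generic GP sketch under stated assumptions, not the FRM algorithm from the paper. The function set (`+`, `-`, `*`), the fitness measure (absolute Pearson correlation between the evolved expression and the target feature), the population settings, and all function names (`mine_relationship`, `fitness`, `to_str`, etc.) are hypothetical choices made for illustration. The demo data with a planted relationship is also invented.

```python
import random

import numpy as np

# Hypothetical function set; FRM's actual operators may differ.
OPS = {"add": np.add, "sub": np.subtract, "mul": np.multiply}
SYMBOLS = {"add": "+", "sub": "-", "mul": "*"}

def random_tree(features, depth, rng):
    """Grow a random expression tree whose leaves are ("x", feature_index)."""
    if depth == 0 or rng.random() < 0.3:
        return ("x", rng.choice(features))
    op = rng.choice(list(OPS))
    return (op, random_tree(features, depth - 1, rng), random_tree(features, depth - 1, rng))

def evaluate(tree, X):
    """Evaluate a tree column-wise on the data matrix X (rows = instances)."""
    if tree[0] == "x":
        return X[:, tree[1]]
    return OPS[tree[0]](evaluate(tree[1], X), evaluate(tree[2], X))

def fitness(tree, X, target):
    """Assumed fitness: |Pearson r| between the expression and the target feature."""
    pred = evaluate(tree, X)
    if np.std(pred) < 1e-12:  # constant expressions carry no relationship
        return 0.0
    return float(abs(np.corrcoef(pred, X[:, target])[0, 1]))

def depth(tree):
    return 0 if tree[0] == "x" else 1 + max(depth(tree[1]), depth(tree[2]))

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child slots (1 or 2)."""
    yield path, tree
    if tree[0] != "x":
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))

def replace(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(a, b, rng):
    """Subtree crossover: graft a random subtree of b into a random point of a."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace(a, path, donor)

def mutate(tree, features, rng):
    """Subtree mutation: replace a random node with a fresh random subtree."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, path, random_tree(features, 2, rng))

def tournament(pop, scores, rng, k=3):
    idx = rng.sample(range(len(pop)), k)
    return pop[max(idx, key=lambda i: scores[i])]

def mine_relationship(X, target, pop_size=60, generations=30, max_depth=3, seed=0):
    """Evolve an expression over the non-target features that tracks X[:, target]."""
    rng = random.Random(seed)
    features = [j for j in range(X.shape[1]) if j != target]
    pop = [random_tree(features, max_depth, rng) for _ in range(pop_size)]
    best, best_score = None, -1.0
    for _ in range(generations):
        scores = [fitness(t, X, target) for t in pop]
        for t, s in zip(pop, scores):
            if s > best_score:
                best, best_score = t, s
        children = [best]  # elitism: keep the best expression found so far
        while len(children) < pop_size:
            child = crossover(tournament(pop, scores, rng), tournament(pop, scores, rng), rng)
            if rng.random() < 0.2:
                child = mutate(child, features, rng)
            if depth(child) > 2 * max_depth:  # crude bloat control
                child = tournament(pop, scores, rng)
            children.append(child)
        pop = children
    return best, best_score

def to_str(tree):
    if tree[0] == "x":
        return f"f{tree[1]}"
    return f"({to_str(tree[1])} {SYMBOLS[tree[0]]} {to_str(tree[2])})"

if __name__ == "__main__":
    data_rng = np.random.default_rng(42)
    X = data_rng.uniform(1.0, 2.0, size=(200, 4))
    X[:, 3] = X[:, 0] * X[:, 1] + data_rng.normal(0.0, 0.01, 200)  # planted relationship
    tree, score = mine_relationship(X, target=3)
    print(to_str(tree), round(score, 3))
```

On this toy data the planted relationship f3 ≈ f0 * f1 should surface as an expression involving f0 and f1 with a correlation close to 1; the exact tree printed depends on the seed. A correlation-based fitness is used here because it rewards any expression that is linearly related to the target, so the search does not also have to evolve scale and offset constants. Whether FRM makes the same choice is not stated in the abstract.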

Related papers

Evaluating Inter-Column Logical Relationships in Synthetic Tabular Data Generation
This paper proposes three evaluation metrics designed to assess the preservation of logical relationships. We validate these metrics by assessing the performance of both classical and state-of-the-art generation methods on a real-world industrial dataset.
Date: 2025-02-06T13:13:26Z
Explaining Categorical Feature Interactions Using Graph Covariance and LLMs
This paper focuses on the global synthetic dataset from the Counter Trafficking Data Collaborative. It contains over 200,000 anonymized records spanning from 2002 to 2022 with numerous categorical features for each record. We propose a fast and scalable method for analyzing and extracting significant categorical feature interactions.
Date: 2025-01-24T21:41:26Z
Benchmarking the Fidelity and Utility of Synthetic Relational Data
We review related work on relational data synthesis, common benchmarking datasets, and approaches to measuring the fidelity and utility of synthetic data. We combine the best practices and a novel robust detection approach into a benchmarking tool and use it to compare six methods. For utility, we typically observe moderate correlation between real and synthetic data for both model predictive performance and feature importance.
Date: 2024-10-04T13:23:45Z
Semantic-Enhanced Relational Metric Learning for Recommender Systems
Recently, metric learning methods, inspired by the translation mechanism in knowledge graphs, have received great attention in the recommendation community. We propose a joint Semantic-Enhanced Metric Learning framework to tackle the problem in recommender systems. Specifically, the semantic signal is first extracted from the target reviews, which contain abundant features and personalized user preferences. A novel regression model is then designed by leveraging the extracted semantic signal to improve the discriminative ability of the original relation-based training process.
Date: 2024-06-07T11:54:50Z
Jointprop: Joint Semi-supervised Learning for Entity and Relation Extraction with Heterogeneous Graph-based Propagation
We propose Jointprop, a Heterogeneous Graph-based Propagation framework for joint semi-supervised entity and relation extraction. We construct a unified span-based heterogeneous graph from entity and relation candidates and propagate class labels based on confidence scores. We show that our framework outperforms the state-of-the-art semi-supervised approaches on NER and RE tasks.
Date: 2023-05-25T09:07:04Z
GenSyn: A Multi-stage Framework for Generating Synthetic Microdata using Macro Data Sources
Individual-level data (microdata) that characterizes a population is essential for studying many real-world problems. In this study, we examine synthetic data generation as a tool to extrapolate difficult-to-obtain high-resolution data.
Date: 2022-12-08T01:22:12Z
Can I see an Example? Active Learning the Long Tail of Attributes and Relations
We introduce a novel incremental active learning framework that asks for attributes and relations in visual scenes. While conventional active learning methods ask for labels of specific examples, we flip this framing to allow agents to ask for examples from specific categories. Using this framing, we introduce an active sampling method that asks for examples from the tail of the data distribution and show that it outperforms classical active learning methods on Visual Genome.
Date: 2022-03-11T19:28:19Z
Realistic Counterfactual Explanations by Learned Relations
We propose a novel approach to realistic counterfactual explanations that preserve relationships between data attributes. The model learns the relationships directly with a variational autoencoder, without domain knowledge, and then learns to perturb the latent space accordingly.
Date: 2022-02-15T12:33:51Z
Combining Feature and Instance Attribution to Detect Artifacts
We propose methods to facilitate identification of training data artifacts. We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data. We conduct a small user study to evaluate whether these methods are useful to NLP researchers in practice.
Date: 2021-07-01T09:26:13Z
Learning Relation Prototype from Unlabeled Texts for Long-tail Relation Extraction
We propose a general approach to learn relation prototypes from unlabeled texts. We learn relation prototypes as an implicit factor between entities. We conduct experiments on two publicly available datasets: New York Times and Google Distant Supervision.
Date: 2020-11-27T06:21:12Z
Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks
Joint-event-extraction is a sequence-to-sequence labeling task whose tag set is composed of trigger tags and entity tags. We propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of triggers or entities. Our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
Date: 2020-10-13T11:51:17Z
Type-augmented Relation Prediction in Knowledge Graphs
We propose a type-augmented relation prediction (TaRP) method, where we apply both the type information and instance-level information for relation prediction. Our proposed TaRP method achieves significantly better performance than state-of-the-art methods on four benchmark datasets.
Date: 2020-09-16T21:14:18Z
Discovering Nonlinear Relations with Minimum Predictive Information Regularization
We introduce a novel minimum predictive information regularization method to infer directional relations from time series. Our method substantially outperforms other methods for learning nonlinear relations in synthetic datasets.
Date: 2020-01-07T04:28:00Z

This list is automatically generated from the titles and abstracts of the papers on this site.