Propositionalization and Embeddings: Two Sides of the Same Coin
- URL: http://arxiv.org/abs/2006.04410v1
- Date: Mon, 8 Jun 2020 08:33:21 GMT
- Title: Propositionalization and Embeddings: Two Sides of the Same Coin
- Authors: Nada Lavrač and Blaž Škrlj and Marko Robnik-Šikonja
- Abstract summary: This paper outlines some of the modern data processing techniques used in relational learning.
It focuses on the propositionalization and embedding data transformation approaches.
We present two efficient implementations of the unifying methodology.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data preprocessing is an important component of machine learning pipelines,
which requires ample time and resources. An integral part of preprocessing is
data transformation into the format required by a given learning algorithm.
This paper outlines some of the modern data processing techniques used in
relational learning that enable data fusion from different input data types and
formats into a single table data representation, focusing on the
propositionalization and embedding data transformation approaches. While both
approaches aim to transform data into a tabular format, they use
different terminology and task definitions, are perceived to address different
goals, and are used in different contexts. This paper contributes a unifying
framework that allows for improved understanding of these two data
transformation techniques by presenting their unified definitions, and by
explaining the similarities and differences between the two approaches as
variants of a unified complex data transformation task. In addition to the
unifying framework, the novelty of this paper is a unifying methodology
combining propositionalization and embeddings, which benefits from the
advantages of both in solving complex data transformation and learning tasks.
We present two efficient implementations of the unifying methodology: an
instance-based PropDRM approach, and a feature-based PropStar approach to data
transformation and learning, together with their empirical evaluation on
several relational problems. The results show that the new algorithms can
outperform existing relational learners and can solve much larger problems.
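As a rough illustration of the propositionalization side of this transformation (a minimal sketch with invented table and column names, not the paper's PropDRM or PropStar code), relational facts spread across several tables can be flattened into a single table of binary features that any propositional learner, or a downstream embedding method, can consume:

```python
# Minimal propositionalization sketch (hypothetical tables and columns,
# not the paper's implementation): one-to-many relational facts are
# flattened into one row of existential binary features per example.
import pandas as pd

# Target table: one row per learning example.
customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "churned":     [0, 1, 0]})

# Related table: many rows per customer (a one-to-many relation).
purchases = pd.DataFrame({"customer_id": [1, 1, 2, 3, 3, 3],
                          "item": ["book", "pen", "book",
                                   "pen", "ink", "book"]})

# Existential features: does a related fact with this value exist?
features = pd.crosstab(purchases["customer_id"],
                       purchases["item"]).clip(upper=1)
features.columns = [f"bought_{c}" for c in features.columns]

# Join everything into the single-table representation that both
# propositionalization and embedding approaches ultimately target.
single_table = customers.join(features, on="customer_id").fillna(0)
print(single_table)
```

In the paper's terms, an embedding method would then map the rows (instances) or columns (features) of such a sparse table into a dense vector space, which is where the two approaches meet.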
Related papers
- What is different between these datasets? [23.271594219577185]
Two comparable datasets in the same domain may have different distributions.
We propose a suite of interpretable methods (toolbox) for comparing two datasets.
Our methods not only outperform comparable and related approaches in terms of explanation quality and correctness, but also provide actionable, complementary insights to understand and mitigate dataset differences effectively.
arXiv Detail & Related papers (2024-03-08T19:52:39Z)
- Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning [114.54944761345594]
We present a novel approach for structured data-to-text generation that addresses the limitations of existing methods.
Our proposed method aims to improve performance in multi-task training, zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2023-08-10T03:09:12Z)
- Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition [17.892385961143173]
We propose a new method to transform the text from a high-resource domain to a low-resource domain by changing its style-related attributes.
We design a constrained decoding algorithm along with a set of key ingredients for data selection to guarantee the generation of valid and coherent data.
Our approach is a practical solution to data scarcity, and we expect it to be applicable to other NLP tasks.
arXiv Detail & Related papers (2022-10-14T16:02:03Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
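As a generic sketch of how augmentation induces symmetries (illustrative only; the paper's contribution is adapting the transformation distribution automatically while the model trains), each training input is replaced by a randomly transformed copy drawn from a group of transformations the model should be invariant to:

```python
# Symmetry-inducing augmentation sketch (generic, not the paper's
# invariance-constrained method): sample one transformation per input.
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a transformation sampled from a small symmetry group."""
    ops = [
        lambda x: x,                # identity
        lambda x: np.fliplr(x),     # horizontal flip
        lambda x: np.rot90(x, 1),   # 90-degree rotation
        lambda x: np.rot90(x, 2),   # 180-degree rotation
    ]
    return ops[rng.integers(len(ops))](image)

rng = np.random.default_rng(0)
images = np.zeros((8, 32, 32))      # a dummy batch of 8 square images
batch = np.stack([augment(img, rng) for img in images])
```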
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
- GEDI: A Graph-based End-to-end Data Imputation Framework [3.5478302034537705]
The proposed imputation process uses a Transformer network and graph structure learning to iteratively refine the contextual relationships among features and similarities among observations.
It uses a meta-learning framework to select features that are influential to the downstream prediction task of interest.
We conduct experiments on real-world large data sets, and show that the proposed imputation process consistently improves imputation and label prediction performance.
arXiv Detail & Related papers (2022-08-13T05:16:40Z)
- Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains helps improve learning performance on each task.
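A minimal sketch of this coupling (hypothetical setup, not the paper's exact formulation): each task keeps its own weight vector, with a proximity penalty pulling all vectors toward their cross-task mean:

```python
# Cross-learning sketch: per-task least squares plus a penalty that keeps
# each task's weights close to the average over all tasks (illustrative).
import numpy as np

def cross_learn(Xs, ys, lam=1.0, lr=0.01, steps=500):
    ws = [np.zeros(X.shape[1]) for X in Xs]
    for _ in range(steps):
        w_bar = np.mean(ws, axis=0)                    # shared anchor
        for i, (X, y) in enumerate(zip(Xs, ys)):
            grad = X.T @ (X @ ws[i] - y) / len(y)      # task-specific fit
            grad += lam * (ws[i] - w_bar)              # stay close to others
            ws[i] = ws[i] - lr * grad
    return ws
```

Setting lam to zero recovers independent per-task regression, while a large lam forces a single shared model, so the penalty interpolates between the two extremes.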
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
- Learning to Match Jobs with Resumes from Sparse Interaction Data using Multi-View Co-Teaching Network [83.64416937454801]
Job-resume interaction data is sparse and noisy, which affects the performance of job-resume matching algorithms.
We propose a novel multi-view co-teaching network from sparse interaction data for job-resume matching.
Our model is able to outperform state-of-the-art methods for job-resume matching.
arXiv Detail & Related papers (2020-09-25T03:09:54Z)
- Towards a Flexible Embedding Learning Framework [15.604564543883122]
We propose an embedding learning framework that is flexible in terms of the relationships that can be embedded into the learned representations.
A sampling mechanism is carefully designed to establish a direct connection between the input and the information captured by the output embeddings.
Our empirical results demonstrate that the proposed framework, in conjunction with a set of relevant entity-relation-matrices, outperforms the existing state-of-the-art approaches in various data mining tasks.
arXiv Detail & Related papers (2020-09-23T08:00:56Z)
- On Compositions of Transformations in Contrastive Self-Supervised Learning [66.15514035861048]
In this paper, we generalize contrastive learning to a wider set of transformations.
We find that being invariant to certain transformations and distinctive to others is critical to learning effective video representations.
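A minimal InfoNCE-style sketch of that idea (a generic contrastive loss, not the paper's video pipeline): the transformations used to generate the two views of each input determine what the representation becomes invariant to, while other inputs act as negatives it must stay distinctive from:

```python
# Generic contrastive (InfoNCE-style) loss sketch: matching rows of z1 and
# z2 are two transformed views of the same input (positives); all other
# pairings act as negatives.
import numpy as np

def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.1) -> float:
    """z1, z2: (n, d) L2-normalized embeddings of two augmented views."""
    logits = z1 @ z2.T / tau                           # pairwise similarity
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(z1))
    return float(-log_prob[idx, idx].mean())           # positives on diagonal
```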
arXiv Detail & Related papers (2020-03-09T17:56:49Z)
- Provable Meta-Learning of Linear Representations [114.656572506859]
We provide fast, sample-efficient algorithms to address the dual challenges of learning a common set of features from multiple, related tasks, and transferring this knowledge to new, unseen tasks.
We also provide information-theoretic lower bounds on the sample complexity of learning these linear features.
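As a hedged sketch of the underlying idea (synthetic data, not the paper's provable algorithm): per-task least-squares solutions all lie near the same low-dimensional subspace, so their top singular directions estimate the shared features, and a new task then needs only a few parameters:

```python
# Shared linear representation sketch: estimate a common k-dim subspace
# from several related regression tasks, then transfer it (illustrative).
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 20, 3, 100
B_true = np.linalg.qr(rng.standard_normal((d, k)))[0]  # ground-truth features

Xs, ys = [], []
for _ in range(10):                                    # 10 related tasks
    X = rng.standard_normal((n, d))
    w = rng.standard_normal(k)
    Xs.append(X)
    ys.append(X @ B_true @ w + 0.1 * rng.standard_normal(n))

# Per-task solutions approximately span the shared subspace.
W = np.column_stack([np.linalg.lstsq(X, y, rcond=None)[0]
                     for X, y in zip(Xs, ys)])
B_hat = np.linalg.svd(W, full_matrices=False)[0][:, :k]

# Transfer: a new, unseen task now needs only k parameters instead of d.
X_new = rng.standard_normal((n, d))
y_new = X_new @ B_true @ rng.standard_normal(k)
w_new = np.linalg.lstsq(X_new @ B_hat, y_new, rcond=None)[0]
```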
arXiv Detail & Related papers (2020-02-26T18:21:34Z)
- Multi-Objective Genetic Programming for Manifold Learning: Balancing Quality and Dimensionality [4.4181317696554325]
State-of-the-art manifold learning algorithms are opaque in how they perform the dimensionality-reducing transformation.
We introduce a multi-objective approach that automatically balances the competing objectives of manifold quality and dimensionality.
Our proposed approach is competitive with a range of baseline and state-of-the-art manifold learning methods.
arXiv Detail & Related papers (2020-01-05T23:24:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.