Tabular Learning: Encoding for Entity and Context Embeddings
- URL: http://arxiv.org/abs/2403.19405v1
- Date: Thu, 28 Mar 2024 13:29:29 GMT
- Title: Tabular Learning: Encoding for Entity and Context Embeddings
- Authors: Fredy Reusser
- Abstract summary: This work examines the effect of different encoding techniques on entity and context embeddings.
Applying different preprocessing methods and network architectures over several datasets yields a benchmark of how the encoders influence the networks' learning outcome.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Examining the effect of different encoding techniques on entity and context embeddings, this work challenges the commonly used ordinal encoding for tabular learning. Applying different preprocessing methods and network architectures over several datasets yielded a benchmark of how the encoders influence the networks' learning outcome. With the test, validation, and training data held consistent, the results show that ordinal encoding is not the best-suited encoder for categorical data, both for preprocessing the data and for subsequently classifying the target variable correctly. A better outcome was achieved by encoding the features based on string similarities, computing a similarity matrix that serves as input to the network. This holds for both entity and context embeddings, and the transformer architecture showed improved performance with ordinal and similarity encoding on multi-label classification tasks.
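To make the contrast concrete, below is a minimal Python sketch of the two preprocessing strategies the abstract compares: ordinal encoding, which maps categories to arbitrary integers, and similarity encoding, which represents each category by its row of a string-similarity matrix. Using difflib's SequenceMatcher ratio as the string metric is an illustrative assumption, not necessarily the paper's exact measure.

```python
from difflib import SequenceMatcher
import numpy as np

def ordinal_encode(values):
    """Map each category to an arbitrary integer (the baseline challenged above)."""
    categories = sorted(set(values))
    lookup = {c: i for i, c in enumerate(categories)}
    return np.array([lookup[v] for v in values])

def similarity_encode(values):
    """Represent each category by its vector of string similarities to all
    known categories, i.e. one row of the similarity matrix per sample."""
    categories = sorted(set(values))
    sim = np.array([[SequenceMatcher(None, a, b).ratio() for b in categories]
                    for a in categories])
    index = {c: i for i, c in enumerate(categories)}
    return np.stack([sim[index[v]] for v in values])

colors = ["light blue", "dark blue", "dark green", "light blue"]
print(ordinal_encode(colors))     # [2 0 1 2]: the integer order carries no meaning
print(similarity_encode(colors))  # lexically similar categories get similar rows
```

Unlike the ordinal integers, the similarity rows place related category strings close together in input space, which is the property the paper credits for the improved classification results.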
Related papers
- On the Suitability of Representations for Quality Diversity Optimization of Shapes [77.34726150561087]
The representation, or encoding, utilized in evolutionary algorithms has a substantial effect on their performance.
This study compares the impact of several representations, including direct encoding, a dictionary-based representation, parametric encoding, compositional pattern producing networks, and cellular automata, on the generation of voxelized meshes.
arXiv Detail & Related papers (2023-04-07T07:34:23Z)
- Effective and Interpretable Information Aggregation with Capacity Networks [3.4012007729454807]
Capacity networks generate multiple interpretable intermediate results which can be aggregated in a semantically meaningful space.
Our experiments show that implementing this simple inductive bias leads to improvements over different encoder-decoder architectures.
arXiv Detail & Related papers (2022-07-25T09:45:16Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- Graph-Based Decoding for Task Oriented Semantic Parsing [16.054030490095464]
We formulate semantic parsing as a dependency parsing task, applying graph-based decoding techniques developed for syntactic parsing.
We find that our graph-based approach is competitive with sequence decoders on the standard setting, and offers significant improvements in data efficiency and settings where partially-annotated data is available.
arXiv Detail & Related papers (2021-09-09T23:22:09Z)
- Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features [1.1709030738577393]
We study techniques that yield numeric representations of categorical variables.
We compare different encoding strategies together with five machine learning algorithms.
Regularized versions of target encoding consistently provided the best results.
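As a concrete illustration of the regularization family this paper studies, here is a minimal sketch of smoothed target encoding, where each category's target mean is shrunk toward the global mean. The smoothing weight m and the toy data are illustrative assumptions; in practice the encoding is fit within cross-validation folds to avoid target leakage.

```python
import pandas as pd

def smoothed_target_encode(train, col, target, m=10.0):
    """Shrink each category's target mean toward the global mean:
    (n * cat_mean + m * global_mean) / (n + m)."""
    global_mean = train[target].mean()
    stats = train.groupby(col)[target].agg(["mean", "count"])
    smoothed = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
    return train[col].map(smoothed)

df = pd.DataFrame({"city": ["a", "a", "b", "b", "b", "c"],
                   "y":    [1,   0,   1,   1,   0,   1]})
df["city_te"] = smoothed_target_encode(df, "city", "y")
print(df)  # rare categories (like "c") are pulled strongly toward the global mean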
arXiv Detail & Related papers (2021-04-01T17:21:42Z)
- Few-shot Sequence Learning with Transformers [79.87875859408955]
Few-shot algorithms aim at learning new tasks provided only a handful of training examples.
In this work we investigate few-shot learning in the setting where the data points are sequences of tokens.
We propose an efficient learning algorithm based on Transformers.
arXiv Detail & Related papers (2020-12-17T12:30:38Z)
- Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow reinforcement-learning strategies to optimize the parameters of the controller, computing the reward from the accuracy of a task model.
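That search loop can be sketched as follows: a controller samples which embeddings to concatenate and nudges its sampling probabilities by the task reward. This is a toy REINFORCE-style sketch under stated assumptions, not the paper's implementation; evaluate() is a hypothetical stand-in for training and scoring the task model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_embeddings = 4                    # candidate embeddings to concatenate
probs = np.full(n_embeddings, 0.5)  # controller: keep-probability per embedding

def evaluate(mask):
    """Hypothetical reward: accuracy of a task model trained on the selected
    concatenation (replaced here by a toy score plus noise)."""
    return mask.mean() + rng.normal(0.0, 0.01)

baseline, lr = 0.0, 0.1
for step in range(200):
    mask = (rng.random(n_embeddings) < probs).astype(float)  # sample a concatenation
    reward = evaluate(mask)
    # REINFORCE-style update: move probabilities toward choices that beat the baseline
    probs = np.clip(probs + lr * (reward - baseline) * (mask - probs), 0.05, 0.95)
    baseline = 0.9 * baseline + 0.1 * reward                 # moving-average baseline
print(probs)  # high values mark embeddings the controller learned to keep
```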
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
- Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph, which is updated by decoding within an auto-encoder.
Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
arXiv Detail & Related papers (2020-02-27T05:58:12Z)