Related papers: TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations

URL: http://arxiv.org/abs/2411.17110v1
Date: Tue, 26 Nov 2024 05:00:23 GMT
Title: TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations
Authors: Arash Dargahi Nobari, Davood Rafiei
Abstract summary: In this paper, we introduce TabulaX, a novel framework that leverages Large Language Models (LLMs) for multi-class tabular transformations. We show that TabulaX outperforms existing state-of-the-art approaches in terms of accuracy, supports a broader class of transformations, and generates interpretable transformations that can be efficiently applied.
Score: 8.072353085704627
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The integration of tabular data from diverse sources is often hindered by inconsistencies in formatting and representation, posing significant challenges for data analysts and personal digital assistants. Existing methods for automating tabular data transformations are limited in scope, often focusing on specific types of transformations or lacking interpretability. In this paper, we introduce TabulaX, a novel framework that leverages Large Language Models (LLMs) for multi-class tabular transformations. TabulaX first classifies input tables into four transformation classes (string-based, numerical, algorithmic, and general) and then applies tailored methods to generate human-interpretable transformation functions, such as numeric formulas or programming code. This approach enhances transparency and allows users to understand and modify the mappings. Through extensive experiments on real-world datasets from various domains, we demonstrate that TabulaX outperforms existing state-of-the-art approaches in terms of accuracy, supports a broader class of transformations, and generates interpretable transformations that can be efficiently applied.
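The classify-then-transform idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract says TabulaX uses LLMs for this, whereas the sketch below swaps in a simple parsing heuristic for classification and a least-squares line fit for the numerical class. It covers only two of the four classes, and every name in it (`classify`, `fit_linear`, `describe_numeric`) is hypothetical.

```python
from typing import List, Tuple

# A (source value, target value) example pair taken from two table columns.
Pair = Tuple[str, str]


def is_number(text: str) -> bool:
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify(pairs: List[Pair]) -> str:
    """Heuristic stand-in for the classification step (assumption: not an LLM)."""
    if all(is_number(s) and is_number(t) for s, t in pairs):
        return "numerical"
    return "string-based"


def fit_linear(pairs: List[Pair]) -> Tuple[float, float]:
    """Least-squares fit of target = a * source + b over the example pairs."""
    xs = [float(s) for s, _ in pairs]
    ys = [float(t) for _, t in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All sources identical: fall back to a constant mapping.
        return 0.0, mean_y
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    b = mean_y - a * mean_x
    return a, b


def describe_numeric(pairs: List[Pair]) -> str:
    """Render the fitted mapping as a human-readable formula."""
    a, b = fit_linear(pairs)
    return f"y = {a:.4g} * x + {b:.4g}"


if __name__ == "__main__":
    celsius_to_fahrenheit = [("0", "32"), ("100", "212"), ("37", "98.6")]
    if classify(celsius_to_fahrenheit) == "numerical":
        print(describe_numeric(celsius_to_fahrenheit))
```

The point of returning a formula string rather than only the transformed values is the interpretability the abstract emphasizes: a user can read the mapping and edit it by hand.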

Related papers

Scalable Representation Learning for Multimodal Tabular Transactions [14.18267117657451]
We present an innovative and scalable solution to these challenges. We propose a parameter efficient decoder that interleaves transaction and text modalities. We validate the efficacy of our solution on a large-scale dataset of synthetic payments transactions.
arXiv Detail & Related papers (2024-10-10T12:18:42Z)
Making Pre-trained Language Models Great on Tabular Prediction [50.70574370855663]
The transferability of deep neural networks (DNNs) has made significant progress in image and language processing. We present TP-BERTa, a specifically pre-trained LM for tabular data prediction. A novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names.
arXiv Detail & Related papers (2024-03-04T08:38:56Z)
A Practical Method for Generating String Counterfactuals [106.98481791980367]
Interventions targeting the representation space of language models (LMs) have emerged as an effective means to influence model behavior. We give a method to convert representation counterfactuals into string counterfactuals. The resulting counterfactuals can be used to mitigate bias in classification through data augmentation.
arXiv Detail & Related papers (2024-02-17T18:12:02Z)
TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively. It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z)
Polynomial-based Self-Attention for Table Representation learning [23.651207486167518]
Self-attention, a key component of Transformers, can lead to an oversmoothing issue. We propose a novel matrix-based self-attention layer as a substitute for the original self-attention layer. In our experiments with three representative table learning models equipped with our proposed layer, we illustrate that the layer effectively mitigates the oversmoothing problem.
arXiv Detail & Related papers (2023-12-12T21:49:26Z)
Training-Free Generalization on Heterogeneous Tabular Data via Meta-Representation [67.30538142519067]
We propose Tabular data Pre-Training via Meta-representation (TabPTM). A deep neural network is then trained to associate these meta-representations with dataset-specific classification confidences. Experiments validate that TabPTM achieves promising performance in new datasets, even under few-shot scenarios.
arXiv Detail & Related papers (2023-10-31T18:03:54Z)
XTab: Cross-table Pretraining for Tabular Transformers [29.419276738753968]
XTab is a framework for cross-table pretraining of tabular transformers on datasets from various domains. We show that XTab consistently boosts the generalizability, learning speed, and performance of multiple tabular transformers. We achieve performance superior to other state-of-the-art tabular deep learning models on various tasks such as regression, binary, and multiclass classification.
arXiv Detail & Related papers (2023-05-10T12:17:52Z)
Numeric Encoding Options with Automunge [0.0]
This paper will offer arguments for potential benefits of extended encodings of numeric streams in deep learning. Proposals are based on options for numeric transformations available in the Automunge open source python library platform.
arXiv Detail & Related papers (2022-02-19T02:21:03Z)
Efficient Transformers: A Survey [98.23264445730645]
Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. This paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models.
arXiv Detail & Related papers (2020-09-14T20:38:14Z)
Propositionalization and Embeddings: Two Sides of the Same Coin [0.0]
This paper outlines some of the modern data processing techniques used in relational learning. It focuses on the propositionalization and embedding data transformation approaches. We present two efficient implementations of the unifying methodology.
arXiv Detail & Related papers (2020-06-08T08:33:21Z)
On Compositions of Transformations in Contrastive Self-Supervised Learning [66.15514035861048]
In this paper, we generalize contrastive learning to a wider set of transformations. We find that being invariant to certain transformations and distinctive to others is critical to learning effective video representations.
arXiv Detail & Related papers (2020-03-09T17:56:49Z)
FLAT: Few-Shot Learning via Autoencoding Transformation Regularizers [67.46036826589467]
We present a novel regularization mechanism by learning the change of feature representations induced by a distribution of transformations without using the labels of data examples. It could minimize the risk of overfitting to base categories by inspecting the transformation-augmented variations at the encoded feature level. Experimental results show performance superior to current state-of-the-art methods in the literature.
arXiv Detail & Related papers (2019-12-29T15:26:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.