AdaTyper: Adaptive Semantic Column Type Detection
- URL: http://arxiv.org/abs/2311.13806v1
- Date: Thu, 23 Nov 2023 04:42:27 GMT
- Title: AdaTyper: Adaptive Semantic Column Type Detection
- Authors: Madelon Hulsebos and Paul Groth and Çağatay Demiralp
- Abstract summary: We propose AdaTyper to address one of the most critical deployment challenges: adaptation.
AdaTyper uses weak supervision to adapt a hybrid type predictor towards new semantic types and shifted data distributions at inference time.
We evaluate the adaptation performance of AdaTyper on real-world database tables hand-annotated with semantic column types through crowdsourcing.
- Score: 4.062265896931587
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the semantics of relational tables is instrumental for
automation in data exploration and preparation systems. A key source for
understanding a table is the semantics of its columns. With the rise of deep
learning, learned table representations are now available, which can be applied
for semantic type detection and achieve good performance on benchmarks.
Nevertheless, we observe a gap between this performance and its applicability
in practice. In this paper, we propose AdaTyper to address one of the most
critical deployment challenges: adaptation. AdaTyper uses weak supervision to
adapt a hybrid type predictor towards new semantic types and shifted data
distributions at inference time, using minimal human feedback. The hybrid type
predictor of AdaTyper combines rule-based methods and a light machine learning
model for semantic column type detection. We evaluate the adaptation
performance of AdaTyper on real-world database tables hand-annotated with
semantic column types through crowdsourcing and find that the F1-score improves
for new and existing types. AdaTyper approaches an average precision of 0.6
after only seeing 5 examples, significantly outperforming existing adaptation
methods based on human-provided regular expressions or dictionaries.
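The hybrid predictor and few-example adaptation described in the abstract can be illustrated with a minimal sketch. This is not the AdaTyper implementation: the class `HybridTyper`, the regex rules, the three character-level features, and the nearest-centroid classifier (standing in for the paper's "light machine learning model") are all assumptions chosen for brevity.

```python
import math
import re
from collections import defaultdict

# Hypothetical rule set: these regexes are illustrative, not from the paper.
RULES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$"),
    "year": re.compile(r"^(19|20)\d{2}$"),
}


def featurize(values):
    """Map a column (list of strings) to a small numeric feature vector."""
    n = max(len(values), 1)
    total_chars = max(sum(len(v) for v in values), 1)
    digit_frac = sum(ch.isdigit() for v in values for ch in v) / total_chars
    alpha_frac = sum(ch.isalpha() for v in values for ch in v) / total_chars
    mean_len = sum(len(v) for v in values) / n
    # Cap the scaled mean length at 1.0 so all features share a [0, 1] range.
    return [digit_frac, alpha_frac, min(mean_len / 20.0, 1.0)]


class HybridTyper:
    """Assumed design: rules first, nearest-centroid fallback."""

    def __init__(self, rule_threshold=0.8):
        self.rule_threshold = rule_threshold
        self.examples = defaultdict(list)  # type name -> feature vectors

    def adapt(self, type_name, example_columns):
        """Register a few labeled example columns for a new or existing type."""
        for col in example_columns:
            self.examples[type_name].append(featurize(col))

    def _centroids(self):
        # Average the stored feature vectors per type, dimension by dimension.
        return {
            t: [sum(dim) / len(vecs) for dim in zip(*vecs)]
            for t, vecs in self.examples.items()
        }

    def predict(self, values):
        # Step 1: a rule wins when enough of the column's values match it.
        for type_name, pattern in RULES.items():
            matches = sum(bool(pattern.match(v)) for v in values)
            if values and matches / len(values) >= self.rule_threshold:
                return type_name
        # Step 2: otherwise pick the closest centroid among adapted types.
        centroids = self._centroids()
        if not centroids:
            return None
        feats = featurize(values)
        return min(centroids, key=lambda t: math.dist(feats, centroids[t]))
```

In this sketch, calling `adapt("phone", [["555-1234", "555-9876"]])` makes a previously unknown type available to `predict` without retraining anything, which is the adaptation behavior the abstract describes in spirit; the actual system's features, model, and weak-supervision procedure are not specified here.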
Related papers
- TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all mixed-type distributions of tabular data in one model.
Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data.
TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pair-wise column correlation estimations.
arXiv Detail & Related papers (2024-10-27T22:58:47Z)
- Graph Neural Network Approach to Semantic Type Detection in Tables [3.929053351442136]
This study addresses the challenge of detecting semantic column types in relational tables.
We propose a novel approach using Graph Neural Networks (GNNs) to model intra-table dependencies.
Our proposed method not only outperforms existing state-of-the-art algorithms but also offers novel insights into the utility and functionality of various GNN types for semantic type detection.
arXiv Detail & Related papers (2024-04-30T18:17:44Z)
- Generative Type Inference for Python [62.01560866916557]
This paper introduces TypeGen, a few-shot generative type inference approach that incorporates static domain knowledge from static analysis.
TypeGen creates chain-of-thought (COT) prompts by translating the type inference steps of static analysis into prompts based on the type dependency graphs (TDGs).
Experiments show that TypeGen outperforms the best baseline Type4Py by 10.0% for argument type prediction and 22.5% in return value type prediction in terms of top-1 Exact Match.
arXiv Detail & Related papers (2023-07-18T11:40:31Z)
- TypeT5: Seq2seq Type Inference using Static Analysis [51.153089609654174]
We present a new type inference method that treats type prediction as a code infilling task.
Our method uses static analysis to construct dynamic contexts for each code element whose type signature is to be predicted by the model.
We also propose an iterative decoding scheme that incorporates previous type predictions in the model's input context.
arXiv Detail & Related papers (2023-03-16T23:48:00Z)
- Prototype-based Embedding Network for Scene Graph Generation [105.97836135784794]
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs.
Due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category.
Prototype-based Embedding Network (PE-Net) models entities/predicates with prototype-aligned compact and distinctive representations.
Prototype-guided Learning (PL) is introduced to help PE-Net efficiently learn such entity-predicate matching, and Prototype Regularization (PR) is devised to relieve the ambiguous entity-predicate matching.
arXiv Detail & Related papers (2023-03-13T13:30:59Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address performance degradation under distribution shift by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed Class-Aware Feature Alignment (CAFA), which encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z)
- Ultra-fine Entity Typing with Indirect Supervision from Natural Language Inference [28.78215056129358]
This work presents LITE, a new approach that formulates entity typing as a natural language inference (NLI) problem.
Experiments show that, with limited training data, LITE obtains state-of-the-art performance on the UFET task.
arXiv Detail & Related papers (2022-02-12T23:56:26Z)
- A Closer Look at Prototype Classifier for Few-shot Image Classification [28.821731837776593]
We show that a prototype classifier works equally well without fine-tuning and meta-learning.
We derive a novel generalization bound for the prototypical network and show that focusing on the variance of the norm of a feature vector can improve performance.
arXiv Detail & Related papers (2021-10-11T08:28:43Z)
- Making Table Understanding Work in Practice [9.352813774921655]
We discuss three challenges of deploying table understanding models and propose a framework to address them.
We present SigmaTyper, which encapsulates a hybrid model trained on GitTables and integrates a lightweight human-in-the-loop approach to customize the model.
arXiv Detail & Related papers (2021-09-11T03:38:24Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)