Semantic Labeling Using a Deep Contextualized Language Model
- URL: http://arxiv.org/abs/2010.16037v1
- Date: Fri, 30 Oct 2020 03:04:22 GMT
- Title: Semantic Labeling Using a Deep Contextualized Language Model
- Authors: Mohamed Trabelsi, Jin Cao, Jeff Heflin
- Abstract summary: We propose a context-aware semantic labeling method using both the column values and context.
Our new method is based on a new setting for semantic labeling, where we sequentially predict labels for an input table with missing headers.
To our knowledge, we are the first to successfully apply BERT to solve the semantic labeling task.
- Score: 9.719972529205101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating schema labels automatically for column values of data tables has
many data science applications such as schema matching, and data discovery and
linking. For example, automatically extracted tables with missing headers can
be filled by the predicted schema labels which significantly minimizes human
effort. Furthermore, the predicted labels can reduce the impact of inconsistent
names across multiple data tables. Understanding the connection between column
values and contextual information is an important yet neglected aspect as
previously proposed methods treat each column independently. In this paper, we
propose a context-aware semantic labeling method using both the column values
and context. Our new method is based on a new setting for semantic labeling,
where we sequentially predict labels for an input table with missing headers.
We incorporate both the values and context of each data column using the
pre-trained contextualized language model, BERT, that has achieved significant
improvements in multiple natural language processing tasks. To our knowledge,
we are the first to successfully apply BERT to solve the semantic labeling
task. We evaluate our approach using two real-world datasets from different
domains, and we demonstrate substantial improvements in terms of evaluation
metrics over state-of-the-art feature-based methods.
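The abstract does not spell out implementation details, but the general recipe (serialize a column's values together with the context already predicted for the table, then classify with BERT) can be pictured with the hedged sketch below. The serialization scheme, label vocabulary, and checkpoint are illustrative assumptions, not the paper's actual configuration, and the classification head would need fine-tuning on labeled tables before the predictions mean anything.

```python
# Minimal sketch: predict a schema label for one column from its values plus
# the labels already predicted for the other columns (the "context"), using
# BERT's two-segment input. The serialization, label set, and checkpoint are
# assumptions; the classification head must be fine-tuned on labeled tables.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

LABELS = ["city", "population", "country", "year"]  # hypothetical label vocabulary

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)
model.eval()

def predict_label(column_values, context_labels):
    """Return a schema label for a column given its values and table context."""
    values_text = " ".join(map(str, column_values))
    context_text = " ".join(context_labels) if context_labels else "no context"
    inputs = tokenizer(
        values_text, context_text,
        truncation=True, max_length=128, return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

# Sequential prediction for a table with missing headers: each newly predicted
# label becomes context for the next column.
table = [["Paris", "Berlin", "Rome"], ["2148000", "3645000", "2873000"]]
predicted = []
for column in table:
    predicted.append(predict_label(column, predicted))
print(predicted)
```

Packing the values into segment A and the context into segment B is just one natural way to keep the two kinds of evidence distinguishable; the paper's exact input encoding may differ.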
Related papers
- Matching of Descriptive Labels to Glossary Descriptions [4.030805205247758]
We propose a framework to leverage an existing semantic text similarity measurement (STS) and augment it using semantic label enrichment and set-based collective contextualization.
We performed an experiment on two datasets derived from publicly available data sources.
arXiv Detail & Related papers (2023-10-27T07:09:04Z)
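The snippet above only names the ingredients; one plausible reading is that a terse descriptive label is first expanded with related terms (label enrichment) and then scored against each glossary description with an off-the-shelf semantic text similarity model. The sentence-transformers checkpoint and the enrichment text below are assumptions made for illustration.

```python
# Illustrative sketch: enrich a terse descriptive label with related terms and
# score it against glossary descriptions with an off-the-shelf STS model. The
# checkpoint and the enrichment text are assumptions, not the paper's setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

label = "cust_acct_id"
enriched_label = "cust_acct_id customer account identifier"  # hypothetical enrichment

glossary = [
    "A unique identifier assigned to each customer account.",
    "The date on which the account was opened.",
]

label_emb = model.encode(enriched_label, convert_to_tensor=True)
desc_embs = model.encode(glossary, convert_to_tensor=True)
scores = util.cos_sim(label_emb, desc_embs)[0]
best = int(scores.argmax())
print(glossary[best], float(scores[best]))
```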
- Substituting Data Annotation with Balanced Updates and Collective Loss in Multi-label Text Classification [19.592985329023733]
Multi-label text classification (MLTC) is the task of assigning multiple labels to a given text.
We study the MLTC problem in annotation-free and scarce-annotation settings in which the magnitude of available supervision signals is linear in the number of labels.
Our method follows three steps: (1) mapping input text into a set of preliminary label likelihoods by natural language inference using a pre-trained language model, (2) calculating a signed label dependency graph from label descriptions, and (3) updating the preliminary label likelihoods with message passing along the label dependency graph.
arXiv Detail & Related papers (2023-09-24T04:12:52Z)
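A minimal sketch of those three steps, under heavy assumptions: a zero-shot NLI model produces the preliminary label likelihoods, a hand-set signed matrix stands in for the label dependency graph derived from label descriptions, and a single message-passing update adjusts the likelihoods. None of the specific weights or the update rule come from the paper.

```python
# Illustrative sketch of the three steps: (1) zero-shot NLI gives preliminary
# label likelihoods, (2) a signed label dependency graph (hand-set here; in the
# paper it is derived from label descriptions), (3) one message-passing update.
import numpy as np
from transformers import pipeline

labels = ["sports", "politics", "finance"]

# Step 1: preliminary likelihoods via NLI with a pre-trained language model.
nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
text = "The central bank raised interest rates again this quarter."
out = nli(text, candidate_labels=labels, multi_label=True)
prelim = np.array([out["scores"][out["labels"].index(l)] for l in labels])

# Step 2: signed label dependency graph; positive entries encourage labels to
# co-occur, negative entries discourage them. The values here are assumptions.
A = np.array([
    [0.0, -0.2, -0.1],   # sports
    [-0.2, 0.0,  0.4],   # politics
    [-0.1, 0.4,  0.0],   # finance
])

# Step 3: a single message-passing update of the likelihoods along the graph.
updated = np.clip(prelim + 0.5 * A @ prelim, 0.0, 1.0)
print(dict(zip(labels, np.round(updated, 3))))
```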
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- Tab2KG: Semantic Table Interpretation with Lightweight Semantic Profiles [3.655021726150368]
This article proposes Tab2KG, a novel method that targets the semantic interpretation of tables with previously unseen data.
We introduce original semantic profiles that enrich a domain's concepts and relations and represent domain and table characteristics.
In contrast to the existing semantic table interpretation approaches, Tab2KG relies on the semantic profiles only and does not require any instance lookup.
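The summary does not say what a semantic profile contains; one simplified reading is that a column and each candidate domain relation are each described by a small vector of lightweight statistics and matched by similarity, with no instance lookup. The profile features and domain profiles below are assumptions for illustration.

```python
# Simplified sketch of profile-based matching: describe a table column and each
# candidate domain relation by a small statistical profile and pick the most
# similar relation, without any instance lookup. The features are assumptions.
import numpy as np

def column_profile(values):
    """A toy lightweight profile: numeric ratio, scaled mean length, distinct ratio."""
    numeric = np.mean([str(v).replace(".", "", 1).isdigit() for v in values])
    mean_len = np.mean([len(str(v)) for v in values])
    distinct = len(set(values)) / len(values)
    return np.array([numeric, mean_len / 20.0, distinct])

# Hypothetical domain profiles (e.g., precomputed from a domain ontology/graph).
domain_profiles = {
    "schema:population": np.array([1.0, 0.35, 1.0]),
    "schema:cityName":   np.array([0.0, 0.45, 1.0]),
}

col = ["2148000", "3645000", "2873000"]
p = column_profile(col)
best = min(domain_profiles, key=lambda k: np.linalg.norm(domain_profiles[k] - p))
print(best)
```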
arXiv Detail & Related papers (2023-02-02T15:12:30Z)
- Ground Truth Inference for Weakly Supervised Entity Matching [76.6732856489872]
We propose a simple but powerful labeling model for weak supervision tasks.
We then tailor the labeling model specifically to the task of entity matching.
We show that our labeling model results in a 9% higher F1 score on average than the best existing method.
arXiv Detail & Related papers (2022-11-13T17:57:07Z)
- UniRE: A Unified Label Space for Entity Relation Extraction [67.53850477281058]
Joint entity relation extraction models set up two separate label spaces for the two sub-tasks.
We argue that this setting may hinder the information interaction between entities and relations.
In this work, we propose to eliminate the different treatment of the two sub-tasks' label spaces.
arXiv Detail & Related papers (2021-07-09T08:09:37Z)
- MATCH: Metadata-Aware Text Classification in A Large Hierarchy [60.59183151617578]
MATCH is an end-to-end framework that leverages both metadata and hierarchy information.
We propose different ways to regularize the parameters and output probability of each child label by its parents.
Experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH.
arXiv Detail & Related papers (2021-02-15T05:23:08Z)
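One way to picture regularizing a child label's output probability by its parent is a hinge penalty that discourages the child from scoring higher than the parent. The penalty form and the weighting below are assumptions; MATCH describes several regularizers and this toy version is not the exact one.

```python
# Toy sketch of output regularization in a label hierarchy: penalize a child
# label's predicted probability when it exceeds its parent's. The hinge form
# and the 0.1 weight are assumptions, not MATCH's exact regularizer.
import torch
import torch.nn.functional as F

def hierarchy_regularizer(probs, parent_of, margin=0.0):
    """probs: (batch, num_labels) sigmoid outputs.
    parent_of: dict mapping a child label index to its parent label index."""
    penalty = probs.new_zeros(())
    for child, parent in parent_of.items():
        # Penalize P(child) exceeding P(parent) by more than the margin.
        penalty = penalty + torch.clamp(
            probs[:, child] - probs[:, parent] - margin, min=0.0
        ).mean()
    return penalty

probs = torch.sigmoid(torch.randn(4, 3))   # stand-in model outputs
targets = torch.zeros(4, 3)                # stand-in gold labels
reg = hierarchy_regularizer(probs, parent_of={1: 0, 2: 0})
loss = F.binary_cross_entropy(probs, targets) + 0.1 * reg
print(float(loss))
```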
- A Study on the Autoregressive and non-Autoregressive Multi-label Learning [77.11075863067131]
We propose a self-attention-based variational encoder model to extract the label-label and label-feature dependencies jointly.
Our model can therefore be used to predict all labels in parallel while still including both label-label and label-feature dependencies.
arXiv Detail & Related papers (2020-12-03T05:41:44Z)
- DLDL: Dynamic Label Dictionary Learning via Hypergraph Regularization [17.34373273007931]
We propose a Dynamic Label Dictionary Learning (DLDL) algorithm to generate the soft label matrix for unlabeled data.
Specifically, we employ hypergraph manifold regularization to keep the relations among original data, transformed data, and soft labels consistent.
arXiv Detail & Related papers (2020-10-23T14:07:07Z)
- Interaction Matching for Long-Tail Multi-Label Classification [57.262792333593644]
We present an elegant and effective approach for addressing limitations in existing multi-label classification models.
By performing soft n-gram interaction matching, we match labels with natural language descriptions.
arXiv Detail & Related papers (2020-05-18T15:27:55Z)
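Soft n-gram interaction matching can be pictured as scoring every document n-gram against the tokens of a label's natural language description in embedding space and pooling the best matches into a label score. The random stand-in embeddings and the max-then-mean pooling below are assumptions, not the paper's architecture.

```python
# Toy sketch of soft n-gram interaction matching: embed document n-grams and
# label-description tokens, compute pairwise cosine similarities, and pool the
# best matches into a label score. Random embeddings stand in for learned ones.
import numpy as np

rng = np.random.default_rng(0)

def embed(tokens, dim=16):
    # Stand-in for learned embeddings (an assumption made for illustration).
    return rng.standard_normal((len(tokens), dim))

def ngrams(tokens, n=2):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

doc_tokens = "patient reports severe chest pain and shortness of breath".split()
label_desc = "cardiovascular symptoms such as chest pain".split()

doc_emb = embed(ngrams(doc_tokens))   # (num_ngrams, dim)
lab_emb = embed(label_desc)           # (num_desc_tokens, dim)

# Cosine similarity between every document n-gram and every description token.
doc_norm = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
lab_norm = lab_emb / np.linalg.norm(lab_emb, axis=1, keepdims=True)
sim = doc_norm @ lab_norm.T           # (num_ngrams, num_desc_tokens)

# Soft matching: best-matching n-gram per description token, averaged into
# a single label relevance score.
score = sim.max(axis=0).mean()
print(round(float(score), 3))
```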
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.