Novel Entity Discovery from Web Tables
- URL: http://arxiv.org/abs/2002.00206v1
- Date: Sat, 1 Feb 2020 13:24:03 GMT
- Title: Novel Entity Discovery from Web Tables
- Authors: Shuo Zhang and Edgar Meij and Krisztian Balog and Ridho Reinanda
- Abstract summary: We leverage tables on the Web to discover new entities, properties, and relationships.
Our method identifies not only out-of-KB ("novel") information but also novel aliases for in-KB ("known") entities.
- Score: 21.16349961050804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When working with any sort of knowledge base (KB) one has to make sure it is
as complete and also as up-to-date as possible. Both tasks are non-trivial as
they require recall-oriented efforts to determine which entities and
relationships are missing from the KB. As such they require a significant
amount of labor. Tables on the Web, on the other hand, are abundant and have
the distinct potential to assist with these tasks. In particular, we can
leverage the content in such tables to discover new entities, properties, and
relationships. Because web tables typically only contain raw textual content, we
first need to determine which cells refer to which known entities, a task we
dub table-to-KB matching. This first task aims to infer table semantics by
linking table cells and heading columns to elements of a KB. The second task
builds upon these linked entities and properties to not only identify novel
ones in the same table but also to bootstrap their type and additional
relationships. We refer to this process as novel entity discovery and, to the
best of our knowledge, it is the first endeavor on mining the unlinked cells in
web tables. Our method identifies not only out-of-KB ("novel") information
but also novel aliases for in-KB ("known") entities. When evaluated using
three purpose-built test collections, we find that our proposed approaches
obtain a marked improvement in terms of precision over our baselines whilst
keeping recall stable.
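The two steps described in the abstract can be illustrated with a toy sketch. This is not the authors' method: it only shows the general idea of linking table cells to a KB by exact normalized-label lookup and treating the unlinked cells as candidates for novel entities or novel aliases. The KB contents, the entity IDs, and the names `TOY_KB`, `normalize`, `link_cell`, and `split_column` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy KB: normalized surface form -> made-up entity ID.
TOY_KB: Dict[str, str] = {
    "berlin": "E1",
    "paris": "E2",
}


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def link_cell(cell: str, kb: Dict[str, str]) -> Optional[str]:
    """Return the entity ID for a cell, or None if no KB label matches."""
    return kb.get(normalize(cell))


def split_column(
    cells: List[str], kb: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Separate linked cells from unlinked ones.

    Unlinked cells are only candidates: each may be a novel entity,
    a novel alias of a known entity, or noise. Deciding which is the
    hard part and is not attempted here.
    """
    linked: List[Tuple[str, str]] = []
    unlinked: List[str] = []
    for cell in cells:
        entity_id = link_cell(cell, kb)
        if entity_id is None:
            unlinked.append(cell)
        else:
            linked.append((cell, entity_id))
    return linked, unlinked


column = ["Berlin", "  Paris ", "Atlantis City"]
linked, unlinked = split_column(column, TOY_KB)
```

Exact lookup is used only to keep the sketch short; a realistic matcher would need candidate ranking and table context to resolve ambiguous or misspelled cells.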
Related papers
- Relational Multi-Task Learning: Modeling Relations between Data and Tasks [84.41620970886483]
A key assumption in multi-task learning is that, at inference time, the model only has access to a given data point but not to the data point's labels from other tasks.
Here we introduce a novel relational multi-task learning setting where we leverage data point labels from auxiliary tasks to make more accurate predictions.
We develop MetaLink, where our key innovation is to build a knowledge graph that connects data points and tasks.
arXiv Detail & Related papers (2023-03-14T07:15:41Z)
- Reveal the Unknown: Out-of-Knowledge-Base Mention Discovery with Entity Linking [23.01938139604297]
We propose a new BERT-based Entity Linking (EL) method which can identify mentions that do not have corresponding KB entities by matching them to a NIL entity.
Results on five datasets show the advantages of BLINKout over existing methods to identify out-of-KB mentions.
arXiv Detail & Related papers (2023-02-14T17:00:06Z)
- Visual Named Entity Linking: A New Dataset and A Baseline [61.38231023490981]
We consider a purely Visual-based Named Entity Linking (VNEL) task, where the input only consists of an image.
We propose three different sub-tasks, i.e., visual to visual entity linking (V2VEL), visual to textual entity linking (V2TEL), and visual to visual-textual entity linking (V2VTEL).
We present a high-quality human-annotated visual person linking dataset, named WIKIPerson.
arXiv Detail & Related papers (2022-11-09T13:27:50Z)
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis of a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
- Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
arXiv Detail & Related papers (2022-02-27T17:38:53Z)
- Tab2Know: Building a Knowledge Base from Tables in Scientific Papers [6.514665180383298]
We present Tab2Know, a new end-to-end system to build a Knowledge Base from tables in scientific papers.
We propose a pipeline that employs both statistical-based classifiers and logic-based reasoning.
An empirical evaluation of our approach using a corpus of papers in the Computer Science domain has returned satisfactory performance.
arXiv Detail & Related papers (2021-07-28T11:56:53Z)
- BERT Meets Relational DB: Contextual Representations of Relational Databases [4.029818252558553]
We address the problem of learning low-dimensional representations of entities in relational databases consisting of multiple tables.
We look into ways of using these attention-based models to learn embeddings for entities in the relational database.
arXiv Detail & Related papers (2021-04-30T11:23:26Z)
- TCN: Table Convolutional Network for Web Table Interpretation [52.32515851633981]
We propose a novel table representation learning approach considering both the intra- and inter-table contextual information.
Our method can outperform competitive baselines by +4.8% of F1 for column type prediction and by +4.1% of F1 for column pairwise relation prediction.
arXiv Detail & Related papers (2021-02-17T02:18:10Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
- TURL: Table Understanding through Representation Learning [29.6016859927782]
TURL is a novel framework that introduces the pre-training/finetuning paradigm to relational Web tables.
During pre-training, our framework learns deep contextualized representations on relational tables in an unsupervised manner.
We show that TURL generalizes well to all tasks and substantially outperforms existing methods in almost all instances.
arXiv Detail & Related papers (2020-06-26T05:44:54Z)
- Exploring the Combination of Contextual Word Embeddings and Knowledge Graph Embeddings [0.0]
Embeddings of knowledge bases (KB) capture the explicit relations between entities denoted by words, but are not able to directly capture the syntagmatic properties of these words.
We propose a new approach using contextual and KB embeddings jointly at the same level.
arXiv Detail & Related papers (2020-04-17T17:49:45Z)