Around the GLOBE: Numerical Aggregation Question-Answering on
Heterogeneous Genealogical Knowledge Graphs with Deep Neural Networks
- URL: http://arxiv.org/abs/2307.16208v1
- Date: Sun, 30 Jul 2023 12:09:00 GMT
- Title: Around the GLOBE: Numerical Aggregation Question-Answering on
Heterogeneous Genealogical Knowledge Graphs with Deep Neural Networks
- Authors: Omri Suissa, Maayan Zhitomirsky-Geffet, Avshalom Elmalech
- Abstract summary: We present a new end-to-end methodology for numerical aggregation QA for genealogical trees.
The proposed architecture, GLOBE, outperforms the state-of-the-art models and pipelines by achieving 87% accuracy for this task.
This study may have practical implications for genealogical information centers and museums.
- Score: 0.934612743192798
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: One of the key AI tools for textual corpora exploration is natural language
question-answering (QA). Unlike keyword-based search engines, QA algorithms
receive and process natural language questions and produce precise answers to
these questions, rather than long lists of documents that need to be manually
scanned by the users. State-of-the-art QA algorithms based on DNNs have been
successfully employed in various domains. However, QA in the genealogical
domain is still underexplored, even though researchers in this field (and other
fields in the humanities and social sciences) can benefit greatly from the ability
to ask questions in natural language, receive concrete answers and gain
insights hidden within large corpora. While some research has recently been
conducted on factual QA in the genealogical domain, to the best of our
knowledge, there is no previous research on the more challenging task of
numerical aggregation QA (i.e., answering questions combining aggregation
functions, e.g., count, average, max). Numerical aggregation QA is critical for
distant reading and analysis for researchers (and the general public)
interested in investigating cultural heritage domains. Therefore, in this
study, we present a new end-to-end methodology for numerical aggregation QA for
genealogical trees that includes: 1) an automatic method for training dataset
generation; 2) a transformer-based table selection method; and 3) an optimized
transformer-based numerical aggregation QA model. The findings indicate that
the proposed architecture, GLOBE, outperforms the state-of-the-art models and
pipelines by achieving 87% accuracy for this task compared to only 21% by
current state-of-the-art models. This study may have practical implications for
genealogical information centers and museums, making genealogical data research
easy and scalable for experts as well as the general public.
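To make the aggregation functions named in the abstract (count, average, max) concrete, the following is a minimal Python sketch that answers three such questions over a toy table of family-tree records. The table, its column names, and the `aggregate` and `lifespan` helpers are hypothetical illustrations and are not taken from the paper; GLOBE itself is described as a transformer-based model, not hand-written aggregation code.

```python
from statistics import mean

# Hypothetical toy table: one row per person in a family tree.
# Column names are assumptions made for this illustration only.
PEOPLE = [
    {"name": "Anna", "birth_year": 1900, "death_year": 1970, "children": 3},
    {"name": "Boris", "birth_year": 1925, "death_year": 1990, "children": 2},
    {"name": "Clara", "birth_year": 1930, "death_year": 2010, "children": 0},
]


def aggregate(rows, func, value):
    """Apply 'count', 'average', or 'max' to the values derived from rows."""
    values = [value(row) for row in rows]
    if func == "count":
        return len(values)
    if func == "average":
        return mean(values)
    if func == "max":
        return max(values)
    raise ValueError(f"unsupported aggregation: {func}")


def lifespan(row):
    """Years between birth and death for one person."""
    return row["death_year"] - row["birth_year"]


# "How many people had at least one child?"
parents = [row for row in PEOPLE if row["children"] > 0]
num_parents = aggregate(parents, "count", lambda row: row["name"])

# "What was the average lifespan?"
avg_lifespan = aggregate(PEOPLE, "average", lifespan)

# "What was the longest lifespan?"
max_lifespan = aggregate(PEOPLE, "max", lifespan)
```

Each question maps to a row filter, a value to extract, and an aggregation function; a numerical aggregation QA system has to infer all three from the natural language question.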
Related papers
- SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers [43.18330795060871]
SPIQA is a dataset specifically designed to interpret complex figures and tables within the context of scientific research articles.
We employ automatic and manual curation to create the dataset.
SPIQA comprises 270K questions divided into training, validation, and three different evaluation splits.
arXiv Detail & Related papers (2024-07-12T16:37:59Z)
- Automatic Question-Answer Generation for Long-Tail Knowledge [65.11554185687258]
We propose an automatic approach to generate specialized QA datasets for tail entities.
We conduct extensive experiments by employing pretrained LLMs on our newly generated long-tail QA datasets.
arXiv Detail & Related papers (2024-03-03T03:06:31Z)
- Utilizing Background Knowledge for Robust Reasoning over Traffic Situations [63.45021731775964]
We focus on a complementary research aspect of Intelligent Transportation: traffic understanding.
We scope our study to text-based methods and datasets given the abundant commonsense knowledge.
We adopt three knowledge-driven approaches for zero-shot QA over traffic situations.
arXiv Detail & Related papers (2022-12-04T09:17:24Z)
- ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering.
Our dataset poses a great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z)
- Modern Question Answering Datasets and Benchmarks: A Survey [5.026863544662493]
Question Answering (QA) is one of the most important natural language processing (NLP) tasks.
It aims to use NLP technologies to generate an answer to a given question based on a massive unstructured corpus.
In this paper, we investigate influential QA datasets that have been released in the era of deep learning.
arXiv Detail & Related papers (2022-06-30T05:53:56Z)
- Conversational Question Answering: A Survey [18.447856993867788]
This survey is an effort to present a comprehensive review of the state-of-the-art research trends of Conversational Question Answering (CQA).
Our findings show that there has been a shift from single-turn to multi-turn QA, which empowers the field of Conversational AI from different perspectives.
arXiv Detail & Related papers (2021-06-02T01:06:34Z)
- Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering [62.88322725956294]
We review the latest research trends in OpenQA, with particular attention to systems that incorporate neural MRC techniques.
We introduce a modern OpenQA architecture named "Retriever-Reader" and analyze the various systems that follow this architecture.
We then discuss key challenges to developing OpenQA systems and offer an analysis of benchmarks that are commonly used.
arXiv Detail & Related papers (2021-01-04T04:47:46Z)
- EQG-RACE: Examination-Type Question Generation [21.17100754955864]
We propose an innovative Examination-type Question Generation approach (EQG-RACE) to generate exam-like questions based on a dataset extracted from RACE.
Two main strategies are employed in EQG-RACE for dealing with discrete answer information and reasoning among long contexts.
Experimental results show state-of-the-art performance for EQG-RACE, which is clearly superior to the baselines.
arXiv Detail & Related papers (2020-12-11T03:52:17Z)
- A Survey on Complex Question Answering over Knowledge Base: Recent Advances and Challenges [71.4531144086568]
Question Answering (QA) over Knowledge Base (KB) aims to automatically answer natural language questions.
Researchers have shifted their attention from simple questions to complex questions, which require more KB triples and constraint inference.
arXiv Detail & Related papers (2020-07-26T07:13:32Z)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [133.93803565077337]
Retrieval-augmented generation (RAG) models combine pre-trained parametric and non-parametric memory for language generation.
We show that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
arXiv Detail & Related papers (2020-05-22T21:34:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.