BioTABQA: Instruction Learning for Biomedical Table Question Answering
- URL: http://arxiv.org/abs/2207.02419v1
- Date: Wed, 6 Jul 2022 03:40:10 GMT
- Title: BioTABQA: Instruction Learning for Biomedical Table Question Answering
- Authors: Man Luo, Sharad Saxena, Swaroop Mishra, Mihir Parmar, Chitta Baral
- Abstract summary: Table Question Answering (TQA) is an important but under-explored task.
No TQA dataset exists in the biomedical domain, where tables are frequently used to present information.
BioTABQA can be used not only to teach a model to answer questions from tables but also to evaluate how a model generalizes to unseen questions.
- Score: 19.66452178704578
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Table Question Answering (TQA) is an important but under-explored task. Most
existing QA datasets are in unstructured text format, and only a few use tables as
the context. To the best of our knowledge, no TQA dataset exists in the biomedical
domain, where tables are frequently used to present information. In this paper, we
first curate a table question answering dataset, BioTABQA, using 22 templates and
the context from a biomedical textbook on differential diagnosis. BioTABQA can be
used not only to teach a model how to answer questions from tables but also to
evaluate how a model generalizes to unseen questions, an important scenario for
biomedical applications. To enable this generalization evaluation, we divide the 22
templates into 17 for training and 5 for cross-task evaluation. We then develop two
baselines using single- and multi-task learning on BioTABQA. Furthermore, we
explore instruction learning, a recent technique that has shown impressive
generalization performance. Experimental results show that our instruction-tuned
model outperforms the single- and multi-task baselines by ~23% and ~6% on average
across various evaluation settings and, more importantly, outperforms the baselines
by ~5% on cross-task evaluation.
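As a rough illustration of the setup described above (not the paper's released code), the sketch below generates template-based QA pairs from a toy differential-diagnosis table and wraps each pair in an instruction-style prompt; the template strings, table fields, and `make_instruction_example` helper are all hypothetical.

```python
# Hypothetical sketch of template-based QA generation over a medical table,
# in the spirit of BioTABQA; field names and templates are illustrative only.

TEMPLATES = [
    # Each template turns one table row into a (question, answer) pair.
    ("What are the symptoms of {disease}?", "symptoms"),
    ("Which diagnostic tests are used for {disease}?", "tests"),
]

def linearize_table(rows):
    """Flatten the table into text so a seq2seq model can consume it."""
    lines = []
    for row in rows:
        cells = " | ".join(f"{col}: {val}" for col, val in row.items())
        lines.append(cells)
    return " ; ".join(lines)

def make_instruction_example(rows, row, template, answer_col):
    """Build one instruction-style (prompt, target) training example."""
    question = template.format(disease=row["disease"])
    prompt = (
        "Instruction: Answer the question using only the table below.\n"
        f"Table: {linearize_table(rows)}\n"
        f"Question: {question}\nAnswer:"
    )
    return prompt, row[answer_col]

rows = [
    {"disease": "influenza", "symptoms": "fever, cough, myalgia",
     "tests": "rapid antigen test, RT-PCR"},
]
for template, answer_col in TEMPLATES:
    prompt, target = make_instruction_example(rows, rows[0], template, answer_col)
    print(prompt, "->", target)
```

Holding out 5 of the 22 templates at training time then directly tests whether the instruction-tuned model can answer question types it has never seen.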
Related papers
- ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering [54.80411755871931]
Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth.
Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into a readily understandable format.
This dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that can be potentially useful.
We introduce a QAMatch model, specifically designed to effectively answer chemical questions by fully leveraging our collected data.
arXiv Detail & Related papers (2024-07-24T01:46:55Z)
- KET-QA: A Dataset for Knowledge Enhanced Table Question Answering [63.56707527868466]
We propose to use a knowledge base (KB) as the external knowledge source for TableQA.
Answering each question requires integrating information from both the table and the knowledge sub-graph.
We design a retriever-reasoner structured pipeline model to extract pertinent information from the vast knowledge sub-graph.
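The abstract only names the pipeline's shape, so the following is a minimal sketch of a generic retriever-reasoner flow under that assumption; the overlap-based scorer and the placeholder reader are stand-ins, not KET-QA's actual components.

```python
# Generic retriever-reasoner sketch for KB-augmented table QA.
# The scoring function and "reader" below are illustrative stand-ins.

def score(question: str, triple: tuple) -> float:
    """Toy relevance score: word overlap between question and triple text."""
    words = set(question.lower().split())
    return len(words & set(" ".join(triple).lower().split()))

def retrieve(question, kb_triples, k=3):
    """Retriever: keep the k sub-graph triples most relevant to the question."""
    return sorted(kb_triples, key=lambda t: score(question, t), reverse=True)[:k]

def build_context(question, table_text, kb_triples):
    """Reasoner input: a real system would feed this to a reader model."""
    triples_text = " ".join("(%s, %s, %s)" % t for t in retrieve(question, kb_triples))
    return table_text + " [KB] " + triples_text
```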
arXiv Detail & Related papers (2024-05-13T18:26:32Z)
- BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets [7.7587371896752595]
Biomedical relation extraction (RE) is a central task in biomedical natural language processing (NLP) research.
We present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset.
Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on individual datasets.
arXiv Detail & Related papers (2023-06-19T22:48:18Z)
- MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering [61.48881995121938]
Real-world queries are complex, often spanning multiple tables in a relational database or web page.
Our model, MultiTabQA, not only answers questions over multiple tables, but also generalizes to generate tabular answers.
arXiv Detail & Related papers (2023-05-22T08:25:15Z)
- STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables [64.0903766169603]
We propose a framework for few-shot semi-supervised learning, coined Self-generated Tasks from UNlabeled Tables (STUNT).
Our key idea is to self-generate diverse few-shot tasks by treating randomly chosen columns as a target label.
We then employ a meta-learning scheme to learn generalizable knowledge with the constructed tasks.
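A minimal sketch of the self-generation idea, assuming a plain list-of-rows table; the episode sizes and helper function below are illustrative assumptions, not STUNT's actual implementation.

```python
import random

# STUNT-style task self-generation from an unlabeled table: pick a random
# column, treat its values as pseudo-labels, and sample a few-shot episode
# using the remaining columns as features.

def self_generate_task(table, n_support=5, n_query=5):
    """table: list of rows, each row a list of cell values (no labels given)."""
    n_cols = len(table[0])
    label_col = random.randrange(n_cols)  # random column becomes the pseudo-label
    examples = [
        ([v for j, v in enumerate(row) if j != label_col], row[label_col])
        for row in table
    ]
    random.shuffle(examples)
    support = examples[:n_support]                    # few-shot "training" set
    query = examples[n_support:n_support + n_query]   # episode evaluation set
    return support, query

# A meta-learner (e.g. a prototype-based few-shot classifier) would then be
# trained across many such episodes to acquire generalizable knowledge.
```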
arXiv Detail & Related papers (2023-03-02T02:37:54Z)
- OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering [106.73213656603453]
We develop a simple table-based QA model with minimal annotation effort.
We propose an omnivorous pretraining approach that consumes both natural and synthetic data.
arXiv Detail & Related papers (2022-07-08T01:23:45Z)
- In-BoXBART: Get Instructions into Biomedical Multi-Task Learning [18.3293060030174]
Single-task models have proven pivotal in solving specific tasks; however, they have limitations in real-world applications.
We introduce the BoX, a collection of 32 instruction tasks for Biomedical NLP across various categories.
We propose a unified model, In-BoXBART, that can jointly learn all tasks of the BoX without any task-specific modules.
arXiv Detail & Related papers (2022-04-15T18:06:22Z)
- TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance [71.76018597965378]
We build a new large-scale Question Answering dataset containing both Tabular And Textual data, named TAT-QA.
We propose a novel QA model termed TAGOP, which is capable of reasoning over both tables and text.
arXiv Detail & Related papers (2021-05-17T06:12:06Z)
- TABBIE: Pretrained Representations of Tabular Data [22.444607481407633]
We devise a simple pretraining objective that learns exclusively from tabular data.
Unlike competing approaches, our model (TABBIE) provides embeddings of all table substructures.
A qualitative analysis of our model's learned cell, column, and row representations shows that it understands complex table semantics and numerical trends.
arXiv Detail & Related papers (2021-05-06T11:15:16Z)
- Sequence Tagging for Biomedical Extractive Question Answering [12.464143741310137]
We investigate how question distributions differ between the general and biomedical domains.
We discover that biomedical questions are more likely to require list-type answers (multiple answers) than factoid-type answers (single answer).
Our approach can learn to decide the number of answers for a question from training data.
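A toy sketch of why sequence tagging handles list-type answers naturally: decoding BIO tags yields however many spans the model predicts, so the answer count is learned from data rather than fixed in advance. The tags below are hand-written stand-ins for model output.

```python
# Decode BIO tags into a variable number of extracted answers.

def decode_answers(tokens, tags):
    """Collect every contiguous span tagged B-ANS / I-ANS."""
    answers, span = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-ANS":
            if span:
                answers.append(" ".join(span))
            span = [token]
        elif tag == "I-ANS" and span:
            span.append(token)
        else:
            if span:
                answers.append(" ".join(span))
            span = []
    if span:
        answers.append(" ".join(span))
    return answers

tokens = ["aspirin", ",", "ibuprofen", "and", "naproxen", "are", "NSAIDs"]
tags   = ["B-ANS",  "O", "B-ANS",     "O",   "B-ANS",    "O",   "O"]
print(decode_answers(tokens, tags))  # ['aspirin', 'ibuprofen', 'naproxen']
```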
arXiv Detail & Related papers (2021-04-15T15:42:34Z)
- Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset [29.866478682797513]
We provide an in-depth analysis of emrQA, the first large-scale dataset for question answering (QA) based on clinical notes.
We find that (i) emrQA answers are often incomplete, and (ii) emrQA questions are often answerable without using domain knowledge.
arXiv Detail & Related papers (2020-05-01T19:07:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.