Korean-Specific Dataset for Table Question Answering
- URL: http://arxiv.org/abs/2201.06223v1
- Date: Mon, 17 Jan 2022 05:47:44 GMT
- Title: Korean-Specific Dataset for Table Question Answering
- Authors: Changwook Jun, Jooyoung Choi, Myoseop Sim, Hyun Kim, Hansol Jang,
Kyungkoo Min
- Abstract summary: We build Korean-specific datasets for table question answering.
The Korean table question answering corpus consists of 70k question-answer pairs created by crowd-sourced workers.
We make our datasets publicly available via our GitHub repository.
- Score: 3.7056358801102682
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Existing question answering systems mainly focus on dealing with text data.
However, much of the data produced daily is stored in the form of tables that
can be found in documents and relational databases, or on the web. To solve the
task of question answering over tables, there exist many datasets for table
question answering written in English, but few Korean datasets. In this paper,
we demonstrate how we construct Korean-specific datasets for table question
answering: the Korean tabular dataset is a collection of 1.4M tables with
corresponding descriptions for unsupervised pre-training of language models.
Korean table question answering corpus consists of 70k pairs of questions and
answers created by crowd-sourced workers. We then build a pre-trained language
model based on the Transformer architecture, fine-tune it for table question
answering with these datasets, and report the evaluation
results of our model. We make our datasets publicly available via our GitHub
repository, and hope that those datasets will help further studies for question
answering over tables, and for transformation of table formats.
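The pipeline the abstract describes (pre-train on tables, then fine-tune for table question answering) presupposes some way of feeding a table into a Transformer. Below is a minimal, hypothetical sketch of one common approach, linearizing the table and question into a single input sequence. This is not the authors' actual preprocessing; the separator tokens and layout are assumptions for illustration only.

```python
def linearize_table(question, header, rows):
    """Flatten a question and a table into one input sequence.

    The separator tokens ([HEADER], [ROW]) are illustrative
    placeholders, not the tokens used by the paper's actual model.
    """
    parts = [question, "[HEADER]", " | ".join(header)]
    for row in rows:
        parts.append("[ROW]")
        parts.append(" | ".join(str(cell) for cell in row))
    return " ".join(parts)

question = "Which city has the largest population?"
header = ["city", "population"]
rows = [["Seoul", 9500000], ["Busan", 3400000]]

sequence = linearize_table(question, header, rows)
print(sequence)
```

The resulting string would then be tokenized and passed to the model; real table QA systems typically add positional signals (row/column ids) on top of such a flat sequence.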
Related papers
- TANQ: An open domain dataset of table answered questions [15.323690523538572]
TANQ is the first open domain question answering dataset where the answers require building tables from information across multiple sources.
We release the full source attribution for every cell in the resulting table and benchmark state-of-the-art language models in open, oracle, and closed book setups.
Our best-performing baseline, GPT4, reaches an overall F1 score of 29.1, lagging behind human performance by 19.7 points.
arXiv Detail & Related papers (2024-05-13T14:07:20Z)
- WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction [56.196512595940334]
This paper investigates the performance of Large Language Models (LLMs) in the context of table editing tasks.
We leverage 26,531 tables from the Wiki dataset to generate natural language instructions for six distinct basic operations.
We evaluate several representative large language models on the WikiTableEdit dataset to demonstrate the challenge of this task.
arXiv Detail & Related papers (2024-03-05T13:33:12Z)
- Augment before You Try: Knowledge-Enhanced Table Question Answering via Table Expansion [57.53174887650989]
Table question answering is a popular task that assesses a model's ability to understand and interact with structured data.
Existing methods convert both the table and external knowledge into text, which neglects the structured nature of the table.
We propose a simple yet effective method to integrate external information in a given table.
arXiv Detail & Related papers (2024-01-28T03:37:11Z)
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering [61.48881995121938]
Real-world queries are often complex, spanning multiple tables in a relational database or web page.
Our model, MultiTabQA, not only answers questions over multiple tables, but also generalizes to generate tabular answers.
arXiv Detail & Related papers (2023-05-22T08:25:15Z)
- XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing [70.40401197026925]
In-context learning using large language models has recently shown surprising results for semantic parsing tasks.
This work introduces the XRICL framework, which learns to retrieve relevant English exemplars for a given query.
We also include global translation exemplars for a target language to facilitate the translation process for large language models.
arXiv Detail & Related papers (2022-10-25T01:33:49Z)
- TableQuery: Querying tabular data with natural language [0.0]
In TableQuery, we use deep learning models pre-trained for question answering on free text to convert natural language queries to structured queries.
Deep learning models pre-trained for question answering on free text are readily available on platforms such as HuggingFace Model Hub.
TableQuery does not require re-training; when a newly trained model for question answering with better performance is available, it can replace the existing model in TableQuery.
arXiv Detail & Related papers (2022-01-27T17:26:25Z)
- PeCoQ: A Dataset for Persian Complex Question Answering over Knowledge Graph [0.0]
This paper introduces PeCoQ, a dataset for Persian question answering.
This dataset contains 10,000 complex questions and answers extracted from the Persian knowledge graph, FarsBase.
There are different types of complexities in the dataset, such as multi-relation, multi-entity, ordinal, and temporal constraints.
arXiv Detail & Related papers (2021-06-27T08:21:23Z)
- Summarizing and Exploring Tabular Data in Conversational Search [36.14882974814593]
We build a new conversation-oriented, open-domain table summarization dataset.
It includes annotated table summaries, which not only answer questions but also help people explore other information in the table.
We utilize this dataset to develop automatic table summarization systems as SOTA baselines.
arXiv Detail & Related papers (2020-05-23T08:29:51Z)
- TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [113.29476656550342]
We present TaBERT, a pretrained LM that jointly learns representations for NL sentences and tables.
TaBERT is trained on a large corpus of 26 million tables and their English contexts.
Implementation of the model will be available at http://fburl.com/TaBERT.
arXiv Detail & Related papers (2020-05-17T17:26:40Z)
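The TableQuery entry above describes converting natural-language questions into structured queries using models pre-trained for free-text question answering. As a toy stand-in for that idea (not TableQuery's actual implementation), the sketch below maps a question onto a SQL query with a simple keyword heuristic; the function name, keyword table, and rules are illustrative assumptions.

```python
AGGREGATES = {"average": "AVG", "maximum": "MAX", "minimum": "MIN", "total": "SUM"}

def to_sql(question, table, columns):
    """Map a question to a SQL string with keyword heuristics.

    A real system (like the one TableQuery describes) would use a
    pre-trained QA model here instead of keyword matching.
    """
    q = question.lower()
    for word, func in AGGREGATES.items():
        if word in q:
            # Use the first column name that appears in the question.
            for col in columns:
                if col in q:
                    return f"SELECT {func}({col}) FROM {table}"
    return f"SELECT * FROM {table}"

print(to_sql("What is the average salary of employees?", "employees", ["name", "salary"]))
# -> SELECT AVG(salary) FROM employees
```

Because the structured query is produced as a separate step, such a system can swap in a stronger QA model without re-training, which is the property the TableQuery summary highlights.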
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.