Summarizing and Exploring Tabular Data in Conversational Search
- URL: http://arxiv.org/abs/2005.11490v3
- Date: Fri, 10 Jul 2020 15:56:31 GMT
- Title: Summarizing and Exploring Tabular Data in Conversational Search
- Authors: Shuo Zhang and Zhuyun Dai and Krisztian Balog and Jamie Callan
- Abstract summary: We build a new conversation-oriented, open-domain table summarization dataset.
It includes annotated table summaries, which not only answer questions but also help people explore other information in the table.
We utilize this dataset to develop automatic table summarization systems as SOTA baselines.
- Score: 36.14882974814593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tabular data provide answers to a significant portion of search queries.
However, reciting an entire result table is impractical in conversational
search systems. We propose to generate natural language summaries as answers to
describe the complex information contained in a table. Through crowdsourcing
experiments, we build a new conversation-oriented, open-domain table
summarization dataset. It includes annotated table summaries, which not only
answer questions but also help people explore other information in the table.
We utilize this dataset to develop automatic table summarization systems as
SOTA baselines. Based on the experimental results, we identify challenges and
point out future research directions that this resource will support.
Related papers
- TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools [51.576974932743596]
Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts.
TACT contains challenging instructions that demand stitching information scattered across one or more texts.
We construct this dataset by leveraging an existing dataset of texts and their associated tables.
We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%.
arXiv Detail & Related papers (2024-06-05T20:32:56Z)
- TANQ: An open domain dataset of table answered questions [15.323690523538572]
TANQ is the first open domain question answering dataset where the answers require building tables from information across multiple sources.
We release the full source attribution for every cell in the resulting table and benchmark state-of-the-art language models in open, oracle, and closed book setups.
Our best-performing baseline, GPT4, reaches an overall F1 score of 29.1, lagging behind human performance by 19.7 points.
arXiv Detail & Related papers (2024-05-13T14:07:20Z)
- QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs [63.98556480088152]
Table summarization is a crucial task aimed at condensing information into concise and comprehensible textual summaries.
We propose a novel method to address these limitations by introducing query-focused multi-table summarization.
Our approach, which comprises a table serialization module, a summarization controller, and a large language model, generates query-dependent table summaries tailored to users' information needs.
arXiv Detail & Related papers (2024-05-08T15:05:55Z)
- Augment before You Try: Knowledge-Enhanced Table Question Answering via Table Expansion [57.53174887650989]
Table question answering is a popular task that assesses a model's ability to understand and interact with structured data.
Some existing methods convert both the table and external knowledge into text, which neglects the structured nature of the table.
We propose a simple yet effective method to integrate external information in a given table.
arXiv Detail & Related papers (2024-01-28T03:37:11Z)
- Beyond Extraction: Contextualising Tabular Data for Efficient Summarisation by Language Models [0.0]
The conventional use of the Retrieval-Augmented Generation architecture has proven effective for retrieving information from diverse documents.
This research introduces an innovative approach to enhance the accuracy of complex table queries in RAG-based systems.
arXiv Detail & Related papers (2024-01-04T16:16:14Z)
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- Representations for Question Answering from Documents with Tables and Text [22.522986299412807]
We aim to improve question answering from tables by refining table representations based on information from surrounding text.
We also present an effective method to combine text and table-based predictions for question answering from full documents.
arXiv Detail & Related papers (2021-01-26T05:52:20Z)
- ToTTo: A Controlled Table-To-Text Generation Dataset [61.83159452483026]
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples.
We introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia.
While usually fluent, existing methods often hallucinate phrases that are not supported by the table.
arXiv Detail & Related papers (2020-04-29T17:53:45Z)