INFOTABS: Inference on Tables as Semi-structured Data
- URL: http://arxiv.org/abs/2005.06117v1
- Date: Wed, 13 May 2020 02:07:54 GMT
- Title: INFOTABS: Inference on Tables as Semi-structured Data
- Authors: Vivek Gupta, Maitrey Mehta, Pegah Nokhiz and Vivek Srikumar
- Abstract summary: We introduce a new dataset called INFOTABS, comprising human-written textual hypotheses based on premises that are tables extracted from Wikipedia info-boxes.
Our analysis shows that the semi-structured, multi-domain and heterogeneous nature of the premises admits complex, multi-faceted reasoning.
Experiments reveal that, while human annotators agree on the relationships between a table-hypothesis pair, several standard modeling strategies are unsuccessful at the task.
- Score: 39.84930221015755
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we observe that semi-structured tabulated text is ubiquitous;
understanding it requires not only comprehending the meaning of text
fragments, but also the implicit relationships between them. We argue that such
data can serve as a testing ground for understanding how we reason about
information. To study this, we introduce a new dataset called INFOTABS,
comprising human-written textual hypotheses based on premises that are
tables extracted from Wikipedia info-boxes. Our analysis shows that the
semi-structured, multi-domain and heterogeneous nature of the premises admits
complex, multi-faceted reasoning. Experiments reveal that, while human
annotators agree on the relationships between a table-hypothesis pair, several
standard modeling strategies are unsuccessful at the task, suggesting that
reasoning about tables can pose a difficult modeling challenge.
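The task pairs an infobox-style table premise with a sentence hypothesis and an entailment label. Below is a minimal sketch of such an example in Python, assuming a simple key-value view of the infobox and a naive sentence-template linearization; the field names, the example infobox, and the template are illustrative assumptions, not the dataset's actual schema or the paper's exact baseline.

```python
# A minimal, hypothetical INFOTABS-style example. The field names, the
# infobox contents, and the template below are illustrative assumptions,
# not the dataset's actual schema.

# Premise: a Wikipedia infobox rendered as key-value pairs.
premise_table = {
    "title": "Example Film",
    "Directed by": "Jane Doe",
    "Release date": "October 5, 1961",
    "Running time": "115 minutes",
}

# Hypothesis: a human-written sentence about the table, labeled
# entailment / contradiction / neutral.
hypothesis = "Example Film was released in the 1960s."
label = "entailment"

def linearize(table):
    """Flatten the infobox into a premise paragraph, one sentence per
    row -- a common baseline for feeding tables to standard
    sentence-pair NLI models."""
    title = table["title"]
    return " ".join(
        f"The {key} of {title} is {value}."
        for key, value in table.items()
        if key != "title"
    )

# Naive templates yield clunky premises ("The Directed by of ..."),
# one reason simple linearization baselines can struggle on this task.
print(linearize(premise_table))
```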
Related papers
- Scaling Laws with Hidden Structure [2.474908349649168]
Recent advances suggest that text and image data contain hidden structures that help mitigate the curse of dimensionality.
In this paper, we present a controlled experimental framework to test whether neural networks can indeed exploit such hidden factorial structures.
We find that they do leverage these latent patterns to learn discrete distributions more efficiently, and derive scaling laws linking model sizes, hidden factorizations, and accuracy.
arXiv Detail & Related papers (2024-11-02T22:32:53Z)
- Knowledge-Aware Reasoning over Multimodal Semi-structured Tables [85.24395216111462]
This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data.
We introduce MMTabQA, a new dataset designed for this purpose.
Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs.
arXiv Detail & Related papers (2024-08-25T15:17:43Z)
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning [45.013230888670435]
We exploit large language models (LLMs) as decomposers for effective table-based reasoning.
We decompose large evidence (a full table) into sub-evidence (a small table) to reduce interference from irrelevant information; a sketch of this idea appears after the related-papers list.
We propose a "parsing-execution-filling" strategy to alleviate the dilemma of chain-of-thought reasoning.
arXiv Detail & Related papers (2023-01-31T17:51:45Z)
- Realistic Data Augmentation Framework for Enhancing Tabular Reasoning [15.339526664699845]
Existing approaches to constructing training data for Natural Language Inference tasks, such as semi-structured table reasoning, rely either on crowdsourcing or on fully automatic methods.
This paper develops a realistic semi-automated framework for data augmentation for tabular inference.
arXiv Detail & Related papers (2022-10-23T17:32:19Z)
- TABBIE: Pretrained Representations of Tabular Data [22.444607481407633]
We devise a simple pretraining objective that learns exclusively from tabular data.
Unlike competing approaches, our model (TABBIE) provides embeddings of all table substructures.
A qualitative analysis of our model's learned cell, column, and row representations shows that it understands complex table semantics and numerical trends.
arXiv Detail & Related papers (2021-05-06T11:15:16Z)
- An Interpretability Illusion for BERT [61.2687465308121]
We describe an "interpretability illusion" that arises when analyzing the BERT model.
We trace the source of this illusion to geometric properties of BERT's embedding space.
We provide a taxonomy of model-learned concepts and discuss methodological implications for interpretability research.
arXiv Detail & Related papers (2021-04-14T22:04:48Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
- TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [113.29476656550342]
We present TaBERT, a pretrained LM that jointly learns representations for NL sentences and tables.
TaBERT is trained on a large corpus of 26 million tables and their English contexts.
Implementation of the model will be available at http://fburl.com/TaBERT.
arXiv Detail & Related papers (2020-05-17T17:26:40Z)
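The evidence-decomposition idea from the "Large Language Models are Versatile Decomposers" entry above amounts to pruning a large table down to question-relevant rows before reasoning over it. The sketch below assumes a generic chat-completion callable `llm(prompt) -> str`; the prompt wording and helper names are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of LLM-based evidence decomposition: keep only the table rows
# an LLM judges relevant to a question. `llm` stands in for any
# chat-completion call; the prompt and names are assumptions, not the
# paper's actual method.
from typing import Callable, List

def decompose_table(
    llm: Callable[[str], str],
    header: List[str],
    rows: List[List[str]],
    question: str,
) -> List[List[str]]:
    # Serialize the table with a simple pipe-delimited layout.
    table_text = "\n".join(" | ".join(r) for r in [header] + rows)
    prompt = (
        "List the 0-based indices of the rows needed to answer the "
        "question, as comma-separated integers only.\n"
        f"Question: {question}\nTable:\n{table_text}"
    )
    reply = llm(prompt)
    keep = {int(tok) for tok in reply.split(",") if tok.strip().isdigit()}
    # Sub-evidence: the pruned table passed to the downstream reasoner.
    return [row for i, row in enumerate(rows) if i in keep]
```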
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.