TabGenie: A Toolkit for Table-to-Text Generation
- URL: http://arxiv.org/abs/2302.14169v1
- Date: Mon, 27 Feb 2023 22:05:46 GMT
- Title: TabGenie: A Toolkit for Table-to-Text Generation
- Authors: Zdeněk Kasner, Ekaterina Garanina, Ondřej Plátek, Ondřej Dušek
- Abstract summary: TabGenie is a toolkit which enables researchers to explore, preprocess, and analyze a variety of data-to-text generation datasets.
It is equipped with command line processing tools and Python bindings for unified dataset loading and processing.
- Score: 2.580765958706854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Heterogeneity of data-to-text generation datasets limits the research on
data-to-text generation systems. We present TabGenie - a toolkit which enables
researchers to explore, preprocess, and analyze a variety of data-to-text
generation datasets through the unified framework of table-to-text generation.
In TabGenie, all the inputs are represented as tables with associated metadata.
The tables can be explored through the web interface, which also provides an
interactive mode for debugging table-to-text generation, facilitates
side-by-side comparison of generated system outputs, and allows easy exports
for manual analysis. Furthermore, TabGenie is equipped with command line
processing tools and Python bindings for unified dataset loading and
processing. We release TabGenie as a PyPI package and provide its open-source
code and a live demo at https://github.com/kasnerz/tabgenie.
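As a rough illustration of the idea that "all inputs are represented as tables with associated metadata", the sketch below models a table with header cells and metadata and linearizes it into a single string for a sequence-to-sequence model. The class names (`Cell`, `Table`), fields, and the `[key]`/`<h>` linearization markup are hypothetical illustrations, not TabGenie's actual API; consult the package documentation for the real interface.

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    """One table cell; headers are flagged so linearization can mark them."""
    value: str
    is_header: bool = False

@dataclass
class Table:
    """A table plus free-form metadata (title, category, ...)."""
    cells: list[list[Cell]]
    metadata: dict = field(default_factory=dict)

    def linearize(self) -> str:
        """Flatten metadata and rows into one string for a seq2seq model."""
        parts = [f"[{key}] {value}" for key, value in self.metadata.items()]
        for row in self.cells:
            parts.append(" | ".join(
                f"<h> {c.value}" if c.is_header else c.value for c in row
            ))
        return " ".join(parts)

table = Table(
    cells=[
        [Cell("Country", is_header=True), Cell("Capital", is_header=True)],
        [Cell("Czechia"), Cell("Prague")],
    ],
    metadata={"title": "European capitals"},
)
print(table.linearize())
# → [title] European capitals <h> Country | <h> Capital Czechia | Prague
```

A unified representation like this is what allows one web interface and one set of processing tools to serve many heterogeneous datasets: each dataset only needs a loader that maps its native format into the shared table structure.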
Related papers
- PixT3: Pixel-based Table-To-Text Generation [66.96636025277536]
We present PixT3, a multimodal table-to-text model that overcomes the challenges of linearization and input size limitations.
Experiments on the ToTTo and Logic2Text benchmarks show that PixT3 is competitive with, and in some settings superior to, generators that operate solely on text.
arXiv Detail & Related papers (2023-11-16T11:32:47Z)
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- Generate, Transform, Answer: Question Specific Tool Synthesis for Tabular Data [6.3455238301221675]
Tabular question answering (TQA) presents a challenging setting for neural systems.
TQA systems process tables directly, resulting in information loss as table size increases.
We propose ToolWriter to generate query specific programs and detect when to apply them to transform tables.
arXiv Detail & Related papers (2023-03-17T17:26:56Z)
- KnowGL: Knowledge Generation and Linking from Text [13.407149206621828]
We propose KnowGL, a tool that allows converting text into structured relational data represented as a set of ABox assertions.
We address this problem as a sequence generation task by leveraging pre-trained sequence-to-sequence language models, e.g. BART.
To showcase the capabilities of our tool, we build a web application consisting of a set of UI widgets that help users to navigate through the semantic data extracted from a given input text.
arXiv Detail & Related papers (2022-10-25T12:12:36Z)
- Rows from Many Sources: Enriching row completions from Wikidata with a pre-trained Language Model [9.084045516880444]
We present state-of-the-art results for subject suggestion and gap filling, measured on a standard benchmark (WikiTables).
We interpret the table using the knowledge base to suggest new rows and generate metadata like headers through property linking.
We synthesize additional rows using free text generation via GPT-3, and crucially, we exploit the metadata we interpret to produce better prompts for text generation.
arXiv Detail & Related papers (2022-04-14T15:11:52Z)
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-05-03T02:54:26Z)
- Towards Faithful Neural Table-to-Text Generation with Content-Matching Constraints [63.84063384518667]
We propose a novel Transformer-based generation framework to achieve faithful table-to-text generation.
Core techniques in our method to enforce faithfulness include a new table-text optimal-transport matching loss.
To evaluate faithfulness, we propose a new automatic metric specialized to the table-to-text generation problem.
arXiv Detail & Related papers (2020-04-29T17:53:45Z)
- ToTTo: A Controlled Table-To-Text Generation Dataset [61.83159452483026]
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples.
We introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia.
While usually fluent, existing methods often hallucinate phrases that are not supported by the table.
arXiv Detail & Related papers (2020-04-29T17:53:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.