Quda: Natural Language Queries for Visual Data Analytics
- URL: http://arxiv.org/abs/2005.03257v5
- Date: Thu, 3 Dec 2020 06:58:56 GMT
- Title: Quda: Natural Language Queries for Visual Data Analytics
- Authors: Siwei Fu, Kai Xiong, Xiaodong Ge, Siliang Tang, Wei Chen, Yingcai Wu
- Abstract summary: We present a new dataset, called Quda, that aims to help V-NLIs recognize analytic tasks from free-form natural language.
Our dataset contains 14,035 diverse user queries, each annotated with one or more analytic tasks.
This work is the first attempt to construct a large-scale corpus for recognizing analytic tasks.
- Score: 33.983060903399554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The identification of analytic tasks from free text is critical for
visualization-oriented natural language interfaces (V-NLIs) to suggest
effective visualizations. However, it is challenging due to the ambiguous and
complex nature of human language. To address this challenge, we present a
new dataset, called Quda, that aims to help V-NLIs recognize analytic tasks
from free-form natural language by training and evaluating cutting-edge
multi-label classification models. Our dataset contains 14,035 diverse user
queries, each annotated with one or more analytic tasks. We achieve
this goal by first gathering seed queries with data analysts and then employing
extensive crowdsourcing for paraphrase generation and validation. We demonstrate
the usefulness of Quda through three applications. This work is the first
attempt to construct a large-scale corpus for recognizing analytic tasks. With
the release of Quda, we hope it will boost the research and development of
V-NLIs in data analysis and visualization.
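The abstract frames task recognition as multi-label classification: each free-form query maps to one or more analytic tasks. The sketch below illustrates that setting with a naive keyword baseline; the task names, example queries, and keyword rules are illustrative assumptions, not drawn from the actual Quda corpus or the paper's models.

```python
# Illustrative sketch of Quda-style multi-label annotation: each query is
# paired with a SET of analytic task labels, not a single class.
# (Hypothetical task names and queries; not from the released corpus.)
ANNOTATED = [
    ("Which product had the highest revenue last quarter?", {"find_extremum"}),
    ("Compare revenue across regions and rank them", {"compare", "sort"}),
    ("How are customer ages distributed?", {"characterize_distribution"}),
]

# Naive keyword baseline; a real V-NLI would train a multi-label
# classifier on the annotated queries instead.
KEYWORDS = {
    "find_extremum": ["highest", "lowest", "maximum", "minimum"],
    "compare": ["compare", "versus", "vs"],
    "sort": ["rank", "order", "sort"],
    "characterize_distribution": ["distributed", "distribution", "spread"],
}

def predict_tasks(query: str) -> set:
    """Return every task whose keywords appear in the query."""
    q = query.lower()
    return {task for task, kws in KEYWORDS.items() if any(k in q for k in kws)}

# Exact-match evaluation over the tiny sample: the predicted label SET
# must equal the annotated set, mirroring a strict multi-label metric.
hits = sum(predict_tasks(q) == labels for q, labels in ANNOTATED)
print(f"{hits}/{len(ANNOTATED)} queries tagged correctly")
```

The point of the sketch is the data shape (query, label set) and the set-valued prediction; swapping the keyword rules for a trained multi-label classifier is what the paper's benchmark evaluates.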
Related papers
- Data Formulator 2: Iteratively Creating Rich Visualizations with AI [65.48447317310442]
We present Data Formulator 2, an LLM-powered visualization system to address these challenges.
With Data Formulator 2, users describe their visualization intent with blended UI and natural language inputs, and data transformations are delegated to AI.
To support iteration, Data Formulator 2 lets users navigate their iteration history and reuse previous designs towards new ones so that they don't need to start from scratch every time.
arXiv Detail & Related papers (2024-08-28T20:12:17Z) - VisEval: A Benchmark for Data Visualization in the Era of Large Language Models [12.077276008688065]
Recent advancements in pre-trained large language models (LLMs) are opening new avenues for generating visualizations from natural language.
In this paper, we propose a new NL2VIS benchmark called VisEval.
This dataset includes 2,524 representative queries covering 146 databases, paired with accurately labeled ground truths.
arXiv Detail & Related papers (2024-07-01T05:35:30Z) - Prompt4Vis: Prompting Large Language Models with Example Mining and Schema Filtering for Tabular Data Visualization [13.425454489560376]
We introduce Prompt4Vis, a framework for generating data visualization queries from natural language.
In-context learning is introduced into the text-to-vis for generating data visualization queries.
Prompt4Vis surpasses the state-of-the-art (SOTA) RGVisNet by approximately 35.9% and 71.3% on dev and test sets, respectively.
arXiv Detail & Related papers (2024-01-29T10:23:47Z) - Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries [67.0083902913112]
We develop the Text2Analysis benchmark, incorporating advanced analysis tasks.
We also develop five innovative and effective annotation methods.
We evaluate five state-of-the-art models using three different metrics.
arXiv Detail & Related papers (2023-12-21T08:50:41Z) - Automatic Data Visualization Generation from Chinese Natural Language Questions [23.777512332679194]
We propose a Chinese Text-to-Vis dataset in the paper and demonstrate our first attempt to tackle this problem.
Our model integrates multilingual BERT as the encoder to boost cross-lingual ability, and infuses n-gram information into our word representation learning.
arXiv Detail & Related papers (2023-09-14T12:16:21Z) - A deep Natural Language Inference predictor without language-specific training data [44.26507854087991]
We present an NLP technique to tackle natural language inference (NLI) between pairs of sentences in a target language of choice, without a language-specific training dataset.
We exploit a generic translation dataset, manually translated, along with two instances of the same pre-trained model.
The model has been evaluated over machine translated Stanford NLI test dataset, machine translated Multi-Genre NLI test dataset, and manually translated RTE3-ITA test dataset.
arXiv Detail & Related papers (2023-09-06T10:20:59Z) - LISA: Reasoning Segmentation via Large Language Model [68.24075852136761]
We propose a new segmentation task -- reasoning segmentation.
The task is designed to output a segmentation mask given a complex and implicit query text.
We present LISA: Large Language Instructed Assistant, which inherits the language generation capabilities of multimodal Large Language Models.
arXiv Detail & Related papers (2023-08-01T17:50:17Z) - XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages [105.54207724678767]
Data scarcity is a crucial issue for the development of highly multilingual NLP systems.
We propose XTREME-UP, a benchmark defined by its focus on the scarce-data scenario rather than zero-shot.
XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies.
arXiv Detail & Related papers (2023-05-19T18:00:03Z) - AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z) - Using Large Language Models to Generate Engaging Captions for Data Visualizations [51.98253121636079]
Large language models (LLMs) use sophisticated deep learning technology to produce human-like prose.
A key challenge lies in designing the most effective prompt for the LLM, a task called prompt engineering.
We report on first experiments using the popular LLM GPT-3 and deliver some promising results.
arXiv Detail & Related papers (2022-12-27T23:56:57Z) - Unravelling Interlanguage Facts via Explainable Machine Learning [10.71581852108984]
We focus on the internals of an NLI classifier trained by an explainable machine learning algorithm.
We use this perspective in order to tackle both NLI and a companion task, guessing whether a text has been written by a native or a non-native speaker.
We investigate which kind of linguistic traits are most effective for solving our two tasks, namely, are most indicative of a speaker's L1.
arXiv Detail & Related papers (2022-08-02T14:05:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.