Automatic Data Visualization Generation from Chinese Natural Language Questions
- URL: http://arxiv.org/abs/2309.07650v1
- Date: Thu, 14 Sep 2023 12:16:21 GMT
- Title: Automatic Data Visualization Generation from Chinese Natural Language Questions
- Authors: Yan Ge and Victor Junqiu Wei and Yuanfeng Song and Jason Chen Zhang and Raymond Chi-Wing Wong
- Abstract summary: We propose a Chinese Text-to-Vis dataset in the paper and demonstrate our first attempt to tackle this problem.
Our model integrates multilingual BERT as the encoder, boosts the cross-lingual ability, and infuses the $n$-gram information into our word representation learning.
- Score: 23.777512332679194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data visualization has emerged as an effective tool for getting insights from
massive datasets. Because the programming languages used for data visualization
are hard to work with, automatic data visualization generation from natural
language (Text-to-Vis) is becoming increasingly popular. Despite the plethora
of research on English Text-to-Vis, no studies have yet been conducted
on data visualization generation from questions in Chinese. Motivated by this,
we propose a Chinese Text-to-Vis dataset in the paper and demonstrate our first
attempt to tackle this problem. Our model integrates multilingual BERT as the
encoder, boosts the cross-lingual ability, and infuses the $n$-gram information
into our word representation learning. Our experimental results show that our
dataset is challenging and deserves further research.
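As a rough illustration of what "n-gram information" can mean for Chinese input, where words are not separated by spaces, the sketch below extracts contiguous character n-grams from a question. It is not the authors' implementation: the abstract does not say how the n-grams are built or how they are combined with the multilingual BERT encoder, and the function name `char_ngrams` and the sample question are hypothetical.

```python
def char_ngrams(text, n):
    """Return all contiguous character n-grams of `text`, ignoring whitespace.

    Illustrative only: the paper's actual n-gram construction is not
    described in the abstract.
    """
    chars = [c for c in text if not c.isspace()]
    # A text shorter than n characters yields an empty list.
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


# Hypothetical Chinese question ("How many employees are in each department?").
question = "各部门的员工人数是多少"

bigrams = char_ngrams(question, 2)
trigrams = char_ngrams(question, 3)
```

In a setup like the one the abstract describes, features of this kind would presumably be embedded and merged with the encoder's token representations; how that merge is done is left open here.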
Related papers
- Open the Data! Chuvash Datasets [50.59120569845975]
We introduce four comprehensive datasets for the Chuvash language.
These datasets include a monolingual dataset, a parallel dataset with Russian, a parallel dataset with English, and an audio dataset.
arXiv Detail & Related papers (2024-05-31T07:51:19Z)
- Multilingual Diversity Improves Vision-Language Representations [66.41030381363244]
Pre-training on this dataset outperforms using English-only or English-dominated datasets on ImageNet.
On a geographically diverse task like GeoDE, we also observe improvements across all regions, with the biggest gain coming from Africa.
arXiv Detail & Related papers (2024-05-27T08:08:51Z)
- 3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset [90.95948101052073]
We introduce 3AM, an ambiguity-aware MMT dataset comprising 26,000 parallel sentence pairs in English and Chinese.
Our dataset is specifically designed to include more ambiguity and a greater variety of both captions and images than other MMT datasets.
Experimental results show that MMT models trained on our dataset exhibit a greater ability to exploit visual information than those trained on other MMT datasets.
arXiv Detail & Related papers (2024-04-29T04:01:30Z)
- TEXTRON: Weakly Supervised Multilingual Text Detection through Data Programming [21.88026116276415]
Text detection is a challenging problem in the field of computer vision (CV).
There is a scarcity of word-level labeled data for text detection, especially for multilingual settings and Indian scripts.
We propose TEXTRON, a Data Programming-based approach, where users can plug various text detection methods into a weak supervision-based learning framework.
arXiv Detail & Related papers (2024-02-15T09:18:18Z)
- Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment [11.148099070407431]
GroundedBERT is a grounded language learning method that enhances the BERT representation with visually grounded information.
Our proposed method significantly outperforms the baseline language models on various language tasks of the GLUE and SQuAD datasets.
arXiv Detail & Related papers (2023-12-04T03:16:48Z)
- Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey [30.836162812277085]
The rise of large language models (LLMs) has further advanced this field, opening new avenues for natural language processing techniques.
We introduce the fundamental concepts and techniques underlying these interfaces with a particular emphasis on semantic parsing.
This includes a deep dive into the influence of LLMs, highlighting their strengths, limitations, and potential for future improvements.
arXiv Detail & Related papers (2023-10-27T05:01:20Z)
- Using Large Language Models to Generate Engaging Captions for Data Visualizations [51.98253121636079]
Large language models (LLMs) use sophisticated deep learning technology to produce human-like prose.
A key challenge lies in designing the most effective prompt for the LLM, a task called prompt engineering.
We report on first experiments using the popular LLM GPT-3 and deliver some promising results.
arXiv Detail & Related papers (2022-12-27T23:56:57Z)
- From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network [70.47504933083218]
We propose a Visual Language Modeling Network (VisionLAN), which views the visual and linguistic information as a union.
VisionLAN significantly improves the speed by 39% and adaptively considers the linguistic information to enhance the visual features for accurate recognition.
arXiv Detail & Related papers (2021-08-22T07:56:24Z)
- VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer [76.3906723777229]
We present VidLanKD, a video-language knowledge distillation method for improving language understanding.
We train a multi-modal teacher model on a video-text dataset, and then transfer its knowledge to a student language model with a text dataset.
In our experiments, VidLanKD achieves consistent improvements over text-only language models and vokenization models.
arXiv Detail & Related papers (2021-07-06T15:41:32Z)
- Quda: Natural Language Queries for Visual Data Analytics [33.983060903399554]
We present a new dataset, called Quda, that aims to help V-NLIs recognize analytic tasks from free-form natural language.
Our dataset contains $14,035$ diverse user queries, and each is annotated with one or multiple analytic tasks.
This work is the first attempt to construct a large-scale corpus for recognizing analytic tasks.
arXiv Detail & Related papers (2020-05-07T05:35:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.