IR Design for Application-Specific Natural Language: A Case Study on
Traffic Data
- URL: http://arxiv.org/abs/2307.06983v1
- Date: Thu, 13 Jul 2023 15:52:05 GMT
- Title: IR Design for Application-Specific Natural Language: A Case Study on
Traffic Data
- Authors: Wei Hu, Xuhong Wang, Ding Wang, Shengyue Yao, Zuqiu Mao, Li Li,
Fei-Yue Wang, Yilun Lin
- Abstract summary: We propose a design for an intermediate representation (IR) that caters to Application-Specific Natural Language (ASNL).
Our proposed IR design can achieve a speed improvement of over forty times compared to direct usage of standard XML format data.
- Score: 29.50290358564987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the realm of software applications in the transportation industry,
Domain-Specific Languages (DSLs) have enjoyed widespread adoption due to their
ease of use and various other benefits. With the ceaseless progress in computer
performance and the rapid development of large-scale models, the possibility of
programming with natural language in specific applications, referred to as
Application-Specific Natural Language (ASNL), has emerged. ASNL offers
greater flexibility and freedom, which in turn increases the computational
complexity of parsing and degrades processing performance.
To tackle this issue, our paper proposes a design for an intermediate
representation (IR) that caters to ASNL and uniformly converts
transportation data into a graph data format, improving data processing
performance. Experimental comparisons show that, for standard data query
operations, the proposed IR design achieves a speedup of more than forty
times over direct use of standard XML-format data.
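The abstract does not spell out the IR itself, but its core idea, normalizing transportation data into a graph so that queries avoid repeated XML traversal, can be illustrated with a minimal sketch. The XML schema, the attribute names, and the use of networkx below are assumptions made for illustration, not the paper's actual IR.

```python
# Minimal sketch: convert a (hypothetical) XML road-network snippet into a
# graph-based IR, then answer a query against the graph instead of the XML.
# The XML schema and attribute names are illustrative assumptions.
import xml.etree.ElementTree as ET
import networkx as nx

TRAFFIC_XML = """
<network>
  <node id="A"/><node id="B"/><node id="C"/>
  <link from="A" to="B" speed="60"/>
  <link from="B" to="C" speed="40"/>
</network>
"""

def xml_to_graph(xml_text: str) -> nx.DiGraph:
    """One-time normalization pass: XML -> directed graph IR."""
    root = ET.fromstring(xml_text)
    g = nx.DiGraph()
    for node in root.iter("node"):
        g.add_node(node.get("id"))
    for link in root.iter("link"):
        g.add_edge(link.get("from"), link.get("to"),
                   speed=float(link.get("speed")))
    return g

g = xml_to_graph(TRAFFIC_XML)
# A standard query (outgoing links of "B") becomes an adjacency lookup on
# the IR rather than a scan of the XML tree.
print(list(g.successors("B")))     # ['C']
print(g.edges["A", "B"]["speed"])  # 60.0
```

The reported forty-fold speedup plausibly comes from paying the XML parsing cost once and serving subsequent queries from indexed graph structures; the concrete IR format is specified in the paper itself.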
Related papers
- ChainStream: An LLM-based Framework for Unified Synthetic Sensing [20.589289717423597]
We propose to use natural language as the unified interface to process personal data and sense user context.
Our work is inspired by large language models (LLMs) and other generative models.
To evaluate the performance of natural language-based context sensing, we create a benchmark that contains 133 context sensing tasks.
arXiv Detail & Related papers (2024-12-13T08:25:26Z) - Building an Efficient Multilingual Non-Profit IR System for the Islamic Domain Leveraging Multiprocessing Design in Rust [0.0]
This work focuses on the development of a multilingual non-profit IR system for the Islamic domain.
A lightweight multilingual retrieval model was prepared by employing continued pre-training for domain adaptation and language reduction to decrease model size.
arXiv Detail & Related papers (2024-11-09T11:37:18Z) - AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language
Model Outputs [20.772266479533776]
AXOLOTL is a novel post-processing framework that operates agnostically across tasks and models.
It identifies biases, proposes resolutions, and guides the model to self-debias its outputs.
This approach minimizes computational costs and preserves model performance.
arXiv Detail & Related papers (2024-03-01T00:02:37Z) - Effort and Size Estimation in Software Projects with Large Language Model-based Intelligent Interfaces [0.4043859792291222]
We propose a new way to enhance specifications of natural language-based questions that allows for the estimation of development effort.
We also provide a comparison against traditional estimation methods.
arXiv Detail & Related papers (2024-02-11T11:03:08Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the reasoning skills needed for the intended downstream application.
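As a rough illustration of the idea behind LESS (not its actual implementation), the sketch below scores training examples by cosine similarity between low-rank-projected gradient features and a target-task gradient, then keeps the top 5%. All vectors here are random stand-ins; the real method derives them from model training runs.

```python
# Hedged sketch of gradient-similarity data selection in the spirit of LESS:
# project per-example gradients to low rank, rank by cosine similarity to a
# target-task gradient, keep the top 5%. Vectors are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_examples, grad_dim, rank = 1000, 4096, 64

proj = rng.normal(size=(grad_dim, rank)) / np.sqrt(rank)  # random low-rank projection
train_grads = rng.normal(size=(n_examples, grad_dim)) @ proj
target_grad = rng.normal(size=grad_dim) @ proj

def cosine(a, b):
    return (a @ b) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b) + 1e-8)

scores = cosine(train_grads, target_grad)
k = int(0.05 * n_examples)          # the paper reports strong results at 5%
selected = np.argsort(scores)[-k:]  # indices of the most influential examples
print(selected.shape)               # (50,)
```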
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - LLMs with User-defined Prompts as Generic Data Operators for Reliable
Data Processing [13.901862478287509]
We propose a new design pattern in which large language models (LLMs) work as generic data operators (LLM-GDO).
In the LLM-GDO design pattern, user-defined prompts (UDPs) are used to represent the data processing logic rather than implementations with a specific programming language.
Fine-tuning LLMs with domain-specific data can enhance performance on domain-specific tasks, making data processing knowledge-aware.
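The LLM-GDO pattern can be sketched as a map-style operator parameterized by a user-defined prompt. The llm_call stub below is a hypothetical placeholder for any chat-completion API; the paper does not prescribe a specific one.

```python
# Sketch of the LLM-GDO pattern: a generic data operator whose processing
# logic is a user-defined prompt (UDP) instead of hand-written code.
# llm_call is a hypothetical stub standing in for any chat-completion API.
from typing import Callable, Iterable

def llm_call(prompt: str) -> str:
    # Placeholder: a real deployment would call a hosted or fine-tuned LLM.
    return f"<LLM output for: {prompt[:40]}...>"

def llm_gdo(udp: str, model: Callable[[str], str] = llm_call):
    """Build a row-wise data operator from a user-defined prompt."""
    def operator(rows: Iterable[str]):
        for row in rows:
            yield model(f"{udp}\nInput record: {row}")
    return operator

# Usage: the UDP replaces a language-specific UDF in the data pipeline.
normalize_addresses = llm_gdo("Normalize this address to 'street, city, country'.")
for out in normalize_addresses(["221B Baker St London", "1600 Penn Ave DC"]):
    print(out)
```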
arXiv Detail & Related papers (2023-12-26T23:08:38Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Existing frameworks, however, do not supply the procedures and pipelines needed to actually deploy machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z) - Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference while retaining comparable performance.
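The non-parametric setup can be sketched as a kNN lookup over a (context-vector, next-token) datastore whose distribution is interpolated with the base LM. Keys, values, and the base distribution below are random stand-ins, and the paper's speedups come from shrinking and adaptively querying this datastore, which the sketch omits.

```python
# Sketch of a single kNN-LM prediction step: retrieve nearest
# (context, next-token) entries from a datastore and interpolate
# with the base LM distribution. All values are random stand-ins.
import numpy as np

rng = np.random.default_rng(1)
vocab, dim, store_size, k, lam = 100, 32, 5000, 8, 0.25

keys = rng.normal(size=(store_size, dim))          # stored context vectors
values = rng.integers(0, vocab, size=store_size)   # stored next tokens
query = rng.normal(size=dim)                       # current context vector
p_lm = rng.dirichlet(np.ones(vocab))               # base LM distribution

dists = np.linalg.norm(keys - query, axis=1)
nearest = np.argsort(dists)[:k]
weights = np.exp(-dists[nearest])
weights /= weights.sum()

p_knn = np.zeros(vocab)
np.add.at(p_knn, values[nearest], weights)  # scatter weights onto tokens

p_final = lam * p_knn + (1 - lam) * p_lm    # interpolated prediction
print(p_final.argmax())
```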
arXiv Detail & Related papers (2021-09-09T12:32:28Z) - SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z) - Synthetic Datasets for Neural Program Synthesis [66.20924952964117]
We propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications.
We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance.
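The entry is brief, but the notion of controlling a distribution over synthetic programs can be illustrated with a toy generator: a calculator-style DSL sampled with tunable production weights, so train and test distributions can be deliberately skewed apart. The grammar and weights below are invented for illustration and are not the paper's setup.

```python
# Toy illustration of biasing a synthetic program distribution: sample
# calculator-DSL expressions with tunable operator weights, enabling
# training on one distribution and testing cross-distribution
# generalization on another.
import random

OPS = ["+", "-", "*"]

def sample_expr(depth: int, op_weights) -> str:
    """Recursively sample an expression; weights bias operator choice."""
    if depth == 0 or random.random() < 0.3:
        return str(random.randint(0, 9))
    op = random.choices(OPS, weights=op_weights)[0]
    return (f"({sample_expr(depth - 1, op_weights)} {op} "
            f"{sample_expr(depth - 1, op_weights)})")

random.seed(0)
train_dist = [0.6, 0.3, 0.1]  # skewed toward '+'
test_dist  = [0.1, 0.3, 0.6]  # skewed toward '*'
print(sample_expr(3, train_dist))
print(sample_expr(3, test_dist))
```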
arXiv Detail & Related papers (2019-12-27T21:28:10Z)