IR Design for Application-Specific Natural Language: A Case Study on
Traffic Data
- URL: http://arxiv.org/abs/2307.06983v1
- Date: Thu, 13 Jul 2023 15:52:05 GMT
- Title: IR Design for Application-Specific Natural Language: A Case Study on
Traffic Data
- Authors: Wei Hu, Xuhong Wang, Ding Wang, Shengyue Yao, Zuqiu Mao, Li Li,
Fei-Yue Wang, Yilun Lin
- Abstract summary: We propose a design for an intermediate representation (IR) that caters to Application-Specific Natural Language (ASNL).
Our proposed IR design can achieve a speed improvement of over forty times compared to direct usage of standard XML format data.
- Score: 29.50290358564987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the realm of software applications in the transportation industry,
Domain-Specific Languages (DSLs) have enjoyed widespread adoption due to their
ease of use and various other benefits. With the ceaseless progress in computer
performance and the rapid development of large-scale models, the possibility of
programming with natural language in specific applications, referred to as
Application-Specific Natural Language (ASNL), has emerged. ASNL offers
greater flexibility and freedom, which in turn increases the computational
complexity of parsing and degrades processing performance.
To tackle this issue, our paper proposes a design for an intermediate
representation (IR) that caters to ASNL and uniformly converts
transportation data into a graph data format, improving data processing
performance. Experimental comparisons show that, for standard data query
operations, the proposed IR design achieves a speedup of more than forty
times over direct use of standard XML-format data.
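The abstract does not spell out the IR itself, but its core idea, normalizing transportation data into a graph so that queries avoid repeated XML traversal, can be illustrated with a minimal sketch. The XML schema, the attribute names, and the use of networkx below are assumptions made for illustration, not the paper's actual IR.

```python
# Minimal sketch: convert a (hypothetical) XML road-network snippet into a
# graph-based IR, then answer a query against the graph instead of the XML.
# The XML schema and attribute names are illustrative assumptions.
import xml.etree.ElementTree as ET
import networkx as nx

TRAFFIC_XML = """
<network>
  <node id="A"/><node id="B"/><node id="C"/>
  <link from="A" to="B" speed="60"/>
  <link from="B" to="C" speed="40"/>
</network>
"""

def xml_to_graph(xml_text: str) -> nx.DiGraph:
    """One-time normalization pass: XML -> directed graph IR."""
    root = ET.fromstring(xml_text)
    g = nx.DiGraph()
    for node in root.iter("node"):
        g.add_node(node.get("id"))
    for link in root.iter("link"):
        g.add_edge(link.get("from"), link.get("to"),
                   speed=float(link.get("speed")))
    return g

g = xml_to_graph(TRAFFIC_XML)
# A standard query (outgoing links of "B") becomes an adjacency lookup on
# the IR rather than a scan of the XML tree.
print(list(g.successors("B")))     # ['C']
print(g.edges["A", "B"]["speed"])  # 60.0
```

The reported forty-fold speedup plausibly comes from paying the XML parsing cost once and serving subsequent queries from indexed graph structures; the concrete IR format is specified in the paper itself.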
Related papers
- ChainStream: An LLM-based Framework for Unified Synthetic Sensing [20.589289717423597]
We propose to use natural language as the unified interface to process personal data and sense user context.
Our work is inspired by large language models (LLMs) and other generative models.
To evaluate the performance of natural language-based context sensing, we create a benchmark that contains 133 context sensing tasks.
arXiv Detail & Related papers (2024-12-13T08:25:26Z) - Building an Efficient Multilingual Non-Profit IR System for the Islamic Domain Leveraging Multiprocessing Design in Rust [0.0]
This work focuses on the development of a multilingual non-profit IR system for the Islamic domain.
A lightweight multilingual retrieval model was prepared by employing continued pre-training for domain adaptation and language reduction to decrease model size.
arXiv Detail & Related papers (2024-11-09T11:37:18Z) - AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language
Model Outputs [20.772266479533776]
AXOLOTL is a novel post-processing framework that operates agnostically across tasks and models.
It identifies biases, proposes resolutions, and guides the model to self-debias its outputs.
This approach minimizes computational costs and preserves model performance.
arXiv Detail & Related papers (2024-03-01T00:02:37Z) - Effort and Size Estimation in Software Projects with Large Language Model-based Intelligent Interfaces [0.4043859792291222]
We propose a new way to enhance specifications of natural language-based questions that allows for the estimation of development effort.
We also provide a comparison against traditional estimation methods.
arXiv Detail & Related papers (2024-02-11T11:03:08Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the reasoning skills needed for the intended downstream application.
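As a rough illustration of the idea behind LESS (not its actual implementation), the sketch below scores training examples by cosine similarity between low-rank-projected gradient features and a target-task gradient, then keeps the top 5%. All vectors here are random stand-ins; the real method derives them from model training runs.

```python
# Hedged sketch of gradient-similarity data selection in the spirit of LESS:
# project per-example gradients to low rank, rank by cosine similarity to a
# target-task gradient, keep the top 5%. Vectors are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_examples, grad_dim, rank = 1000, 4096, 64

proj = rng.normal(size=(grad_dim, rank)) / np.sqrt(rank)  # random low-rank projection
train_grads = rng.normal(size=(n_examples, grad_dim)) @ proj
target_grad = rng.normal(size=grad_dim) @ proj

def cosine(a, b):
    return (a @ b) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b) + 1e-8)

scores = cosine(train_grads, target_grad)
k = int(0.05 * n_examples)          # the paper reports strong results at 5%
selected = np.argsort(scores)[-k:]  # indices of the most influential examples
print(selected.shape)               # (50,)
```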
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - LLMs with User-defined Prompts as Generic Data Operators for Reliable
Data Processing [13.901862478287509]
We propose a new design pattern in which large language models (LLMs) work as generic data operators (LLM-GDO).
In the LLM-GDO design pattern, user-defined prompts (UDPs) are used to represent the data processing logic rather than implementations with a specific programming language.
Fine-tuning LLMs with domain-specific data can enhance performance on domain-specific tasks, making data processing knowledge-aware.
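The LLM-GDO pattern can be sketched as a map-style operator parameterized by a user-defined prompt. The llm_call stub below is a hypothetical placeholder for any chat-completion API; the paper does not prescribe a specific one.

```python
# Sketch of the LLM-GDO pattern: a generic data operator whose processing
# logic is a user-defined prompt (UDP) instead of hand-written code.
# llm_call is a hypothetical stub standing in for any chat-completion API.
from typing import Callable, Iterable

def llm_call(prompt: str) -> str:
    # Placeholder: a real deployment would call a hosted or fine-tuned LLM.
    return f"<LLM output for: {prompt[:40]}...>"

def llm_gdo(udp: str, model: Callable[[str], str] = llm_call):
    """Build a row-wise data operator from a user-defined prompt."""
    def operator(rows: Iterable[str]):
        for row in rows:
            yield model(f"{udp}\nInput record: {row}")
    return operator

# Usage: the UDP replaces a language-specific UDF in the data pipeline.
normalize_addresses = llm_gdo("Normalize this address to 'street, city, country'.")
for out in normalize_addresses(["221B Baker St London", "1600 Penn Ave DC"]):
    print(out)
```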
arXiv Detail & Related papers (2023-12-26T23:08:38Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Existing frameworks, however, do not supply the procedures and pipelines needed to actually deploy machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z) - Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference while retaining comparable performance.
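The non-parametric setup can be sketched as a kNN lookup over a (context-vector, next-token) datastore whose distribution is interpolated with the base LM. Keys, values, and the base distribution below are random stand-ins, and the paper's speedups come from shrinking and adaptively querying this datastore, which the sketch omits.

```python
# Sketch of a single kNN-LM prediction step: retrieve nearest
# (context, next-token) entries from a datastore and interpolate
# with the base LM distribution. All values are random stand-ins.
import numpy as np

rng = np.random.default_rng(1)
vocab, dim, store_size, k, lam = 100, 32, 5000, 8, 0.25

keys = rng.normal(size=(store_size, dim))          # stored context vectors
values = rng.integers(0, vocab, size=store_size)   # stored next tokens
query = rng.normal(size=dim)                       # current context vector
p_lm = rng.dirichlet(np.ones(vocab))               # base LM distribution

dists = np.linalg.norm(keys - query, axis=1)
nearest = np.argsort(dists)[:k]
weights = np.exp(-dists[nearest])
weights /= weights.sum()

p_knn = np.zeros(vocab)
np.add.at(p_knn, values[nearest], weights)  # scatter weights onto tokens

p_final = lam * p_knn + (1 - lam) * p_lm    # interpolated prediction
print(p_final.argmax())
```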
arXiv Detail & Related papers (2021-09-09T12:32:28Z) - SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z) - Synthetic Datasets for Neural Program Synthesis [66.20924952964117]
We propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications.
We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance.
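The entry is brief, but the notion of controlling a distribution over synthetic programs can be illustrated with a toy generator: a calculator-style DSL sampled with tunable production weights, so train and test distributions can be deliberately skewed apart. The grammar and weights below are invented for illustration and are not the paper's setup.

```python
# Toy illustration of biasing a synthetic program distribution: sample
# calculator-DSL expressions with tunable operator weights, enabling
# training on one distribution and testing cross-distribution
# generalization on another.
import random

OPS = ["+", "-", "*"]

def sample_expr(depth: int, op_weights) -> str:
    """Recursively sample an expression; weights bias operator choice."""
    if depth == 0 or random.random() < 0.3:
        return str(random.randint(0, 9))
    op = random.choices(OPS, weights=op_weights)[0]
    return (f"({sample_expr(depth - 1, op_weights)} {op} "
            f"{sample_expr(depth - 1, op_weights)})")

random.seed(0)
train_dist = [0.6, 0.3, 0.1]  # skewed toward '+'
test_dist  = [0.1, 0.3, 0.6]  # skewed toward '*'
print(sample_expr(3, train_dist))
print(sample_expr(3, test_dist))
```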
arXiv Detail & Related papers (2019-12-27T21:28:10Z)