ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval
- URL: http://arxiv.org/abs/2308.02618v2
- Date: Fri, 3 Nov 2023 17:09:46 GMT
- Title: ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval
- Authors: Saipraneeth Devunuri, Shirin Qiam, Lewis Lehe
- Abstract summary: This research investigates whether widely adopted LLMs (ChatGPT) can understand GTFS and retrieve information from it using natural-language instructions, without the relevant information being provided explicitly.
ChatGPT demonstrates a reasonable understanding of GTFS by answering 59.7% (GPT-3.5-Turbo) and 73.3% (GPT-4) of our multiple-choice questions (MCQ) correctly.
We found that program synthesis techniques outperformed zero-shot approaches, achieving up to 93% (90%) accuracy for simple queries and 61% (41%) for complex ones using GPT-4 (GPT-3.5-Turbo).
- Score: 2.3951780950929678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The General Transit Feed Specification (GTFS) standard for publishing transit data is ubiquitous. Because GTFS is tabular data with information spread across different files, retrieving information from it requires specialized tools or packages. Concurrently, the use of Large Language Models (LLMs) for text and information retrieval is growing. This research asks whether widely adopted LLMs (ChatGPT) can understand GTFS and retrieve information from it using natural-language instructions, without the relevant information being provided explicitly. In this research, we benchmark OpenAI's GPT-3.5-Turbo and GPT-4 LLMs, which are the backbone of ChatGPT. ChatGPT demonstrates a reasonable understanding of GTFS, answering 59.7% (GPT-3.5-Turbo) and 73.3% (GPT-4) of our multiple-choice questions (MCQs) correctly. Furthermore, we evaluated the LLMs on information-extraction tasks using a filtered GTFS feed containing four routes. We found that program-synthesis techniques outperformed zero-shot approaches, achieving up to 93% (90%) accuracy for simple queries and 61% (41%) for complex ones with GPT-4 (GPT-3.5-Turbo).
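To make the two settings concrete: in a zero-shot setup the feed's tables are pasted directly into the prompt, whereas in the program-synthesis setup the LLM writes code that is executed against the feed's files. Below is a minimal sketch of the latter, assuming a pandas workflow; the feed path and the two queries are illustrative and not taken from the paper's benchmark, though the file and column names come from the GTFS specification.

```python
import pandas as pd

# GTFS is a set of related CSV tables; these file names are fixed by the spec.
routes = pd.read_csv("gtfs/routes.txt")          # route_id, route_short_name, ...
trips = pd.read_csv("gtfs/trips.txt")            # trip_id, route_id, service_id, ...
stop_times = pd.read_csv("gtfs/stop_times.txt")  # trip_id, departure_time, ...

# A "simple" query needs one join: how many trips does each route run?
trips_per_route = (
    trips.merge(routes, on="route_id")
         .groupby("route_short_name")["trip_id"]
         .nunique()
)
print(trips_per_route)

# A more complex query chains joins across three tables: the earliest
# scheduled departure on each route. departure_time is an HH:MM:SS string;
# for zero-padded feeds the lexicographic min is also the temporal min.
first_departure = (
    stop_times.merge(trips, on="trip_id")
              .merge(routes, on="route_id")
              .groupby("route_short_name")["departure_time"]
              .min()
)
print(first_departure)
```

Queries like the second one, which must chain several tables and operations, are where the paper reports the larger accuracy drop.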
Related papers
- An Empirical Study on Information Extraction using Large Language Models [36.090082785047855]
Human-like large language models (LLMs) have proven to be very helpful for many natural language processing (NLP) related tasks.
We propose and analyze the effects of a series of simple prompt-based methods on GPT-4's information extraction ability.
arXiv Detail & Related papers (2024-08-31T07:10:16Z)
- GPTZoo: A Large-scale Dataset of GPTs for the Research Community [5.1875389249043415]
GPTZoo is a large-scale dataset comprising 730,420 GPT instances.
Each instance includes rich metadata with 21 attributes describing its characteristics, as well as instructions, knowledge files, and third-party services utilized during its development.
arXiv Detail & Related papers (2024-05-24T15:17:03Z)
- Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation [128.01050030936028]
We propose an information refinement training method named InFO-RAG.
InFO-RAG is low-cost and general across various tasks.
It improves the performance of LLaMA2 by an average of 9.39% relative points.
arXiv Detail & Related papers (2024-02-28T08:24:38Z)
- Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies [47.129504708849446]
Large Language Models (LLMs) have revolutionized the field of Natural Language Processing.
However, LLMs lack systematic generalization, i.e., the ability to extrapolate learned statistical regularities outside the training distribution.
In this work, we offer a systematic benchmarking of GPT-4, one of the most advanced LLMs available.
arXiv Detail & Related papers (2024-02-27T10:44:52Z)
- Hint-before-Solving Prompting: Guiding LLMs to Effectively Utilize Encoded Knowledge [85.17343729885003]
We introduce Hint-before-Solving Prompting (HSP), which guides the model to generate hints for solving the problem.
HSP can effectively improve the accuracy of reasoning tasks.
We build the HSPMATH dataset based on HSP and fine-tune Llemma-7B on it, reaching 64.3% accuracy.
arXiv Detail & Related papers (2024-02-22T05:58:03Z)
- TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [73.29220562541204]
We consider harnessing the power of large language models (LLMs) to solve this task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z)
- GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond [29.778018058541676]
GPT-Fathom is an open-source and reproducible evaluation suite for large language models (LLMs) built on top of OpenAI Evals.
We evaluate 10+ leading LLMs as well as OpenAI's legacy models on 20+ curated benchmarks across 7 capability categories, all under aligned settings.
arXiv Detail & Related papers (2023-09-28T16:43:35Z)
- Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? [49.688233418425995]
Struc-Bench is a comprehensive benchmark featuring prominent Large Language Models (LLMs).
We propose two innovative metrics, P-Score (Prompting Score) and H-Score (Heuristical Score).
Our experiments show that applying our structure-aware fine-tuning to LLaMA-7B leads to substantial performance gains.
arXiv Detail & Related papers (2023-09-16T11:31:58Z)
- Open-Source LLMs for Text Annotation: A Practical Guide for Model Setting and Fine-Tuning [5.822010906632045]
This paper studies the performance of open-source Large Language Models (LLMs) in text classification tasks typical for political science research.
By examining tasks like stance, topic, and relevance classification, we aim to guide scholars in making informed decisions about their use of LLMs for text analysis.
arXiv Detail & Related papers (2023-07-05T10:15:07Z)
- Explaining Patterns in Data with Language Models via Interpretable Autoprompting [143.4162028260874]
We introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explaining the data.
iPrompt can yield meaningful insights by accurately finding ground-truth dataset descriptions.
Experiments with an fMRI dataset show the potential for iPrompt to aid in scientific discovery.
arXiv Detail & Related papers (2022-10-04T18:32:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.