FAIR GPT: A virtual consultant for research data management in ChatGPT
- URL: http://arxiv.org/abs/2410.07108v1
- Date: Fri, 20 Sep 2024 12:28:48 GMT
- Title: FAIR GPT: A virtual consultant for research data management in ChatGPT
- Authors: Renat Shigapov, Irene Schumm
- Abstract summary: FAIR GPT is the first virtual consultant in ChatGPT designed to help researchers and organizations make their data and metadata compliant with the FAIR principles.
It provides guidance on metadata improvement, dataset organization, and repository selection.
This paper describes its features, applications, and limitations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: FAIR GPT is the first virtual consultant in ChatGPT designed to help researchers and organizations make their data and metadata compliant with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. It provides guidance on metadata improvement, dataset organization, and repository selection. To ensure accuracy, FAIR GPT uses external APIs to assess dataset FAIRness, retrieve controlled vocabularies, and recommend repositories, minimizing hallucination and improving precision. It also assists in creating documentation (data and software management plans, README files, and codebooks), and selecting proper licenses. This paper describes its features, applications, and limitations.
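The abstract mentions assessing dataset FAIRness via external APIs. As a minimal, rule-based sketch of what such a metadata check could look like, the following uses invented field names and a naive completeness score; it is illustrative only and does not reflect FAIR GPT's actual rules or APIs.

```python
# Illustrative FAIR-style metadata check. The field names, hints, and
# scoring below are assumptions for this sketch, not FAIR GPT's method.
REQUIRED_FIELDS = {
    "identifier": "Findable: add a persistent identifier (e.g. a DOI)",
    "title": "Findable: add a descriptive title",
    "license": "Reusable: state a clear usage license",
    "format": "Interoperable: use an open, documented file format",
    "access_url": "Accessible: provide a resolvable access URL",
}

def check_metadata(metadata: dict) -> tuple[float, list[str]]:
    """Return a completeness score in [0, 1] plus advice for missing fields."""
    advice = [hint for field, hint in REQUIRED_FIELDS.items()
              if not metadata.get(field)]
    score = 1 - len(advice) / len(REQUIRED_FIELDS)
    return score, advice

score, advice = check_metadata({"title": "Survey data 2024",
                                "license": "CC-BY-4.0"})
print(f"completeness: {score:.1f}")  # 2 of 5 fields present -> 0.4
for hint in advice:
    print("missing:", hint)
```

A real assessment would resolve identifiers and query repository APIs rather than inspect a local dictionary, but the shape of the feedback (score plus actionable hints) is the same.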
Related papers
- Extending and Applying Automated HERMES Software Publication Workflows [0.6157382820537718]
HERMES is a tool that automates the publication of software with rich metadata.
We show how to use HERMES as an end user, both via the informal command line interface, and as a step in a continuous integration pipeline.
arXiv Detail & Related papers (2024-10-23T07:11:48Z)
- AutoFAIR: Automatic Data FAIRification via Machine Reading [28.683653852643015]
We propose AutoFAIR, an architecture designed to enhance data FAIRness automatically.
We align each data and metadata operation with specific FAIR indicators to guide machine-executable actions.
We observe significant improvements in findability, accessibility, interoperability, and reusability of data.
arXiv Detail & Related papers (2024-08-07T17:36:58Z)
- GPTZoo: A Large-scale Dataset of GPTs for the Research Community [5.1875389249043415]
GPTZoo is a large-scale dataset comprising 730,420 GPT instances.
Each instance includes rich metadata with 21 attributes describing its characteristics, as well as instructions, knowledge files, and third-party services utilized during its development.
arXiv Detail & Related papers (2024-05-24T15:17:03Z)
- Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy [52.426623750562335]
We introduce ToTER (Topical taxonomy Enhanced Retrieval) framework.
ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts.
As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
arXiv Detail & Related papers (2024-03-07T02:34:54Z)
- APT-Pipe: A Prompt-Tuning Tool for Social Data Annotation using ChatGPT [28.976911675881826]
We propose APT-Pipe, an automated prompt-tuning pipeline.
We test it across twelve distinct text classification datasets.
We find that prompts tuned by APT-Pipe help ChatGPT achieve higher weighted F1-score on nine out of twelve experimented datasets.
arXiv Detail & Related papers (2024-01-24T10:09:11Z)
- Leveraging Large Language Models to Improve REST API Testing [51.284096009803406]
RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification.
Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation.
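RESTGPT's core idea is turning natural-language parameter descriptions into machine-interpretable rules. A minimal sketch of that kind of rule extraction is shown below; the regex patterns and the example description are invented for illustration and are not RESTGPT's actual technique.

```python
import re

# Illustrative spec-driven rule extraction: pull machine-checkable
# constraints out of a natural-language parameter description.
# The patterns below are assumptions for this sketch, not RESTGPT's rules.
def extract_rules(description: str) -> dict:
    rules = {}
    m = re.search(r"between (\d+) and (\d+)", description)
    if m:  # range constraint, e.g. "between 1 and 100"
        rules["minimum"], rules["maximum"] = int(m.group(1)), int(m.group(2))
    m = re.search(r"defaults to (\d+)", description)
    if m:  # default value, e.g. "defaults to 20"
        rules["default"] = int(m.group(1))
    return rules

rules = extract_rules("Number of results per page, between 1 and 100; "
                      "defaults to 20.")
print(rules)  # {'minimum': 1, 'maximum': 100, 'default': 20}
```

Once rules are in this structured form, a test generator can derive valid and boundary parameter values (e.g. 1, 100, 101) mechanically.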
arXiv Detail & Related papers (2023-12-01T19:53:23Z)
- Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? [49.688233418425995]
Struc-Bench is a comprehensive benchmark for evaluating prominent Large Language Models (LLMs) on structured data generation.
We propose two innovative metrics, P-Score (Prompting Score) and H-Score (Heuristical Score).
Our experiments show that applying our structure-aware fine-tuning to LLaMA-7B leads to substantial performance gains.
arXiv Detail & Related papers (2023-09-16T11:31:58Z)
- ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval [2.3951780950929678]
This research investigates whether current widely adopted LLMs (ChatGPT) can understand GTFS and retrieve information from it using natural language instructions, without the relevant information being explicitly provided.
ChatGPT demonstrates a reasonable understanding of GTFS by answering 59.7% (GPT-3.5-Turbo) and 73.3% (GPT-4) of our multiple-choice questions (MCQ) correctly.
We found that program synthesis techniques outperformed zero-shot approaches, achieving up to 93% accuracy for simple queries and 61% for complex ones with GPT-4 (90% and 41% with GPT-3.5-Turbo).
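The program-synthesis approach has the LLM generate small programs that run against the actual feed, rather than answering from its weights. A minimal sketch of what such a synthesized query might look like follows; the sample feed and helper function are invented for illustration, and only the stops.txt column names come from the GTFS specification.

```python
import csv
import io

# A tiny in-memory stand-in for a GTFS stops.txt file. Real feeds are
# plain CSV files; stop_id, stop_name, stop_lat, stop_lon are standard
# GTFS columns, but the rows here are made up for this sketch.
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Main St Station,40.712,-74.006
S2,Airport Terminal,40.641,-73.778
S3,Main St North,40.720,-74.003
"""

def find_stops(stops_csv: str, name_fragment: str) -> list[str]:
    """The kind of small program an LLM might synthesize to answer
    'which stops are on Main St?' by querying the feed directly."""
    reader = csv.DictReader(io.StringIO(stops_csv))
    return [row["stop_id"] for row in reader
            if name_fragment.lower() in row["stop_name"].lower()]

print(find_stops(STOPS_TXT, "Main St"))  # ['S1', 'S3']
```

Because the generated code reads the feed itself, its answers stay grounded in the data instead of depending on what the model memorized about any particular transit system.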
arXiv Detail & Related papers (2023-08-04T14:50:37Z)
- Is GPT-4 a Good Data Analyst? [67.35956981748699]
We consider GPT-4 as a data analyst to perform end-to-end data analysis with databases from a wide range of domains.
We design several task-specific evaluation metrics to systematically compare the performance between several professional human data analysts and GPT-4.
Experimental results show that GPT-4 can achieve comparable performance to humans.
arXiv Detail & Related papers (2023-05-24T11:26:59Z)
- FETA: Towards Specializing Foundation Models for Expert Task Applications [49.57393504125937]
Foundation Models (FMs) have demonstrated unprecedented capabilities including zero-shot learning, high-fidelity data synthesis, and out-of-domain generalization.
We show in this paper that FMs still have poor out-of-the-box performance on expert tasks.
We propose a first of its kind FETA benchmark built around the task of teaching FMs to understand technical documentation.
arXiv Detail & Related papers (2022-09-08T08:47:57Z)
- Nine Best Practices for Research Software Registries and Repositories: A Concise Guide [63.52960372153386]
We present a set of nine best practices that can help managers define the scope, practices, and rules that govern individual registries and repositories.
These best practices were distilled from the experiences of the creators of existing resources, convened by a Task Force of the FORCE11 Software Implementation Working Group during the years 2011 and 2012.
arXiv Detail & Related papers (2020-12-24T05:37:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.