GPTZoo: A Large-scale Dataset of GPTs for the Research Community
- URL: http://arxiv.org/abs/2405.15630v1
- Date: Fri, 24 May 2024 15:17:03 GMT
- Title: GPTZoo: A Large-scale Dataset of GPTs for the Research Community
- Authors: Xinyi Hou, Yanjie Zhao, Shenao Wang, Haoyu Wang,
- Abstract summary: GPTZoo is a large-scale dataset comprising 730,420 GPT instances.
Each instance includes rich metadata with 21 attributes describing its characteristics, as well as instructions, knowledge files, and third-party services utilized during its development.
- Score: 5.1875389249043415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancements in Large Language Models (LLMs) have revolutionized natural language processing, with GPTs, customized versions of ChatGPT available on the GPT Store, emerging as a prominent technology for specific domains and tasks. To support academic research on GPTs, we introduce GPTZoo, a large-scale dataset comprising 730,420 GPT instances. Each instance includes rich metadata with 21 attributes describing its characteristics, as well as instructions, knowledge files, and third-party services utilized during its development. GPTZoo aims to provide researchers with a comprehensive and readily available resource to study the real-world applications, performance, and potential of GPTs. To facilitate efficient retrieval and analysis of GPTs, we also developed an automated command-line interface (CLI) that supports keyword-based searching of the dataset. To promote open research and innovation, the GPTZoo dataset will undergo continuous updates, and we are granting researchers public access to GPTZoo and its associated tools.
Related papers
- FAIR GPT: A virtual consultant for research data management in ChatGPT [0.0]
FAIR GPT is a first virtual consultant in ChatGPT designed to help researchers and organizations make their data and metadata compliant with the FAIR principles.
It provides guidance on metadata improvement, dataset organization, and repository selection.
This paper describes its features, applications, and limitations.
arXiv Detail & Related papers (2024-09-20T12:28:48Z) - GPT-generated Text Detection: Benchmark Dataset and Tensor-based
Detection Method [4.802604527842989]
We present GPT Reddit dataset (GRiD), a novel Generative Pretrained Transformer (GPT)-generated text detection dataset.
The dataset consists of context-prompt pairs based on Reddit with human-generated and ChatGPT-generated responses.
To showcase the dataset's utility, we benchmark several detection methods on it, demonstrating their efficacy in distinguishing between human and ChatGPT-generated responses.
arXiv Detail & Related papers (2024-03-12T05:15:21Z) - MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation [73.81268591484198]
Embodied agents equipped with GPT have exhibited extraordinary decision-making and generalization abilities across various tasks.
We present a novel map-guided GPT-based agent, dubbed MapGPT, which introduces an online linguistic-formed map to encourage global exploration.
Benefiting from this design, we propose an adaptive planning mechanism to assist the agent in performing multi-step path planning based on a map.
arXiv Detail & Related papers (2024-01-14T15:34:48Z) - GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? [82.40761196684524]
This paper centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks.
We conduct extensive experiments to evaluate GPT-4's performance across images, videos, and point clouds.
Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition.
arXiv Detail & Related papers (2023-11-27T11:29:10Z) - Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect
ChatGPT-Generated Text [48.36706154871577]
We introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts)
It diverges from extant corpora by comprising pairs of human-written and ChatGPT-polished abstracts instead of purely ChatGPT-generated texts.
We also propose the "Polish Ratio" method, an innovative measure of the degree of modification made by ChatGPT compared to the original human-written text.
arXiv Detail & Related papers (2023-07-21T06:38:37Z) - SentimentGPT: Exploiting GPT for Advanced Sentiment Analysis and its
Departure from Current Machine Learning [5.177947445379688]
This study presents a thorough examination of various Generative Pretrained Transformer (GPT) methodologies in sentiment analysis.
Three primary strategies are employed: 1) prompt engineering using the advanced GPT-3.5 Turbo, 2) fine-tuning GPT models, and 3) an inventive approach to embedding classification.
The research yields detailed comparative insights among these strategies and individual GPT models, revealing their unique strengths and potential limitations.
arXiv Detail & Related papers (2023-07-16T05:33:35Z) - On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing [10.534162347659514]
We develop a deep neural framework named CheckGPT to better capture the subtle and deep semantic and linguistic patterns in ChatGPT written literature.
To evaluate the detectability of ChatGPT content, we conduct extensive experiments on the transferability, prompt engineering, and robustness of CheckGPT.
arXiv Detail & Related papers (2023-06-07T12:33:24Z) - GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training
Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
We include a Jupyter Notebook-based walk through the core interoperability features, available on GitHub.
We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z) - Geotechnical Parrot Tales (GPT): Harnessing Large Language Models in
geotechnical engineering [2.132096006921048]
GPT models can generate plausible-sounding but false outputs, leading to hallucinations.
By integrating GPT into geotechnical engineering, professionals can streamline their work and develop sustainable and resilient infrastructure systems.
arXiv Detail & Related papers (2023-04-04T21:47:41Z) - To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z) - Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.