APT-Pipe: A Prompt-Tuning Tool for Social Data Annotation using ChatGPT
- URL: http://arxiv.org/abs/2402.01697v4
- Date: Tue, 20 Feb 2024 07:54:12 GMT
- Title: APT-Pipe: A Prompt-Tuning Tool for Social Data Annotation using ChatGPT
- Authors: Yiming Zhu, Zhizhuo Yin, Gareth Tyson, Ehsan-Ul Haq, Lik-Hang Lee, Pan Hui
- Abstract summary: We propose APT-Pipe, an automated prompt-tuning pipeline.
We test it across twelve distinct text classification datasets.
We find that prompts tuned by APT-Pipe help ChatGPT achieve a higher weighted F1-score on nine of the twelve datasets tested.
- Score: 28.976911675881826
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent research has highlighted the potential of LLM applications, like
ChatGPT, for performing label annotation on social computing text. However, it
is already well known that performance hinges on the quality of the input
prompts. To address this, there has been a flurry of research into prompt
tuning -- techniques and guidelines that attempt to improve the quality of
prompts. Yet these largely rely on manual effort and prior knowledge of the
dataset being annotated. To address this limitation, we propose APT-Pipe, an
automated prompt-tuning pipeline. APT-Pipe aims to automatically tune prompts
to enhance ChatGPT's text classification performance on any given dataset. We
implement APT-Pipe and test it across twelve distinct text classification
datasets. We find that prompts tuned by APT-Pipe help ChatGPT achieve a
higher weighted F1-score on nine of the twelve datasets tested, with an
average improvement of 7.01%. We further highlight APT-Pipe's flexibility as
a framework by showing how it can be extended to support additional tuning
mechanisms.
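As a rough illustration of the tuning loop described above, here is a minimal sketch: enumerate candidate prompts, annotate a labeled validation split with each, and keep the prompt with the best weighted F1-score. The `annotate` callback (e.g., a ChatGPT call) and the candidate-generation step are assumptions for illustration, not the authors' implementation.

```python
from typing import Callable, Iterable

from sklearn.metrics import f1_score

def tune_prompt(
    candidates: Iterable[str],            # candidate prompt templates to score
    texts: list[str],                     # labeled validation texts
    gold: list[str],                      # gold labels for those texts
    annotate: Callable[[str, str], str],  # (prompt, text) -> predicted label, e.g. a ChatGPT call
) -> tuple[str, float]:
    """Return the candidate prompt with the highest weighted F1-score on the validation split."""
    best_prompt, best_f1 = "", -1.0
    for prompt in candidates:
        preds = [annotate(prompt, text) for text in texts]
        score = f1_score(gold, preds, average="weighted")  # the metric the paper reports
        if score > best_f1:
            best_prompt, best_f1 = prompt, score
    return best_prompt, best_f1
```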
Related papers
- Exploring the Capability of ChatGPT to Reproduce Human Labels for Social Computing Tasks (Extended Version) [26.643834593780007]
We investigate the extent to which ChatGPT can annotate data for social computing tasks.
ChatGPT exhibits promise in handling data annotation tasks, albeit with some challenges.
We propose GPT-Rater, a tool to predict if ChatGPT can correctly label data for a given annotation task.
arXiv Detail & Related papers (2024-07-08T22:04:30Z)
- Towards Automating Text Annotation: A Case Study on Semantic Proximity Annotation using GPT-4 [4.40960504549418]
This paper reuses human annotation guidelines along with some annotated data to design automatic prompts.
We implement the prompting strategies in an open-source text annotation tool, enabling easy online use via the OpenAI API.
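A minimal sketch of that pattern, assuming the official OpenAI Python client: the guideline text, label set, and model name below are placeholders, not the paper's actual prompts.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

GUIDELINES = "<human annotation guidelines, reused verbatim as instructions>"
LABELS = ["close", "distant", "unrelated"]  # hypothetical label set

def annotate(text: str) -> str:
    """Ask the model for one label, constrained to the guideline label set."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        temperature=0,  # deterministic output for annotation
        messages=[
            {"role": "system", "content": GUIDELINES},
            {"role": "user", "content": f"Label the following text as one of {LABELS}:\n{text}"},
        ],
    )
    return response.choices[0].message.content.strip()
```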
arXiv Detail & Related papers (2024-07-04T19:16:44Z)
- GPTZoo: A Large-scale Dataset of GPTs for the Research Community [5.1875389249043415]
GPTZoo is a large-scale dataset comprising 730,420 GPT instances.
Each instance includes rich metadata with 21 attributes describing its characteristics, as well as instructions, knowledge files, and third-party services utilized during its development.
arXiv Detail & Related papers (2024-05-24T15:17:03Z)
- Revisiting the Power of Prompt for Visual Tuning [50.11465784194896]
This study explores how the correlation between prompts and patch tokens evolves over the course of training.
Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes.
Our method significantly improves adaptation of self-supervised pretrained models, achieving task performance gains of 10% to 30%.
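A hedged sketch of the initialization idea: derive prompt vectors from prototypes of downstream patch-token embeddings. The use of k-means here is an illustrative assumption, not necessarily the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def init_prompts_from_prototypes(patch_tokens: np.ndarray, n_prompts: int) -> np.ndarray:
    """Initialize prompt embeddings from prototypes of downstream patch tokens.

    patch_tokens: (n_tokens, dim) patch embeddings collected from downstream data.
    Returns an (n_prompts, dim) array of prompt initializations (cluster centroids).
    """
    kmeans = KMeans(n_clusters=n_prompts, n_init=10).fit(patch_tokens)
    return kmeans.cluster_centers_
```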
arXiv Detail & Related papers (2024-02-04T07:49:02Z)
- Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z)
- Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text [48.36706154871577]
We introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts).
Unlike existing corpora, it comprises pairs of human-written and ChatGPT-polished abstracts rather than purely ChatGPT-generated texts.
We also propose the "Polish Ratio" method, a measure of the degree of modification ChatGPT makes relative to the original human-written text.
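For intuition only, a degree-of-modification score can be approximated from string similarity between the two versions; this stand-in is not the paper's actual Polish Ratio formulation.

```python
from difflib import SequenceMatcher

def modification_ratio(original: str, polished: str) -> float:
    """Rough degree-of-modification score in [0, 1]: 0 = identical, 1 = fully rewritten.

    Illustrative stand-in only; the paper defines its Polish Ratio differently.
    """
    return 1.0 - SequenceMatcher(None, original, polished).ratio()
```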
arXiv Detail & Related papers (2023-07-21T06:38:37Z)
- Evaluating ChatGPT's Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness [18.945934162722466]
We focus on assessing the overall ability of ChatGPT using 7 fine-grained information extraction (IE) tasks.
ChatGPT's performance in the Standard-IE setting is poor, but it surprisingly exhibits excellent performance in the OpenIE setting.
ChatGPT provides high-quality and trustworthy explanations for its decisions.
arXiv Detail & Related papers (2023-04-23T12:33:18Z)
- A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average across 10 reasoning categories spanning logical, non-textual, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z)
- SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning [28.29889045842277]
Multitask prompted learning can improve generalization by training on a diverse set of tasks at once.
We propose SPT, a semi-parametric prompt tuning method for multitask prompted learning.
arXiv Detail & Related papers (2022-12-21T11:18:09Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
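A minimal single-step sketch of the idea in PyTorch: augment the lone test sample, keep the most confident views, and minimize the entropy of their averaged prediction with respect to the prompt. The `logits_fn` closure (over a frozen CLIP-like model) and the hyperparameters are illustrative assumptions.

```python
import torch

def test_time_prompt_step(prompt, views, logits_fn, lr=5e-3, keep=0.1):
    """One tuning step on a single test sample.

    prompt:    learnable prompt embeddings (leaf tensor, requires_grad=True)
    views:     (n_views, ...) tensor of augmented copies of the test sample
    logits_fn: (prompt, views) -> (n_views, n_classes) logits from a frozen model
    """
    optimizer = torch.optim.AdamW([prompt], lr=lr)
    probs = logits_fn(prompt, views).softmax(dim=-1)
    # Per-view prediction entropy; keep only the most confident views.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    n_keep = max(1, int(keep * len(views)))
    confident = probs[entropy.argsort()[:n_keep]]
    # Minimize the entropy of the averaged (marginal) prediction.
    marginal = confident.mean(dim=0)
    loss = -(marginal * marginal.clamp_min(1e-12).log()).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return prompt
```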
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
- PTR: Prompt Tuning with Rules for Text Classification [64.1655047016891]
Fine-tuned pre-trained language models (PLMs) have achieved strong performance on almost all NLP tasks.
We propose prompt tuning with rules (PTR) for many-class text classification.
PTR can encode prior knowledge about each class into prompt tuning.
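A hedged sketch of the rule idea: score each class as a conjunction of sub-prompt answers by summing their log-probabilities. The rule table and slot names below are illustrative, not PTR's actual configuration.

```python
# Illustrative rule table: each class is a conjunction of sub-prompt answers.
# (PTR's real rules target relation classification with entity-type sub-prompts.)
RULES = {
    "person:employee_of": {"head_type": "person", "relation": "works for", "tail_type": "organization"},
    "no_relation":        {"head_type": "entity", "relation": "is irrelevant to", "tail_type": "entity"},
}

def class_scores(subprompt_logprobs: dict[str, dict[str, float]]) -> dict[str, float]:
    """Combine per-slot log-probabilities into class scores via the conjunction rules.

    subprompt_logprobs maps each sub-prompt slot to the log-probability of each
    candidate answer (e.g., from a masked-LM filling that slot).
    """
    return {
        cls: sum(subprompt_logprobs[slot][answer] for slot, answer in rule.items())
        for cls, rule in RULES.items()
    }
```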
arXiv Detail & Related papers (2021-05-24T13:24:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.