Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop
- URL: http://arxiv.org/abs/2411.04637v1
- Date: Thu, 07 Nov 2024 11:51:14 GMT
- Title: Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop
- Authors: Ekaterina Artemova, Akim Tsvigun, Dominik Schlechtweg, Natalia Fedorova, Sergei Tilga, Boris Obmoroshev
- Abstract summary: This tutorial is designed for NLP practitioners from both research and industry backgrounds.
We will present the basics of each strategy, highlight their benefits and limitations, and discuss in detail real-life case studies.
The tutorial includes a hands-on workshop, where attendees will be guided in implementing a hybrid annotation setup.
- Score: 7.925650087629884
- Abstract: Training and deploying machine learning models relies on a large amount of human-annotated data. As human labeling becomes increasingly expensive and time-consuming, recent research has developed multiple strategies to speed up annotation and reduce costs and human workload: generating synthetic training data, active learning, and hybrid labeling. This tutorial is oriented toward practical applications: we will present the basics of each strategy, highlight their benefits and limitations, and discuss in detail real-life case studies. Additionally, we will walk through best practices for managing human annotators and controlling the quality of the final dataset. The tutorial includes a hands-on workshop, where attendees will be guided in implementing a hybrid annotation setup. This tutorial is designed for NLP practitioners from both research and industry backgrounds who are involved in or interested in optimizing data labeling projects.
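As a rough illustration of the hybrid annotation setup covered in the workshop, the sketch below pre-labels items with an LLM and routes low-confidence items to human annotators. The `llm_label()` helper, the record format, and the confidence threshold are illustrative assumptions, not the tutorial's actual implementation.

```python
# A minimal sketch of a confidence-routed hybrid annotation loop.
# llm_label() is a placeholder for a real LLM call (e.g. a classification
# prompt against a hosted model); here it is a toy heuristic so the sketch runs.
from typing import Callable, Dict, List, Tuple

def llm_label(text: str) -> Tuple[str, float]:
    """Placeholder for an LLM pre-labeling call: returns (label, confidence)."""
    positive_cues = ("great", "love", "excellent")
    hits = sum(cue in text.lower() for cue in positive_cues)
    return ("positive", 0.95) if hits else ("negative", 0.6)

def hybrid_annotate(
    texts: List[str],
    ask_human: Callable[[str], str],
    confidence_threshold: float = 0.9,
) -> List[Dict[str, str]]:
    """LLM pre-labels everything; low-confidence items go to a human."""
    records = []
    for text in texts:
        label, confidence = llm_label(text)
        if confidence < confidence_threshold:
            label, source = ask_human(text), "human"  # human-in-the-loop step
        else:
            source = "llm"
        records.append({"text": text, "label": label, "source": source})
    return records

if __name__ == "__main__":
    data = ["I love this product", "Delivery was slow"]
    print(hybrid_annotate(data, ask_human=lambda t: input(f"Label for {t!r}: ")))
```

Raising or lowering the threshold trades annotation cost against label quality, which is the central dial in hybrid setups like this.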
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- A Survey on Deep Active Learning: Recent Advances and New Frontiers [27.07154361976248]
Active learning has gained increasing popularity due to its broad applicability, yet survey papers, especially on deep learning-based active learning (DAL), remain scarce.
This work aims to serve as a useful and quick guide for researchers in overcoming difficulties in DAL (a minimal uncertainty-sampling sketch appears after this list).
arXiv Detail & Related papers (2024-05-01T05:54:33Z)
- Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data [1.8692054990918079]
We review three 'cheap' techniques that have emerged in recent years: weak supervision, transfer learning, and prompt engineering.
For the latter, we review the particular case of zero-shot prompting of large language models (a minimal prompt-based labeling sketch appears after this list).
We show good performance for all techniques, and in particular we demonstrate how prompting of large language models can achieve high accuracy at very low cost.
arXiv Detail & Related papers (2024-01-22T19:00:11Z)
- Aligning Large Language Models with Human: A Survey [53.6014921995006]
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks.
Despite their notable performance, these models are prone to certain limitations, such as misunderstanding human instructions, generating potentially biased content, or producing factually incorrect information.
This survey presents a comprehensive overview of these alignment technologies.
arXiv Detail & Related papers (2023-07-24T17:44:58Z)
- Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud side.
During training, the task learner instructs the lightweight active learner, which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z)
- Tutorial on Deep Learning for Human Activity Recognition [70.94062293989832]
This tutorial was first held at the 2021 ACM International Symposium on Wearable Computers (ISWC'21).
It provides a hands-on and interactive walk-through of the most important steps in the data pipeline for the deep learning of human activities.
arXiv Detail & Related papers (2021-10-13T12:01:02Z)
- Motivating Learners in Multi-Orchestrator Mobile Edge Learning: A Stackelberg Game Approach [54.28419430315478]
Mobile Edge Learning (MEL) enables distributed training of Machine Learning models over heterogeneous edge devices.
In MEL, training performance deteriorates without sufficient training data or computing resources.
We propose an incentive mechanism, formulating the orchestrator-learner interactions as a two-round Stackelberg game.
arXiv Detail & Related papers (2021-09-25T17:27:48Z)
- Towards Zero-Label Language Learning [20.28186484098947]
This paper explores zero-label learning in Natural Language Processing (NLP): no human-annotated data is used anywhere during training, and models are trained purely on synthetic data.
Inspired by the recent success of few-shot inference on GPT-3, we present a training data creation procedure named Unsupervised Data Generation.
arXiv Detail & Related papers (2021-09-19T19:00:07Z)
- Self-supervised Learning on Graphs: Contrastive, Generative, or Predictive [25.679620842010422]
Self-supervised learning (SSL) is emerging as a new paradigm for extracting informative knowledge through well-designed pretext tasks.
We divide existing graph SSL methods into three categories: contrastive, generative, and predictive.
We also summarize the commonly used datasets, evaluation metrics, downstream tasks, and open-source implementations of various algorithms.
arXiv Detail & Related papers (2021-05-16T03:30:03Z)
- How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)
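Several entries above (the DAL survey and Peer Study Learning) revolve around active learning. Below is a minimal sketch of its core loop, pool-based uncertainty sampling, using scikit-learn; the function name, loop structure, and hyperparameters are illustrative assumptions, not taken from any surveyed paper.

```python
# A minimal pool-based uncertainty-sampling sketch (scikit-learn assumed).
# y_oracle plays the role of human annotators answering label queries.
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_sampling_loop(X_pool, y_oracle, n_seed=10, n_rounds=5, batch=10):
    """Iteratively query labels for the pool items the model is least sure of."""
    rng = np.random.default_rng(0)
    labeled = list(rng.choice(len(X_pool), size=n_seed, replace=False))
    model = LogisticRegression(max_iter=1000)
    for _ in range(n_rounds):
        model.fit(X_pool[labeled], y_oracle[labeled])   # retrain on labeled set
        confidence = model.predict_proba(X_pool).max(axis=1)
        confidence[labeled] = np.inf                    # never re-query labeled items
        queries = np.argsort(confidence)[:batch]        # least confident first
        labeled.extend(int(i) for i in queries)         # "annotate" the queries
    return model, labeled

if __name__ == "__main__":
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=500, random_state=0)
    model, labeled_idx = uncertainty_sampling_loop(X, y)
    print(f"labeled {len(labeled_idx)} of {len(X)} examples")
```

The sketch assumes the random seed set happens to contain both classes; a production loop would stratify the seed and use a richer acquisition function than least-confidence.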
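The Cheap Learning and zero-label entries above both rely on prompting an LLM to produce labels or synthetic training data. As a minimal sketch of zero-shot prompt-based labeling, the snippet below builds a classification prompt and parses the model's reply; `complete()` stands in for a real LLM completion call, and the label set and prompt wording are assumptions for illustration.

```python
# A minimal zero-shot prompt-labeling sketch. complete() is a stand-in for
# any LLM completion client; replace it with a real API call in practice.
LABELS = ["positive", "negative", "neutral"]

PROMPT_TEMPLATE = (
    "Classify the sentiment of the following text as one of "
    "{labels}. Reply with the label only.\n\nText: {text}\nLabel:"
)

def complete(prompt: str) -> str:
    """Stand-in for a real LLM completion call; wire up your own client here."""
    return "positive"  # placeholder output so the sketch runs

def zero_shot_label(text: str) -> str:
    prompt = PROMPT_TEMPLATE.format(labels=", ".join(LABELS), text=text)
    answer = complete(prompt).strip().lower()
    # Guard against free-form output: fall back to a default if no match.
    return answer if answer in LABELS else "neutral"

if __name__ == "__main__":
    print(zero_shot_label("The tutorial was excellent"))
```

Constraining the reply format and validating it against the label set is what keeps prompt-based labeling cheap to post-process.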
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.