MEGAnno: Exploratory Labeling for NLP in Computational Notebooks
- URL: http://arxiv.org/abs/2301.03095v1
- Date: Sun, 8 Jan 2023 19:16:22 GMT
- Title: MEGAnno: Exploratory Labeling for NLP in Computational Notebooks
- Authors: Dan Zhang, Hannah Kim, Rafael Li Chen, Eser Kandogan, Estevam Hruschka
- Abstract summary: We present MEGAnno, a novel annotation framework designed for NLP practitioners and researchers.
With MEGAnno, users can explore data through sophisticated search and interactive suggestion functions.
We demonstrate MEGAnno's flexible, exploratory, efficient, and seamless labeling experience through a sentiment analysis use case.
- Score: 9.462926987075122
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present MEGAnno, a novel exploratory annotation framework designed for NLP
researchers and practitioners. Unlike existing labeling tools that focus on
data labeling only, our framework aims to support a broader, iterative ML
workflow including data exploration and model development. With MEGAnno's API,
users can programmatically explore the data through sophisticated search and
automated suggestion functions and incrementally update task schema as their
project evolves. Combined with our widget, users can interactively sort,
filter, and assign labels to multiple items simultaneously in the same notebook
where the rest of the NLP project resides. We demonstrate MEGAnno's flexible,
exploratory, efficient, and seamless labeling experience through a sentiment
analysis use case.
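The workflow the abstract describes (search the data programmatically, assign a label to many items at once, and extend the task schema as the project evolves) can be illustrated with a small sketch. This is not MEGAnno's API: the `LabelingProject` class and its `search`, `assign`, and `add_label_option` methods are hypothetical names invented for this example, and the data is kept in memory with plain substring search standing in for the "sophisticated search" the paper mentions.

```python
from dataclasses import dataclass, field


@dataclass
class LabelingProject:
    """Toy in-memory stand-in for an exploratory labeling workflow (hypothetical, not MEGAnno)."""

    texts: list[str]
    schema: set[str] = field(default_factory=set)          # allowed label names
    labels: dict[int, str] = field(default_factory=dict)   # item index -> label

    def search(self, keyword: str) -> list[int]:
        """Return indices of items whose text contains the keyword (case-insensitive)."""
        needle = keyword.lower()
        return [i for i, text in enumerate(self.texts) if needle in text.lower()]

    def add_label_option(self, name: str) -> None:
        """Extend the task schema without touching existing annotations."""
        self.schema.add(name)

    def assign(self, indices: list[int], label: str) -> None:
        """Assign one label to several items at once."""
        if label not in self.schema:
            raise ValueError(f"unknown label: {label!r}")
        for i in indices:
            self.labels[i] = label


# A tiny sentiment analysis example, echoing the paper's use case.
project = LabelingProject(
    texts=[
        "Great battery life, love it",
        "Terrible screen, broke in a week",
        "Love the camera",
        "It arrived on Tuesday",
    ],
    schema={"positive", "negative"},
)

hits = project.search("love")        # programmatic exploration
project.assign(hits, "positive")     # bulk labeling of the search results
project.add_label_option("neutral")  # the schema grows as the project evolves
project.assign([3], "neutral")
```

Because the labels live in an ordinary Python object, the same notebook can pass them straight to model training code, which is the kind of integration the abstract argues for.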
Related papers
- Exploiting Conjugate Label Information for Multi-Instance Partial-Label Learning [61.00359941983515]
Multi-instance partial-label learning (MIPL) addresses scenarios where each training sample is represented as a multi-instance bag associated with a candidate label set containing one true label and several false positives.
ELIMIPL exploits the conjugate label information to improve the disambiguation performance.
Date: 2024-08-26T15:49:31Z
- TnT-LLM: Text Mining at Scale with Large Language Models [24.731544646232962]
Large Language Models (LLMs) automate the process of end-to-end label generation and assignment with minimal human effort.
We show that TnT-LLM generates more accurate and relevant labels when compared against state-of-the-art baselines.
We also share our practical experiences and insights on the challenges and opportunities of using LLMs for large-scale text mining in real-world applications.
Date: 2024-03-18T18:45:28Z
- Lightweight Syntactic API Usage Analysis with UCov [0.0]
We present a novel conceptual framework designed to assist library maintainers in understanding the interactions allowed by their APIs.
These customizable models enable library maintainers to improve their design ahead of release, reducing friction during evolution.
We implement these models for Java libraries in a new tool UCov and demonstrate its capabilities on three libraries exhibiting diverse styles of interaction.
Date: 2024-02-19T10:33:41Z
- The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions [114.67699010359637]
We analyze a large-scale collection of real user queries to GPT.
We find that tasks such as "design" and "planning" are prevalent in user interactions but are largely neglected or different from traditional NLP benchmarks.
Date: 2023-10-19T02:12:17Z
- GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
We include a Jupyter Notebook-based walk through the core interoperability features, available on GitHub.
We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
Date: 2023-06-02T12:09:59Z
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
Date: 2023-03-23T12:39:20Z
- MONAI Label: A framework for AI-assisted Interactive Labeling of 3D Medical Images [49.664220687980006]
The lack of annotated datasets is a major bottleneck for training new task-specific supervised machine learning models.
We present MONAI Label, a free and open-source framework that facilitates the development of applications based on artificial intelligence (AI) models.
Date: 2022-03-23T12:33:11Z
- Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis [37.077883083886114]
AutoSWAP is a framework for automatically synthesizing data-efficient task-level labeling functions.
We show that AutoSWAP is an effective way to automatically generate labeling functions that can significantly reduce expert effort for behavior analysis.
Date: 2021-11-30T07:51:12Z
- TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration [1.4050836886292872]
Data programming was only accessible to users who knew how to program.
We build a novel tool, TagRuler, that makes it easy for annotators to build span-level labeling functions without programming.
Date: 2021-06-24T04:49:42Z
- Visual Transformer for Task-aware Active Learning [49.903358393660724]
We present a novel pipeline for pool-based Active Learning.
Our method exploits accessible unlabelled examples during training to estimate their co-relation with the labelled examples.
Visual Transformer models non-local visual concept dependency between labelled and unlabelled examples.
Date: 2021-06-07T17:13:59Z
This list is automatically generated from the titles and abstracts of the papers in this site.