A Data-Centric Framework for Composable NLP Workflows
- URL: http://arxiv.org/abs/2103.01834v2
- Date: Wed, 3 Mar 2021 02:57:35 GMT
- Title: A Data-Centric Framework for Composable NLP Workflows
- Authors: Zhengzhong Liu, Guanxiong Ding, Avinash Bukkittu, Mansi Gupta, Pengzhi
Gao, Atif Ahmed, Shikun Zhang, Xin Gao, Swapnil Singhavi, Linwei Li, Wei Wei,
Zecong Hu, Haoran Shi, Xiaodan Liang, Teruko Mitamura, Eric P. Xing, and
Zhiting Hu
- Abstract summary: Empirical natural language processing systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components.
We establish a unified open-source framework to support fast development of such sophisticated NLP workflows in a composable manner.
- Score: 109.51144493023533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Empirical natural language processing (NLP) systems in application domains
(e.g., healthcare, finance, education) involve interoperation among multiple
components, ranging from data ingestion, human annotation, to text retrieval,
analysis, generation, and visualization. We establish a unified open-source
framework to support fast development of such sophisticated NLP workflows in a
composable manner. The framework introduces a uniform data representation to
encode heterogeneous results by a wide range of NLP tasks. It offers a large
repository of processors for NLP tasks, visualization, and annotation, which
can be easily assembled with full interoperability under the unified
representation. The highly extensible framework allows plugging in custom
processors from external off-the-shelf NLP and deep learning libraries. The
whole framework is delivered through two modularized yet integratable
open-source projects, namely Forte (for workflow infrastructure and NLP
function processors) and Stave (for user interaction, visualization, and
annotation).
Related papers
- Sketch: A Toolkit for Streamlining LLM Operations [51.33202045501429]
Large language models (LLMs) have achieved remarkable success.
The flexibility of their output formats, however, poses challenges in controlling and harnessing their outputs.
We present Sketch, an innovative toolkit designed to streamline LLM operations across diverse fields.
arXiv Detail & Related papers (2024-09-05T08:45:44Z)
- Enhancing LLM's Cognition via Structurization [41.13997892843677]
Large language models (LLMs) process input contexts through a causal and sequential perspective.
This paper presents a novel concept of context structurization.
Specifically, we transform the plain, unordered contextual sentences into well-ordered, hierarchically structurized elements (a toy sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-07-23T12:33:58Z)
- Towards More Unified In-context Visual Understanding [74.55332581979292]
We present a new ICL framework for visual understanding with multi-modal output enabled.
First, we quantize and embed both text and visual prompts into a unified representational space.
Then a decoder-only sparse transformer architecture is employed to perform generative modeling on them.
arXiv Detail & Related papers (2023-12-05T06:02:21Z)
- A Composable Just-In-Time Programming Framework with LLMs and FBP [0.0]
This paper introduces a computing framework that combines Flow-Based Programming (FBP) and Large Language Models (LLMs) to enable Just-In-Time Programming (JITP).
JITP empowers users, regardless of their programming expertise, to actively participate in the development and automation process by leveraging their task-time algorithmic insights.
The framework allows users to request and generate code in real time, enabling dynamic code execution within a flow-based program (a minimal sketch of this execution model also appears after this list).
arXiv Detail & Related papers (2023-07-31T23:51:46Z)
- NLP Workbench: Efficient and Extensible Integration of State-of-the-art Text Mining Tools [6.197644109088143]
Non-expert users can obtain semantic understanding of large-scale corpora using state-of-the-art text mining models.
The platform is built upon the latest pre-trained models and open-source systems from academia.
arXiv Detail & Related papers (2023-03-02T16:59:31Z)
- HugNLP: A Unified and Comprehensive Library for Natural Language Processing [14.305751154503133]
We introduce HugNLP, a library for natural language processing (NLP) with the prevalent backend of HuggingFace Transformers.
HugNLP consists of a hierarchical structure including models, processors and applications that unifies the learning process of pre-trained language models (PLMs) on different NLP tasks.
arXiv Detail & Related papers (2023-02-28T03:38:26Z)
- Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive weak supervision (WS) system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% on one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
- Leveraging Language to Learn Program Abstractions and Search Heuristics [66.28391181268645]
We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis.
When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization.
arXiv Detail & Related papers (2021-06-18T15:08:47Z)
- FedNLP: A Research Platform for Federated Learning in Natural Language Processing [55.01246123092445]
We present the FedNLP, a research platform for federated learning in NLP.
FedNLP supports various popular task formulations in NLP such as text classification, sequence tagging, question answering, seq2seq generation, and language modeling.
Preliminary experiments with FedNLP reveal that there exists a large performance gap between learning on decentralized and centralized datasets.
arXiv Detail & Related papers (2021-04-18T11:04:49Z)
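The "Enhancing LLM's Cognition via Structurization" entry above promises a toy sketch after this list. One plausible, deliberately simplified reading of the idea: turn flat, unordered sentences into a two-level hierarchy keyed by hypothetical topic keywords. This is illustrative only, not the paper's actual method.

from collections import defaultdict
from typing import Dict, List

def structurize(sentences: List[str],
                topics: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Group flat, unordered sentences under topic headings,
    # yielding a toy two-level hierarchy.
    tree: Dict[str, List[str]] = defaultdict(list)
    for sent in sentences:
        lowered = sent.lower()
        matched = [t for t, kws in topics.items()
                   if any(k in lowered for k in kws)]
        tree[matched[0] if matched else "other"].append(sent)
    return dict(tree)

if __name__ == "__main__":
    sentences = [
        "Forte provides workflow infrastructure.",
        "Stave renders annotations in the browser.",
        "Processors exchange a uniform data representation.",
    ]
    topics = {
        "infrastructure": ["workflow", "representation"],
        "visualization": ["renders", "browser"],
    }
    for heading, members in structurize(sentences, topics).items():
        print(heading)
        for sent in members:
            print("  -", sent)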
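Likewise, for the Just-In-Time Programming entry, here is a minimal, self-contained sketch of the execution model only: a flow node whose behavior is code generated at task time. The "LLM" is a canned stub; a real system would call a model instead.

from typing import Callable, Dict, List

def llm_generate_code(task: str) -> str:
    # Canned stub standing in for an LLM call; returns Python source.
    canned: Dict[str, str] = {
        "double": "def run(data):\n    return [x * 2 for x in data]",
        "sum": "def run(data):\n    return sum(data)",
    }
    return canned[task]

class JITNode:
    # A flow-based-programming node whose behavior is generated just in time.
    def __init__(self, task: str) -> None:
        namespace: Dict[str, Callable] = {}
        exec(llm_generate_code(task), namespace)  # define run() from generated source
        self.run: Callable = namespace["run"]

def run_flow(nodes: List[JITNode], data):
    # Pass each node's output to the next, like wires in a flow graph.
    for node in nodes:
        data = node.run(data)
    return data

if __name__ == "__main__":
    print(run_flow([JITNode("double"), JITNode("sum")], [1, 2, 3]))  # prints 12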
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.