A Data-Centric Framework for Composable NLP Workflows
- URL: http://arxiv.org/abs/2103.01834v2
- Date: Wed, 3 Mar 2021 02:57:35 GMT
- Title: A Data-Centric Framework for Composable NLP Workflows
- Authors: Zhengzhong Liu, Guanxiong Ding, Avinash Bukkittu, Mansi Gupta, Pengzhi
Gao, Atif Ahmed, Shikun Zhang, Xin Gao, Swapnil Singhavi, Linwei Li, Wei Wei,
Zecong Hu, Haoran Shi, Xiaodan Liang, Teruko Mitamura, Eric P. Xing, and
Zhiting Hu
- Abstract summary: Empirical natural language processing systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components.
We establish a unified open-source framework to support fast development of such sophisticated NLP workflows in a composable manner.
- Score: 109.51144493023533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Empirical natural language processing (NLP) systems in application domains
(e.g., healthcare, finance, education) involve interoperation among multiple
components, ranging from data ingestion, human annotation, to text retrieval,
analysis, generation, and visualization. We establish a unified open-source
framework to support fast development of such sophisticated NLP workflows in a
composable manner. The framework introduces a uniform data representation to
encode heterogeneous results by a wide range of NLP tasks. It offers a large
repository of processors for NLP tasks, visualization, and annotation, which
can be easily assembled with full interoperability under the unified
representation. The highly extensible framework allows plugging in custom
processors from external off-the-shelf NLP and deep learning libraries. The
whole framework is delivered through two modularized yet integratable
open-source projects, namely Forte (for workflow infrastructure and NLP
function processors) and Stave (for user interaction, visualization, and
annotation).
Related papers
- Sketch: A Toolkit for Streamlining LLM Operations [51.33202045501429]
Large language models (LLMs) have achieved remarkable success.
The flexibility of their output formats, however, poses challenges in controlling and harnessing their outputs.
We present Sketch, an innovative toolkit designed to streamline LLM operations across diverse fields.
arXiv Detail & Related papers (2024-09-05T08:45:44Z)
- Enhancing LLM's Cognition via Structurization [41.13997892843677]
Large language models (LLMs) process input contexts through a causal and sequential perspective.
This paper presents a novel concept of context structurization.
Specifically, we transform the plain, unordered contextual sentences into well-ordered, hierarchically structurized elements (a toy sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-07-23T12:33:58Z)
- Towards More Unified In-context Visual Understanding [74.55332581979292]
We present a new ICL framework for visual understanding with multi-modal output enabled.
First, we quantize and embed both text and visual prompts into a unified representational space.
Then a decoder-only sparse transformer architecture is employed to perform generative modeling on them.
arXiv Detail & Related papers (2023-12-05T06:02:21Z)
- A Composable Just-In-Time Programming Framework with LLMs and FBP [0.0]
This paper introduces a computing framework that combines Flow-Based Programming (FBP) and Large Language Models (LLMs) to enable Just-In-Time Programming (JITP).
JITP empowers users, regardless of their programming expertise, to actively participate in the development and automation process by leveraging their task-time algorithmic insights.
The framework allows users to request and generate code in real time, enabling dynamic code execution within a flow-based program (a minimal sketch of this execution model also appears after this list).
arXiv Detail & Related papers (2023-07-31T23:51:46Z)
- NLP Workbench: Efficient and Extensible Integration of State-of-the-art Text Mining Tools [6.197644109088143]
Non-expert users can obtain semantic understanding of large-scale corpora using state-of-the-art text mining models.
The platform is built upon the latest pre-trained models and open-source systems from academia.
arXiv Detail & Related papers (2023-03-02T16:59:31Z)
- HugNLP: A Unified and Comprehensive Library for Natural Language Processing [14.305751154503133]
We introduce HugNLP, a library for natural language processing (NLP) with the prevalent backend of HuggingFace Transformers.
HugNLP consists of a hierarchical structure including models, processors and applications that unifies the learning process of pre-trained language models (PLMs) on different NLP tasks.
arXiv Detail & Related papers (2023-02-28T03:38:26Z)
- Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive weak supervision (WS) system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% on one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
- Leveraging Language to Learn Program Abstractions and Search Heuristics [66.28391181268645]
We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis.
When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization.
arXiv Detail & Related papers (2021-06-18T15:08:47Z)
- FedNLP: A Research Platform for Federated Learning in Natural Language Processing [55.01246123092445]
We present the FedNLP, a research platform for federated learning in NLP.
FedNLP supports various popular task formulations in NLP such as text classification, sequence tagging, question answering, seq2seq generation, and language modeling.
Preliminary experiments with FedNLP reveal that there exists a large performance gap between learning on decentralized and centralized datasets.
arXiv Detail & Related papers (2021-04-18T11:04:49Z)
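The "Enhancing LLM's Cognition via Structurization" entry above promises a toy sketch after this list. One plausible, deliberately simplified reading of the idea: turn flat, unordered sentences into a two-level hierarchy keyed by hypothetical topic keywords. This is illustrative only, not the paper's actual method.

from collections import defaultdict
from typing import Dict, List

def structurize(sentences: List[str],
                topics: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Group flat, unordered sentences under topic headings,
    # yielding a toy two-level hierarchy.
    tree: Dict[str, List[str]] = defaultdict(list)
    for sent in sentences:
        lowered = sent.lower()
        matched = [t for t, kws in topics.items()
                   if any(k in lowered for k in kws)]
        tree[matched[0] if matched else "other"].append(sent)
    return dict(tree)

if __name__ == "__main__":
    sentences = [
        "Forte provides workflow infrastructure.",
        "Stave renders annotations in the browser.",
        "Processors exchange a uniform data representation.",
    ]
    topics = {
        "infrastructure": ["workflow", "representation"],
        "visualization": ["renders", "browser"],
    }
    for heading, members in structurize(sentences, topics).items():
        print(heading)
        for sent in members:
            print("  -", sent)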
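Likewise, for the Just-In-Time Programming entry, here is a minimal, self-contained sketch of the execution model only: a flow node whose behavior is code generated at task time. The "LLM" is a canned stub; a real system would call a model instead.

from typing import Callable, Dict, List

def llm_generate_code(task: str) -> str:
    # Canned stub standing in for an LLM call; returns Python source.
    canned: Dict[str, str] = {
        "double": "def run(data):\n    return [x * 2 for x in data]",
        "sum": "def run(data):\n    return sum(data)",
    }
    return canned[task]

class JITNode:
    # A flow-based-programming node whose behavior is generated just in time.
    def __init__(self, task: str) -> None:
        namespace: Dict[str, Callable] = {}
        exec(llm_generate_code(task), namespace)  # define run() from generated source
        self.run: Callable = namespace["run"]

def run_flow(nodes: List[JITNode], data):
    # Pass each node's output to the next, like wires in a flow graph.
    for node in nodes:
        data = node.run(data)
    return data

if __name__ == "__main__":
    print(run_flow([JITNode("double"), JITNode("sum")], [1, 2, 3]))  # prints 12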
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.