NLP Workbench: Efficient and Extensible Integration of State-of-the-art
Text Mining Tools
- URL: http://arxiv.org/abs/2303.01410v1
- Date: Thu, 2 Mar 2023 16:59:31 GMT
- Title: NLP Workbench: Efficient and Extensible Integration of State-of-the-art
Text Mining Tools
- Authors: Peiran Yao, Matej Kosmajac, Abeer Waheed, Kostyantyn Guzhva, Natalie
Hervieux, Denilson Barbosa
- Abstract summary: Non-expert users can obtain semantic understanding of large-scale corpora using state-of-the-art text mining models.
The platform is built upon the latest pre-trained models and open-source systems from academia.
- Score: 6.197644109088143
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: NLP Workbench is a web-based platform for text mining that allows non-expert
users to obtain semantic understanding of large-scale corpora using
state-of-the-art text mining models. The platform is built upon the latest
pre-trained models and open-source systems from academia that provide semantic
analysis functionalities, including but not limited to entity linking,
sentiment analysis, semantic parsing, and relation extraction. Its extensible
design enables researchers and developers to smoothly replace an existing model
or integrate a new one. To improve efficiency, we employ a microservice
architecture that facilitates allocation of acceleration hardware and
parallelization of computation. This paper presents the architecture of NLP
Workbench and discusses the challenges we faced in designing it. We also
discuss diverse use cases of NLP Workbench and the benefits of using it over
other approaches. The platform is under active development, with its source
code released under the MIT license. A website and a short video demonstrating
our platform are also available.
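The abstract above describes a microservice architecture in which each semantic analysis model runs as its own service, so that a model can be replaced and acceleration hardware can be allocated per model. Below is a minimal, hedged sketch of that pattern; the Analyzer interface, toy model, route, and port are assumptions for illustration only, not NLP Workbench's actual API.

"""Sketch (not NLP Workbench's code) of one model behind its own HTTP service."""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Protocol


class Analyzer(Protocol):
    """Interface a pluggable model wrapper is assumed to implement."""
    def analyze(self, text: str) -> dict: ...


class ToySentimentModel:
    """Stand-in for a real pre-trained model loaded on a GPU worker."""
    def analyze(self, text: str) -> dict:
        lowered = text.lower()
        score = sum(w in lowered for w in ("good", "great", "useful"))
        score -= sum(w in lowered for w in ("bad", "broken", "slow"))
        return {"task": "sentiment", "score": score}


MODEL: Analyzer = ToySentimentModel()  # swap in another model to extend the platform


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expects a JSON body like {"text": "..."} and returns the analysis.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(MODEL.analyze(payload.get("text", ""))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # One service per model; a gateway would route requests and parallelize them.
    HTTPServer(("0.0.0.0", 8001), Handler).serve_forever()

A gateway or any HTTP client would POST a JSON body such as {"text": "..."} to this service, with entity linking, semantic parsing, and relation extraction each running behind their own services; that separation is what makes per-model hardware allocation and parallel execution straightforward.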
Related papers
- CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models [59.91221728187576]
This paper introduces the CMU Linguistic Annotation Backend (CMULAB), an open-source framework that simplifies model deployment and continuous human-in-the-loop fine-tuning of NLP models.
CMULAB enables users to leverage the power of multilingual models to quickly adapt and extend existing tools for speech recognition, OCR, translation, and syntactic analysis to new languages.
arXiv Detail & Related papers (2024-04-03T02:21:46Z)
- Institutional Platform for Secure Self-Service Large Language Model Exploration [0.0]
The paper outlines the system's architecture and key features, encompassing dataset curation, model training, secure inference, and text-based feature extraction.
The platform strives to deliver secure LLM services, emphasizing process and data isolation, end-to-end encryption, and role-based resource authentication.
arXiv Detail & Related papers (2024-02-01T10:58:10Z)
- Mini-GPTs: Efficient Large Language Models through Contextual Pruning [0.0]
This paper introduces a novel approach to developing Mini-GPTs via contextual pruning.
We employ the technique across diverse and complex datasets, including US law, Medical Q&A, Skyrim dialogue, English-Taiwanese translation, and Economics articles.
arXiv Detail & Related papers (2023-12-20T00:48:13Z)
- SoTaNa: The Open-Source Software Development Assistant [81.86136560157266]
SoTaNa is an open-source software development assistant.
It generates high-quality instruction-based data for the domain of software engineering.
It employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA.
arXiv Detail & Related papers (2023-08-25T14:56:21Z)
- GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
We include a Jupyter Notebook-based walkthrough of the core interoperability features, available on GitHub.
We present GAIA Search, a search engine built following the previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z)
- A Data-Centric Framework for Composable NLP Workflows [109.51144493023533]
Empirical natural language processing systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components.
We establish a unified open-source framework to support fast development of such sophisticated NLP workflows in a composable manner.
arXiv Detail & Related papers (2021-03-02T16:19:44Z)
- Modular approach to data preprocessing in ALOHA and application to a smart industry use case [0.0]
The paper addresses a modular approach, integrated into the ALOHA tool flow, to support the data preprocessing and transformation pipeline.
To demonstrate the effectiveness of the approach, we present some experimental results related to a keyword spotting use case.
arXiv Detail & Related papers (2021-02-02T06:48:51Z)
- Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn.
We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology across regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z)
- N-LTP: An Open-source Neural Language Technology Platform for Chinese [68.58732970171747]
N-LTP is an open-source neural language technology platform supporting six fundamental Chinese NLP tasks.
N-LTP adopts a multi-task framework built on a shared pre-trained model, which has the advantage of capturing knowledge shared across the related Chinese tasks (a minimal sketch of this shared-encoder pattern follows this list).
arXiv Detail & Related papers (2020-09-24T11:45:39Z)
- CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes [2.750124853532831]
The CLEVR dataset has been used extensively for language-grounded visual reasoning in the Machine Learning (ML) and Natural Language Processing (NLP) domains.
We present a graph library for CLEVR that provides functionalities for extracting object-centric attributes and relationships and for constructing structural graph representations of the dual modalities.
We discuss downstream usage and applications of the library, and how it accelerates research for the NLP community.
arXiv Detail & Related papers (2020-09-19T03:32:37Z)
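The N-LTP entry above mentions a multi-task framework in which a single shared pre-trained model feeds several task-specific components. A rough PyTorch sketch of that shared-encoder pattern follows; the encoder type, task names, and label counts are placeholders invented for illustration, not N-LTP's actual architecture.

"""Illustrative shared encoder with per-task heads (not N-LTP's real code)."""
import torch
import torch.nn as nn


class SharedEncoderMultiTask(nn.Module):
    def __init__(self, vocab_size=30000, hidden=256, tasks=None):
        super().__init__()
        tasks = tasks or {"seg": 4, "pos": 30, "ner": 9}  # placeholder label counts
        # One encoder shared by every task captures the common knowledge.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        # A lightweight classification head per task.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, n) for name, n in tasks.items()}
        )

    def forward(self, token_ids: torch.Tensor, task: str) -> torch.Tensor:
        states, _ = self.encoder(self.embed(token_ids))
        return self.heads[task](states)  # per-token logits for the chosen task


if __name__ == "__main__":
    model = SharedEncoderMultiTask()
    batch = torch.randint(0, 30000, (2, 16))   # two sentences of 16 token ids
    print(model(batch, task="pos").shape)      # -> torch.Size([2, 16, 30])

Because every head reads the same encoder states, knowledge learned for one task is available to the others, which is the advantage the summary points to.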
This list is automatically generated from the titles and abstracts of the papers on this site.