NLP Workbench: Efficient and Extensible Integration of State-of-the-art
Text Mining Tools
- URL: http://arxiv.org/abs/2303.01410v1
- Date: Thu, 2 Mar 2023 16:59:31 GMT
- Title: NLP Workbench: Efficient and Extensible Integration of State-of-the-art
Text Mining Tools
- Authors: Peiran Yao, Matej Kosmajac, Abeer Waheed, Kostyantyn Guzhva, Natalie
Hervieux, Denilson Barbosa
- Abstract summary: Non-expert users can obtain semantic understanding of large-scale corpora using state-of-the-art text mining models.
The platform is built upon the latest pre-trained models and open-source systems from academia.
- Score: 6.197644109088143
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: NLP Workbench is a web-based platform for text mining that allows non-expert
users to obtain semantic understanding of large-scale corpora using
state-of-the-art text mining models. The platform is built upon the latest
pre-trained models and open-source systems from academia that provide semantic
analysis functionalities, including but not limited to entity linking,
sentiment analysis, semantic parsing, and relation extraction. Its extensible
design enables researchers and developers to smoothly replace an existing model
or integrate a new one. To improve efficiency, we employ a microservice
architecture that facilitates allocation of acceleration hardware and
parallelization of computation. This paper presents the architecture of NLP
Workbench and discusses the challenges we faced in designing it. We also
discuss diverse use cases of NLP Workbench and the benefits of using it over
other approaches. The platform is under active development, with its source
code released under the MIT license. A website and a short video demonstrating
our platform are also available.
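The abstract above describes a microservice architecture in which each semantic analysis model runs as its own service, so that a model can be replaced and acceleration hardware can be allocated per model. Below is a minimal, hedged sketch of that pattern; the Analyzer interface, toy model, route, and port are assumptions for illustration only, not NLP Workbench's actual API.

"""Sketch (not NLP Workbench's code) of one model behind its own HTTP service."""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Protocol


class Analyzer(Protocol):
    """Interface a pluggable model wrapper is assumed to implement."""
    def analyze(self, text: str) -> dict: ...


class ToySentimentModel:
    """Stand-in for a real pre-trained model loaded on a GPU worker."""
    def analyze(self, text: str) -> dict:
        lowered = text.lower()
        score = sum(w in lowered for w in ("good", "great", "useful"))
        score -= sum(w in lowered for w in ("bad", "broken", "slow"))
        return {"task": "sentiment", "score": score}


MODEL: Analyzer = ToySentimentModel()  # swap in another model to extend the platform


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expects a JSON body like {"text": "..."} and returns the analysis.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(MODEL.analyze(payload.get("text", ""))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # One service per model; a gateway would route requests and parallelize them.
    HTTPServer(("0.0.0.0", 8001), Handler).serve_forever()

A gateway or any HTTP client would POST a JSON body such as {"text": "..."} to this service, with entity linking, semantic parsing, and relation extraction each running behind their own services; that separation is what makes per-model hardware allocation and parallel execution straightforward.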
Related papers
- CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models [59.91221728187576]
This paper introduces the CMU Linguistic Annotation Backend (CMULAB), an open-source framework that simplifies model deployment and continuous human-in-the-loop fine-tuning of NLP models.
CMULAB enables users to leverage the power of multilingual models to quickly adapt and extend existing tools for speech recognition, OCR, translation, and syntactic analysis to new languages.
arXiv Detail & Related papers (2024-04-03T02:21:46Z)
- Institutional Platform for Secure Self-Service Large Language Model Exploration [0.0]
The paper outlines the system's architecture and key features, encompassing dataset curation, model training, secure inference, and text-based feature extraction.
The platform strives to deliver secure LLM services, emphasizing process and data isolation, end-to-end encryption, and role-based resource authentication.
arXiv Detail & Related papers (2024-02-01T10:58:10Z)
- Mini-GPTs: Efficient Large Language Models through Contextual Pruning [0.0]
This paper introduces a novel approach to developing Mini-GPTs via contextual pruning.
We employ the technique across diverse and complex datasets, including US law, Medical Q&A, Skyrim dialogue, English-Taiwanese translation, and Economics articles.
arXiv Detail & Related papers (2023-12-20T00:48:13Z)
- SoTaNa: The Open-Source Software Development Assistant [81.86136560157266]
SoTaNa is an open-source software development assistant.
It generates high-quality instruction-based data for the domain of software engineering.
It employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA.
arXiv Detail & Related papers (2023-08-25T14:56:21Z)
- GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
We include a Jupyter Notebook-based walkthrough of the core interoperability features, available on GitHub.
We present GAIA Search, a search engine built following the previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z)
- A Data-Centric Framework for Composable NLP Workflows [109.51144493023533]
Empirical natural language processing systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components.
We establish a unified open-source framework to support fast development of such sophisticated NLP workflows in a composable manner.
arXiv Detail & Related papers (2021-03-02T16:19:44Z)
- Modular approach to data preprocessing in ALOHA and application to a smart industry use case [0.0]
The paper addresses a modular approach, integrated into the ALOHA tool flow, to support the data preprocessing and transformation pipeline.
To demonstrate the effectiveness of the approach, we present some experimental results related to a keyword spotting use case.
arXiv Detail & Related papers (2021-02-02T06:48:51Z)
- Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn.
We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology across regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z)
- N-LTP: An Open-source Neural Language Technology Platform for Chinese [68.58732970171747]
N-LTP is an open-source neural language technology platform supporting six fundamental Chinese NLP tasks.
N-LTP adopts a multi-task framework built on a shared pre-trained model, which has the advantage of capturing knowledge shared across the related Chinese tasks (a minimal sketch of this shared-encoder pattern follows this list).
arXiv Detail & Related papers (2020-09-24T11:45:39Z)
- CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes [2.750124853532831]
The CLEVR dataset has been used extensively for language-grounded visual reasoning in the Machine Learning (ML) and Natural Language Processing (NLP) domains.
We present a graph library for CLEVR that provides functionalities for extracting object-centric attributes and relationships and for constructing structural graph representations of the dual modalities.
We discuss downstream usage and applications of the library, and how it accelerates research for the NLP community.
arXiv Detail & Related papers (2020-09-19T03:32:37Z)
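The N-LTP entry above mentions a multi-task framework in which a single shared pre-trained model feeds several task-specific components. A rough PyTorch sketch of that shared-encoder pattern follows; the encoder type, task names, and label counts are placeholders invented for illustration, not N-LTP's actual architecture.

"""Illustrative shared encoder with per-task heads (not N-LTP's real code)."""
import torch
import torch.nn as nn


class SharedEncoderMultiTask(nn.Module):
    def __init__(self, vocab_size=30000, hidden=256, tasks=None):
        super().__init__()
        tasks = tasks or {"seg": 4, "pos": 30, "ner": 9}  # placeholder label counts
        # One encoder shared by every task captures the common knowledge.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        # A lightweight classification head per task.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, n) for name, n in tasks.items()}
        )

    def forward(self, token_ids: torch.Tensor, task: str) -> torch.Tensor:
        states, _ = self.encoder(self.embed(token_ids))
        return self.heads[task](states)  # per-token logits for the chosen task


if __name__ == "__main__":
    model = SharedEncoderMultiTask()
    batch = torch.randint(0, 30000, (2, 16))   # two sentences of 16 token ids
    print(model(batch, task="pos").shape)      # -> torch.Size([2, 16, 30])

Because every head reads the same encoder states, knowledge learned for one task is available to the others, which is the advantage the summary points to.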
This list is automatically generated from the titles and abstracts of the papers on this site.