SciWING -- A Software Toolkit for Scientific Document Processing
- URL: http://arxiv.org/abs/2004.03807v2
- Date: Fri, 23 Oct 2020 07:27:01 GMT
- Title: SciWING -- A Software Toolkit for Scientific Document Processing
- Authors: Abhinav Ramesh Kashyap, Min-Yen Kan
- Abstract summary: SciWING provides access to pre-trained models for scientific document processing tasks.
It includes ready-to-use web and terminal-based applications and demonstrations.
- Score: 21.394568145639894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce SciWING, an open-source software toolkit which provides access
to pre-trained models for scientific document processing tasks, inclusive of
citation string parsing and logical structure recovery. SciWING enables
researchers to rapidly experiment with different models by swapping and
stacking different modules. It also enables them declare and run models from a
configuration file. It enables researchers to perform production-ready transfer
learning from general, pre-trained transformers (i.e., BERT, SciBERT etc), and
aids development of end-user applications. It includes ready-to-use web and
terminal-based applications and demonstrations (Available from
http://sciwing.io).
Related papers
- Collage: Decomposable Rapid Prototyping for Information Extraction on Scientific PDFs [15.610004991273005]
We present Collage, a tool designed for rapid prototyping, visualization, and evaluation of different information extraction models on scientific PDFs.
We enable both developers and users of NLP-based tools to inspect, debug, and better understand modeling pipelines by providing granular views of intermediate states of processing.
arXiv Detail & Related papers (2024-10-30T22:00:34Z) - Deep Fast Machine Learning Utils: A Python Library for Streamlined Machine Learning Prototyping [0.0]
The Deep Fast Machine Learning Utils (DFMLU) library provides tools designed to automate and enhance aspects of machine learning processes.
DFMLU offers functionalities that support model development and data handling.
This manuscript presents an overview of DFMLU's functionalities, providing Python examples for each tool.
arXiv Detail & Related papers (2024-09-14T21:39:17Z) - ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models [51.35570730554632]
ESPnet-SPK is a toolkit for training speaker embedding extractors.
We provide several models, ranging from x-vector to recent SKA-TDNN.
We also aspire to bridge developed models with other domains.
arXiv Detail & Related papers (2024-01-30T18:18:27Z) - CodeTF: One-stop Transformer Library for State-of-the-art Code LLM [72.1638273937025]
We present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.
Our library supports a collection of pretrained Code LLM models and popular code benchmarks.
We hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering.
arXiv Detail & Related papers (2023-05-31T05:24:48Z) - ConvLab-3: A Flexible Dialogue System Toolkit Based on a Unified Data
Format [88.33443450434521]
Task-oriented dialogue (TOD) systems function as digital assistants, guiding users through various tasks such as booking flights or finding restaurants.
Existing toolkits for building TOD systems often fall short of in delivering comprehensive arrays of data, models, and experimental environments.
We introduce ConvLab-3: a multifaceted dialogue system toolkit crafted to bridge this gap.
arXiv Detail & Related papers (2022-11-30T16:37:42Z) - SIERRA: A Modular Framework for Research Automation and Reproducibility [6.1678491628787455]
We present SIERRA, a novel framework for accelerating research development and improving results.
SIERRA accelerates research by automating the process of generating executable experiments from queries over independent variables.
It employs a modular architecture enabling easy customization and extension for the needs of individual researchers.
arXiv Detail & Related papers (2022-08-16T15:36:34Z) - Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval [60.457378374671656]
Tevatron is a dense retrieval toolkit optimized for efficiency, flexibility, and code simplicity.
We show how Tevatron's flexible design enables easy generalization across datasets, model architectures, and accelerator platforms.
arXiv Detail & Related papers (2022-03-11T05:47:45Z) - SIERRA: A Modular Framework for Research Automation [5.220940151628734]
We present SIERRA, a novel framework for accelerating research developments and improving results.
SIERRA makes it easy to quickly specify the independent variable(s) for an experiment, generate experimental inputs, automatically run the experiment, and process the results to generate deliverables such as graphs and videos.
It employs a deeply modular approach that allows easy customization and extension of automation for the needs of individual researchers.
arXiv Detail & Related papers (2022-03-03T23:45:46Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z) - EasyTransfer -- A Simple and Scalable Deep Transfer Learning Platform
for NLP Applications [65.87067607849757]
EasyTransfer is a platform to develop deep Transfer Learning algorithms for Natural Language Processing (NLP) applications.
EasyTransfer supports various NLP models in the ModelZoo, including mainstream PLMs and multi-modality models.
EasyTransfer is currently deployed at Alibaba to support a variety of business scenarios.
arXiv Detail & Related papers (2020-11-18T18:41:27Z) - Collective Knowledge: organizing research projects as a database of
reusable components and portable workflows with common APIs [0.2538209532048866]
This article provides the motivation and overview of the Collective Knowledge framework (CK or cKnowledge)
The CK concept is to decompose research projects into reusable components that encapsulate research artifacts.
The long-term goal is to accelerate innovation by connecting researchers and practitioners to share and reuse all their knowledge.
arXiv Detail & Related papers (2020-11-02T17:42:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.