GrASP: A Library for Extracting and Exploring Human-Interpretable
Textual Patterns
- URL: http://arxiv.org/abs/2104.03958v1
- Date: Thu, 8 Apr 2021 17:58:03 GMT
- Title: GrASP: A Library for Extracting and Exploring Human-Interpretable
Textual Patterns
- Authors: Piyawat Lertvittayakumjorn, Leshem Choshen, Eyal Shnarch, Francesca
Toni
- Abstract summary: We provide a Python library for GrASP, an algorithm for drawing patterns from textual data.
The library is equipped with a web-based interface empowering human users to conveniently explore the data and the extracted patterns.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data exploration is an important step of every data science and machine
learning project, including those involving textual data. We provide a Python
library for GrASP, an existing algorithm for drawing patterns from textual
data. The library is equipped with a web-based interface empowering human users
to conveniently explore the data and the extracted patterns. We also
demonstrate the use of the library in two settings (spam detection and argument
mining) and discuss future deployments of the library, e.g., beyond textual
data exploration.
Related papers
- News Signals: An NLP Library for Text and Time Series [3.850666668546735]
News Signals is an open-source library for building and using datasets where inputs are clusters of textual data.
It supports diverse data science and NLP problem settings related to the prediction of time series behaviour.
arXiv Detail & Related papers (2023-12-18T18:02:41Z)
- GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
We include a Jupyter Notebook-based walkthrough of the core interoperability features, available on GitHub.
We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z)
- DataFinder: Scientific Dataset Recommendation from Natural Language Descriptions [100.52917027038369]
We operationalize the task of recommending datasets given a short natural language description.
To facilitate this task, we build the DataFinder dataset, which consists of a larger automatically constructed training set and a smaller expert-annotated evaluation set.
This system, trained on the DataFinder dataset, finds more relevant search results than existing third-party dataset search engines.
arXiv Detail & Related papers (2023-05-26T05:22:36Z)
- Reception Reader: Exploring Text Reuse in Early Modern British Publications [0.0]
The Reception Reader is a web tool for studying text reuse in the Early English Books Online (EEBO-TCP) and Eighteenth Century Collections Online (ECCO) data.
We show examples of how the tool streamlines research and exploration tasks, and discuss the utility and limitations of the user interface along with its current data sources.
arXiv Detail & Related papers (2023-02-08T14:37:35Z)
- TextBox 2.0: A Text Generation Library with Pre-trained Language Models [72.49946755856935]
This paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs).
To be comprehensive, our library covers 13 common text generation tasks and their corresponding 83 datasets.
We also implement 4 efficient training strategies and provide 4 generation objectives for pre-training new PLMs from scratch.
arXiv Detail & Related papers (2022-12-26T03:50:36Z)
- DeepShovel: An Online Collaborative Platform for Data Extraction in Geoscience Literature with AI Assistance [48.55345030503826]
Geoscientists need to read a huge amount of literature to locate, extract, and aggregate relevant results and data.
DeepShovel is a publicly available AI-assisted data extraction system to support their needs.
A follow-up user evaluation with 14 researchers suggested DeepShovel improved users' data extraction efficiency when building scientific databases.
arXiv Detail & Related papers (2022-02-21T12:18:08Z)
- Datasets: A Community Library for Natural Language Processing [55.48866401721244]
datasets is a community library for contemporary NLP.
The library includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects.
arXiv Detail & Related papers (2021-09-07T03:59:22Z)
- pyBKT: An Accessible Python Library of Bayesian Knowledge Tracing Models [0.0]
We introduce pyBKT, a library of model extensions for knowledge tracing.
The library provides data generation, fitting, prediction, and cross-validation routines.
pyBKT is open source and openly licensed, with the aim of making knowledge tracing more accessible to communities of research and practice.
arXiv Detail & Related papers (2021-05-02T03:08:53Z)
- REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to support the modeling of relationships among objects and grasps.
Our dataset is collected in the form of both 2D images and 3D point clouds.
Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
- giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration [4.8353738137338755]
giotto-tda is a Python library that integrates high-performance topological data analysis with machine learning.
The library's ability to handle various types of data is rooted in a wide range of preprocessing techniques.
arXiv Detail & Related papers (2020-04-06T10:53:57Z)