Related papers: GrASP: A Library for Extracting and Exploring Human-Interpretable Textual Patterns

URL: http://arxiv.org/abs/2104.03958v1
Date: Thu, 8 Apr 2021 17:58:03 GMT
Title: GrASP: A Library for Extracting and Exploring Human-Interpretable Textual Patterns
Authors: Piyawat Lertvittayakumjorn, Leshem Choshen, Eyal Shnarch, Francesca Toni
Abstract summary: We provide a Python library for GrASP, an algorithm for drawing patterns from textual data. The library is equipped with a web-based interface empowering human users to conveniently explore the data and the extracted patterns.
Score: 25.350957495556226
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Data exploration is an important step of every data science and machine learning project, including those involving textual data. We provide a Python library for GrASP, an existing algorithm for drawing patterns from textual data. The library is equipped with a web-based interface empowering human users to conveniently explore the data and the extracted patterns. We also demonstrate the use of the library in two settings (spam detection and argument mining) and discuss future deployments of the library, e.g., beyond textual data exploration.

Related papers

MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs [54.5729817345543]
MOLE is a framework that automatically extracts metadata attributes from scientific papers covering datasets of languages other than Arabic. Our methodology processes entire documents across multiple input formats and incorporates robust validation mechanisms for consistent output.
arXiv Detail & Related papers (2025-05-26T10:31:26Z)
Chatting with Papers: A Hybrid Approach Using LLMs and Knowledge Graphs [3.68389405018277]
This demo paper reports on a new workflow, GhostWriter, that combines the use of Large Language Models and Knowledge Graphs to support navigation through collections. Based on the tool-suite EverythingData at the backend, GhostWriter provides an interface that enables querying and chatting with a collection.
arXiv Detail & Related papers (2025-05-16T18:51:51Z)
News Signals: An NLP Library for Text and Time Series [3.850666668546735]
News Signals is an open-source library for building and using datasets where inputs are clusters of textual data. It supports diverse data science and NLP problem settings related to the prediction of time series behaviour.
arXiv Detail & Related papers (2023-12-18T18:02:41Z)
GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts. We include a Jupyter Notebook-based walk through the core interoperability features, available on GitHub. We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z)
DataFinder: Scientific Dataset Recommendation from Natural Language Descriptions [100.52917027038369]
We operationalize the task of recommending datasets given a short natural language description. To facilitate this task, we build the DataFinder dataset which consists of a larger automatically-constructed training set and a smaller expert-annotated evaluation set. This system, trained on the DataFinder dataset, finds more relevant search results than existing third-party dataset search engines.
arXiv Detail & Related papers (2023-05-26T05:22:36Z)
Reception Reader: Exploring Text Reuse in Early Modern British Publications [0.0]
The Reception Reader is a web tool for studying text reuse in the Early English Books Online (EEBO-TCP) and Eighteenth Century Collections Online (ECCO) data. We show examples of how the tool streamlines research and exploration tasks, and discuss the utility and limitations of the user interface along with its current data sources.
arXiv Detail & Related papers (2023-02-08T14:37:35Z)
TextBox 2.0: A Text Generation Library with Pre-trained Language Models [72.49946755856935]
This paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers 13 common text generation tasks and their corresponding 83 datasets. We also implement 4 efficient training strategies and provide 4 generation objectives for pre-training new PLMs from scratch.
arXiv Detail & Related papers (2022-12-26T03:50:36Z)
DeepShovel: An Online Collaborative Platform for Data Extraction in Geoscience Literature with AI Assistance [48.55345030503826]
Geoscientists need to read a huge amount of literature to locate, extract, and aggregate relevant results and data. DeepShovel is a publicly-available AI-assisted data extraction system to support their needs. A follow-up user evaluation with 14 researchers suggested DeepShovel improved users' efficiency of data extraction for building scientific databases.
arXiv Detail & Related papers (2022-02-21T12:18:08Z)
Datasets: A Community Library for Natural Language Processing [55.48866401721244]
datasets is a community library for contemporary NLP. The library includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects.
arXiv Detail & Related papers (2021-09-07T03:59:22Z)
pyBKT: An Accessible Python Library of Bayesian Knowledge Tracing Models [0.0]
We introduce pyBKT, a library of model extensions for knowledge tracing. The library provides data generation, fitting, prediction, and cross-validation routines. pyBKT is open source and open license for the purpose of making knowledge tracing more accessible to communities of research and practice.
arXiv Detail & Related papers (2021-05-02T03:08:53Z)
REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to sustain the modeling of relationships among objects and grasps. Our dataset is collected in the form of both 2D images and 3D point clouds. Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration [4.8353738137338755]
giotto-tda is a Python library that integrates high-performance topological data analysis with machine learning. The library's ability to handle various types of data is rooted in a wide range of preprocessing techniques.
arXiv Detail & Related papers (2020-04-06T10:53:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.