MaintNet: A Collaborative Open-Source Library for Predictive Maintenance Language Resources
- URL: http://arxiv.org/abs/2005.12443v1
- Date: Mon, 25 May 2020 23:44:19 GMT
- Title: MaintNet: A Collaborative Open-Source Library for Predictive Maintenance Language Resources
- Authors: Farhad Akhbardeh, Travis Desell, Marcos Zampieri
- Abstract summary: MaintNet is a collaborative open-source library of technical and domain-specific language datasets.
MaintNet provides novel logbook data from the aviation, automotive, and facilities domains.
- Score: 13.976220447055521
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Maintenance record logbooks are an emerging text type in NLP. They
typically consist of free-text documents with many domain-specific technical
terms, abbreviations, and non-standard spelling and grammar, all of which pose
difficulties for NLP pipelines trained on standard corpora. Analyzing and
annotating such documents is of particular importance in the development of
predictive maintenance systems, which aim to provide operational efficiencies,
prevent accidents and save lives. In order to facilitate and encourage research
in this area, we have developed MaintNet, a collaborative open-source library
of technical and domain-specific language datasets. MaintNet provides novel
logbook data from the aviation, automotive, and facilities domains along with
tools to aid in their (pre-)processing and clustering. Furthermore, it provides
a way to encourage discussion on and sharing of new datasets and tools for
logbook data analysis.
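Logbook entries of this kind usually need aggressive normalization before standard NLP tooling can handle them. The sketch below illustrates that preprocessing step only; the abbreviation table and function name are hypothetical examples, not MaintNet's actual API or data.

```python
import re

# Hypothetical abbreviation table; a real logbook pipeline would need a
# domain-curated dictionary (MaintNet ships its own language resources).
ABBREVIATIONS = {
    "eng": "engine",
    "hyd": "hydraulic",
    "insp": "inspected",
    "repl": "replaced",
}

def normalize_logbook_entry(text: str) -> str:
    """Lowercase, strip non-alphanumeric noise, and expand known abbreviations."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation/symbols with spaces
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)

print(normalize_logbook_entry("HYD leak @ ENG #2 - REPL seal, INSP ok"))
# hydraulic leak engine 2 replaced seal inspected ok
```

In practice the abbreviation table is the hard part: the same shorthand can expand differently across the aviation, automotive, and facilities domains, which is one reason domain-specific resources matter.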
Related papers
- Data Efficient Training of a U-Net Based Architecture for Structured Documents Localization [0.0]
We propose SDL-Net: a novel U-Net like encoder-decoder architecture for the localization of structured documents.
Our approach allows pre-training the encoder of SDL-Net on a generic dataset containing samples of various document classes.
arXiv Detail & Related papers (2023-10-02T07:05:19Z)
- GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
We include a Jupyter Notebook-based walk through the core interoperability features, available on GitHub.
We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z)
- What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files [4.387757291346397]
We propose that the natural language names designers use in Computer Aided Design (CAD) software are a valuable source of such knowledge.
In particular we extract and clean a large corpus of natural language part, feature and document names.
We show that fine-tuning on the text data corpus further boosts the performance on all tasks, thus demonstrating the value of the text data.
arXiv Detail & Related papers (2023-04-25T12:30:01Z)
- Transforming Unstructured Text into Data with Context Rule Assisted Machine Learning (CRAML) [0.0]
The Context Rule Assisted Machine Learning (CRAML) method allows accurate and reproducible labeling of massive volumes of unstructured text.
CRAML enables domain experts to access uncommon constructs buried within a document corpus.
We present three use cases for CRAML: we analyze recent management literature that draws from text data, describe and release new machine learning models from an analysis of proprietary job advertisement text, and present findings of social and economic interest from a public corpus of franchise documents.
arXiv Detail & Related papers (2023-01-20T13:12:35Z)
- LAVIS: A Library for Language-Vision Intelligence [98.88477610704938]
LAVIS is an open-source library for LAnguage-VISion research and applications.
It features a unified interface to easily access state-of-the-art image-language, video-language models and common datasets.
arXiv Detail & Related papers (2022-09-15T18:04:10Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- A Flexible Clustering Pipeline for Mining Text Intentions [6.599344783327053]
We create a flexible and scalable clustering pipeline within the Verint Intent Manager.
It integrates the fine-tuning of language models, a high-performing k-NN library, and community detection techniques.
As deployed in the VIM application, this clustering pipeline produces high quality results.
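As a rough illustration of how such a pipeline composes, the sketch below builds a k-NN graph over document embeddings and uses connected components as a minimal stand-in for community detection. Everything here is a simplified assumption: the real pipeline uses fine-tuned language-model embeddings and a dedicated k-NN library, not toy 2-d vectors and stdlib code.

```python
from collections import defaultdict
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_graph(vectors, k=2):
    """Connect each vector to its k nearest neighbours by cosine similarity."""
    edges = defaultdict(set)
    for i, v in enumerate(vectors):
        sims = sorted(((cosine(v, w), j) for j, w in enumerate(vectors) if j != i),
                      reverse=True)
        for _, j in sims[:k]:
            edges[i].add(j)
            edges[j].add(i)  # keep the graph undirected
    return edges

def communities(edges, n):
    """Connected components of the k-NN graph, a crude community stand-in."""
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(edges[u] - comp)
        seen |= comp
        groups.append(sorted(comp))
    return groups

docs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]  # toy 2-d "embeddings"
print(communities(knn_graph(docs, k=1), len(docs)))  # [[0, 1], [2, 3]]
```

Swapping connected components for a proper community detection algorithm (e.g. Louvain) changes the quality of the clusters but not the overall shape of the pipeline: embed, build a neighbourhood graph, partition it.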
arXiv Detail & Related papers (2022-02-01T22:54:18Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
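A unified text-to-text format means every task is cast as a (source, target) string pair. The sketch below shows the general idea only; the field names and task labels are hypothetical, not SCROLLS' actual schema.

```python
def to_text_to_text(task: str, example: dict) -> tuple[str, str]:
    """Cast a task-specific example into a (source, target) text pair.

    Field names here are illustrative assumptions; SCROLLS' released
    datasets define their own schema.
    """
    if task == "summarization":
        return example["document"], example["summary"]
    if task == "qa":
        src = f"question: {example['question']} context: {example['context']}"
        return src, example["answer"]
    if task == "nli":
        src = f"premise: {example['premise']} hypothesis: {example['hypothesis']}"
        return src, example["label"]
    raise ValueError(f"unknown task: {task}")

pair = to_text_to_text("qa", {"question": "Who?", "context": "Ada wrote it.", "answer": "Ada"})
print(pair)  # ('question: Who? context: Ada wrote it.', 'Ada')
```

Once every task is a string-pair, one sequence-to-sequence model and one evaluation harness cover the whole suite, which is what makes a shared leaderboard practical.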
arXiv Detail & Related papers (2022-01-10T18:47:15Z)
- Case Studies on using Natural Language Processing Techniques in Customer Relationship Management Software [0.0]
We trained word embeddings on the corresponding text corpus and showed that they can be used not only directly for data mining but also in RNN architectures.
The results show that structured text data in a CRM can be mined for valuable information.
arXiv Detail & Related papers (2021-06-09T16:07:07Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from high-resource languages to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- A Data-Centric Framework for Composable NLP Workflows [109.51144493023533]
Empirical natural language processing systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components.
We establish a unified open-source framework to support fast, composable development of such sophisticated NLP systems.
arXiv Detail & Related papers (2021-03-02T16:19:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.