MaintNet: A Collaborative Open-Source Library for Predictive Maintenance Language Resources
- URL: http://arxiv.org/abs/2005.12443v1
- Date: Mon, 25 May 2020 23:44:19 GMT
- Title: MaintNet: A Collaborative Open-Source Library for Predictive Maintenance Language Resources
- Authors: Farhad Akhbardeh, Travis Desell, Marcos Zampieri
- Abstract summary: MaintNet is a collaborative open-source library of technical and domain-specific language datasets.
MaintNet provides novel logbook data from the aviation, automotive, and facilities domains.
- Score: 13.976220447055521
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Maintenance record logbooks are an emerging text type in NLP. They typically
consist of free text documents with many domain-specific technical terms,
abbreviations, as well as non-standard spelling and grammar, which pose
difficulties to NLP pipelines trained on standard corpora. Analyzing and
annotating such documents is of particular importance in the development of
predictive maintenance systems, which aim to provide operational efficiencies,
prevent accidents and save lives. In order to facilitate and encourage research
in this area, we have developed MaintNet, a collaborative open-source library
of technical and domain-specific language datasets. MaintNet provides novel
logbook data from the aviation, automotive, and facilities domains along with
tools to aid in their (pre-)processing and clustering. Furthermore, it provides
a way to encourage discussion on and sharing of new datasets and tools for
logbook data analysis.
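
As a rough illustration of the kind of (pre-)processing and clustering workflow the abstract refers to, the sketch below normalizes a few invented logbook entries and clusters them with scikit-learn. It is not MaintNet's actual API; the abbreviation table and sample records are assumptions made for the example.

```python
# Minimal sketch of a logbook preprocessing + clustering workflow of the kind
# the abstract describes. This is NOT MaintNet's API; the abbreviation table
# and the sample records below are illustrative assumptions.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical maintenance log entries with abbreviations and non-standard spelling.
records = [
    "REPLCD HYD PUMP, LEAK AT FITTING",
    "hyd pump leaking - replaced fitting",
    "ENG OIL PRESS LOW, CHKD SENSOR",
    "checked engine oil pressure sensor, ok",
]

# Tiny abbreviation/normalization table (assumed for illustration only).
abbrev = {"replcd": "replaced", "hyd": "hydraulic", "eng": "engine",
          "press": "pressure", "chkd": "checked"}

def preprocess(text: str) -> str:
    """Lowercase, strip punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(abbrev.get(tok, tok) for tok in tokens)

cleaned = [preprocess(r) for r in records]
X = TfidfVectorizer().fit_transform(cleaned)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for rec, lab in zip(records, labels):
    print(lab, rec)
```

In practice, the datasets, dictionaries, and tools released with MaintNet would replace the toy normalization table used here.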
Related papers
- Empowering Domain-Specific Language Models with Graph-Oriented Databases: A Paradigm Shift in Performance and Model Maintenance [0.0]
Our work is driven by the need to manage and process large volumes of short text documents inherent in specific application domains.
By leveraging domain-specific knowledge and expertise, our approach aims to shape factual data within these domains.
Our work underscores the transformative potential of the partnership of domain-specific language models and graph-oriented databases.
arXiv Detail & Related papers (2024-10-04T19:02:09Z)
- Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian [75.94354349994576]
This paper explores the feasibility of employing smaller, domain-specific encoder LMs alongside prompting techniques to enhance performance in specialized contexts.
Our study concentrates on the Italian bureaucratic and legal language, experimenting with both general-purpose and further pre-trained encoder-only models.
The results indicate that while further pre-trained models may show diminished robustness in general knowledge, they exhibit superior adaptability for domain-specific tasks, even in a zero-shot setting.
arXiv Detail & Related papers (2024-07-30T08:50:16Z)
- Data Efficient Training of a U-Net Based Architecture for Structured Documents Localization [0.0]
We propose SDL-Net: a novel U-Net-like encoder-decoder architecture for the localization of structured documents.
Our approach allows pre-training the encoder of SDL-Net on a generic dataset containing samples of various document classes.
arXiv Detail & Related papers (2023-10-02T07:05:19Z)
- GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
We include a Jupyter Notebook-based walk through the core interoperability features, available on GitHub.
We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z)
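
For context, the snippet below shows the basic Pyserini search pattern that this kind of integration builds on. It is a hedged sketch, not GAIA Search itself; the prebuilt index name and module path follow Pyserini's documented quick-start and may differ across library versions.

```python
# Hedged sketch of plain Pyserini retrieval (not the GAIA Search code itself).
# The prebuilt index name is an example; availability depends on the Pyserini version.
from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher.from_prebuilt_index("msmarco-v1-passage")
hits = searcher.search("predictive maintenance logbooks", k=5)
for hit in hits:
    print(hit.docid, round(hit.score, 3))
```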
- Transforming Unstructured Text into Data with Context Rule Assisted Machine Learning (CRAML) [0.0]
The Context Rule Assisted Machine Learning (CRAML) method allows accurate and reproducible labeling of massive volumes of unstructured text.
CRAML enables domain experts to access uncommon constructs buried within a document corpus.
We present three use cases for CRAML: we analyze recent management literature that draws from text data, describe and release new machine learning models from an analysis of proprietary job advertisement text, and present findings of social and economic interest from a public corpus of franchise documents.
arXiv Detail & Related papers (2023-01-20T13:12:35Z)
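
As a loose illustration of rule-assisted labeling in the spirit of CRAML (not the authors' implementation), the sketch below applies hypothetical keyword-plus-context rules to label text snippets; such weak labels could then seed a downstream classifier.

```python
# Rough illustration of context-rule labeling (not the CRAML implementation):
# expert-written keyword and context patterns jointly assign labels to snippets.
import re

# Hypothetical rules: (label, keyword pattern, required context pattern).
rules = [
    ("non_compete", r"non-?compete", r"agreement|clause"),
    ("franchise_fee", r"fee", r"franchis\w+"),
]

def label_snippet(snippet: str) -> list[str]:
    """Return all labels whose keyword and context patterns both match."""
    text = snippet.lower()
    return [label for label, kw, ctx in rules
            if re.search(kw, text) and re.search(ctx, text)]

print(label_snippet("The franchise agreement includes a non-compete clause."))
# -> ['non_compete']
```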
- LAVIS: A Library for Language-Vision Intelligence [98.88477610704938]
LAVIS is an open-source library for LAnguage-VISion research and applications.
It features a unified interface to easily access state-of-the-art image-language, video-language models and common datasets.
arXiv Detail & Related papers (2022-09-15T18:04:10Z)
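
The snippet below follows the load_model_and_preprocess pattern from LAVIS's documented quick-start to caption a local image; the model name, checkpoint type, and image path are example values, and the exact interface may vary across library versions.

```python
# Hedged sketch of LAVIS's unified interface (example model and paths only).
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
raw_image = Image.open("example.jpg").convert("RGB")  # placeholder image path

# Load a captioning model and its matching preprocessors through one call.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device)

image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
print(model.generate({"image": image}))
```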
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- A Flexible Clustering Pipeline for Mining Text Intentions [6.599344783327053]
We create a flexible and scalable clustering pipeline within the Verint Intent Manager.
It integrates the fine-tuning of language models, a high-performing k-NN library, and community detection techniques.
As deployed in the VIM application, this clustering pipeline produces high quality results.
arXiv Detail & Related papers (2022-02-01T22:54:18Z)
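
A minimal sketch of the k-NN-graph plus community-detection step such a pipeline relies on is shown below; TF-IDF vectors stand in for fine-tuned language-model embeddings, and the utterances and parameters are invented for illustration.

```python
# Illustrative k-NN + community-detection clustering step (not the VIM pipeline).
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import kneighbors_graph

utterances = [
    "I want to reset my password",
    "forgot my password, please help",
    "cancel my subscription",
    "how do I close my account",
]

# Stand-in embeddings: TF-IDF vectors instead of fine-tuned LM embeddings.
X = TfidfVectorizer().fit_transform(utterances)

# Build a k-nearest-neighbour graph and treat its communities as intent clusters.
adj = kneighbors_graph(X, n_neighbors=2, mode="connectivity", include_self=False)
graph = nx.from_scipy_sparse_array(adj)
clusters = nx.algorithms.community.greedy_modularity_communities(graph)
for i, members in enumerate(clusters):
    print(f"cluster {i}: {[utterances[j] for j in sorted(members)]}")
```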
- Case Studies on using Natural Language Processing Techniques in Customer Relationship Management Software [0.0]
We trained word embeddings on the corresponding text corpus and showed that they can be used not only directly for data mining but also in RNN architectures.
The results prove that structured text data in a CRM can be used to mine out very valuable information.
arXiv Detail & Related papers (2021-06-09T16:07:07Z)
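
As an illustrative sketch (not the paper's setup), the snippet below trains word embeddings on a tiny, hypothetical CRM corpus with gensim's Word2Vec and queries nearest neighbours, the kind of direct data-mining use the summary mentions.

```python
# Illustrative Word2Vec training on a hypothetical, tokenized CRM corpus.
from gensim.models import Word2Vec

corpus = [
    ["customer", "reported", "billing", "issue", "refund", "requested"],
    ["refund", "processed", "for", "billing", "error"],
    ["ticket", "escalated", "to", "support", "team"],
    ["support", "team", "resolved", "the", "ticket"],
]

# Small vector size and many epochs because the toy corpus is tiny.
model = Word2Vec(sentences=corpus, vector_size=50, window=3,
                 min_count=1, epochs=50, seed=1)
print(model.wv.most_similar("billing", topn=3))
```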
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- A Data-Centric Framework for Composable NLP Workflows [109.51144493023533]
Empirical natural language processing systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components.
We establish a unified open-source framework to support fast development of such sophisticated NLP systems in a composable manner.
arXiv Detail & Related papers (2021-03-02T16:19:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.