Extracting Semantics from Maintenance Records
- URL: http://arxiv.org/abs/2108.05454v1
- Date: Wed, 11 Aug 2021 21:23:10 GMT
- Title: Extracting Semantics from Maintenance Records
- Authors: Sharad Dixit, Varish Mulwad, Abhinav Saxena
- Abstract summary: We develop three approaches to named entity recognition in maintenance records.
We develop a syntactic and semantic rules-based approach and an approach leveraging a pre-trained language model.
Our evaluations on a real-world aviation maintenance records dataset show promising results.
- Score: 0.2578242050187029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rapid progress in natural language processing has led to its utilization in a
variety of industrial and enterprise settings, including in its use for
information extraction, specifically named entity recognition and relation
extraction, from documents such as engineering manuals and field maintenance
reports. While named entity recognition is a well-studied problem, existing
state-of-the-art approaches require large labelled datasets which are hard to
acquire for sensitive data such as maintenance records. Further, industrial
domain experts tend to distrust results from black box machine learning models,
especially when the extracted information is used in downstream predictive
maintenance analytics. We overcome these challenges by developing three
approaches built on the foundation of domain expert knowledge captured in
dictionaries and ontologies. We develop a syntactic and semantic rules-based
approach and an approach leveraging a pre-trained language model, fine-tuned
for a question-answering task on top of our base dictionary lookup to extract
entities of interest from maintenance records. We also develop a preliminary
ontology to represent and capture the semantics of maintenance records. Our
evaluations on a real-world aviation maintenance records dataset show promising
results and help identify challenges specific to named entity recognition in
the context of noisy industrial data.
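The abstract describes a base dictionary lookup on which the other approaches build. The following is a minimal sketch of what such a lookup could look like, not the authors' implementation: the dictionary terms, the entity labels (`COMPONENT`, `PROBLEM`, `ACTION`), the sample record, and the function name `extract_entities` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical domain dictionary: surface form -> entity type.
# Terms and labels are illustrative assumptions, not taken from the paper.
DICTIONARY = {
    "fuel pump": "COMPONENT",
    "hydraulic line": "COMPONENT",
    "leak": "PROBLEM",
    "replaced": "ACTION",
}


def extract_entities(record, dictionary=DICTIONARY):
    """Return sorted (start, end, text, label) tuples for dictionary terms."""
    matches = []
    # Try longer terms first so multi-word entries win over shorter ones.
    for term in sorted(dictionary, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, record, flags=re.IGNORECASE):
            # Skip any span that overlaps an already accepted match.
            if any(m.start() < e and s < m.end() for s, e, _, _ in matches):
                continue
            matches.append((m.start(), m.end(), m.group(0), dictionary[term]))
    return sorted(matches)


# Made-up maintenance record in the terse, mixed-case style of field notes.
record = "REPLACED fuel pump due to leak at hydraulic line."
entities = extract_entities(record)
for start, end, text, label in entities:
    print(f"{label:10s} {text!r} [{start}:{end}]")
```

Matching is case-insensitive and bounded by word boundaries, which handles simple casing noise but not the misspellings and abbreviations common in real records; handling those would need the richer rules or model-based approaches the abstract mentions.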
Related papers
- Benchmarking pre-trained text embedding models in aligning built asset information [0.0]
This study presents a comparative benchmark of state-of-the-art text embedding models to evaluate their effectiveness in aligning built asset information with domain-specific technical concepts.
The results of our benchmarking across six proposed datasets, covering three tasks of clustering, retrieval, and reranking, highlight the need for future research on domain adaptation techniques.
arXiv Detail & Related papers (2024-11-18T20:54:17Z)
- Empowering Domain-Specific Language Models with Graph-Oriented Databases: A Paradigm Shift in Performance and Model Maintenance [0.0]
Our work is driven by the need to manage and process large volumes of short text documents inherent in specific application domains.
By leveraging domain-specific knowledge and expertise, our approach aims to shape factual data within these domains.
Our work underscores the transformative potential of the partnership of domain-specific language models and graph-oriented databases.
arXiv Detail & Related papers (2024-10-04T19:02:09Z)
- Computational Job Market Analysis with Natural Language Processing [5.117211717291377]
This thesis investigates Natural Language Processing (NLP) technology for extracting relevant information from job descriptions.
We frame the problem, obtain annotated data, and introduce extraction methodologies.
Our contributions include job description datasets, a de-identification dataset, and a novel active learning algorithm for efficient model training.
arXiv Detail & Related papers (2024-04-29T14:52:38Z)
- Capture the Flag: Uncovering Data Insights with Large Language Models [90.47038584812925]
This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data.
We propose a new evaluation methodology based on a "capture the flag" principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset.
arXiv Detail & Related papers (2023-12-21T14:20:06Z)
- Informed Named Entity Recognition Decoding for Generative Language Models [3.5323691899538128]
We propose Informed Named Entity Recognition Decoding (iNERD), which treats named entity recognition as a generative process.
We coarse-tune our model on a merged named entity corpus to strengthen its performance, evaluate five generative language models on eight named entity recognition datasets, and achieve remarkable results.
arXiv Detail & Related papers (2023-08-15T14:16:29Z)
- Gradient Imitation Reinforcement Learning for General Low-Resource Information Extraction [80.64518530825801]
We develop a Gradient Reinforcement Learning (GIRL) method to encourage pseudo-labeled data to imitate the gradient descent direction on labeled data.
We also leverage GIRL to solve all IE sub-tasks (named entity recognition, relation extraction, and event extraction) in low-resource settings.
arXiv Detail & Related papers (2022-11-11T05:37:19Z)
- Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample.
arXiv Detail & Related papers (2022-10-19T16:40:28Z)
- Deep Learning based pipeline for anomaly detection and quality enhancement in industrial binder jetting processes [68.8204255655161]
Anomaly detection describes methods of finding abnormal states, instances or data points that differ from a normal value space.
This paper contributes to a data-centric way of approaching artificial intelligence in industrial production.
arXiv Detail & Related papers (2022-09-21T08:14:34Z)
- Knowledge Graph Anchored Information-Extraction for Domain-Specific Insights [1.6308268213252761]
We use a task-based approach for fulfilling specific information needs within a new domain.
A pipeline constructed of state of the art NLP technologies is used to automatically extract an instance level semantic structure.
arXiv Detail & Related papers (2021-04-18T19:28:10Z)
- Streaming Self-Training via Domain-Agnostic Unlabeled Images [62.57647373581592]
We present streaming self-training (SST) that aims to democratize the process of learning visual recognition models.
Key to SST are two crucial observations: (1) domain-agnostic unlabeled images enable us to learn better models with a few labeled examples without any additional knowledge or supervision; and (2) learning is a continuous process and can be done by constructing a schedule of learning updates.
arXiv Detail & Related papers (2021-04-07T17:58:39Z)
- Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports [66.39150945184683]
We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches.
Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.
arXiv Detail & Related papers (2020-10-27T19:48:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.