Automatic Construction of Enterprise Knowledge Base
- URL: http://arxiv.org/abs/2106.15085v1
- Date: Tue, 29 Jun 2021 04:29:02 GMT
- Title: Automatic Construction of Enterprise Knowledge Base
- Authors: Junyi Chai, Yujie He, Homa Hashemi, Bing Li, Daraksha Parveen,
Ranganath Kondapally, Wenjin Xu
- Abstract summary: We present a system that automatically constructs a knowledge base from large-scale enterprise documents with minimal human intervention.
This system is currently serving as part of a Microsoft 365 service.
- Score: 6.6421796160706945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a system that automatically constructs a
knowledge base from large-scale enterprise documents with minimal human
intervention. In the design and deployment of such a knowledge mining system
for enterprise, we faced several challenges including data distributional
shift, performance evaluation, compliance requirements and other practical
issues. We leveraged state-of-the-art deep learning models to extract
information (named entities and definitions) at the per-document level, then
further applied classical machine learning techniques to process global
statistical information to improve the knowledge base. Experimental results are
reported on actual enterprise documents. This system is currently serving as
part of a Microsoft 365 service.
Related papers
- Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review [51.61531917413708]
Deep learning-based approaches for Key Information Extraction have been proposed under the umbrella term Document Understanding.
The goal of this systematic literature review is an in-depth analysis of existing approaches in this domain and the identification of opportunities for further research.
(2024-07-23T08:15:55Z)
- Knowledge Sharing in Manufacturing using Large Language Models: User Evaluation and Model Benchmarking [7.976952274443561]
The paper introduces a Large Language Model (LLM)-based system designed to retrieve information from factory documentation and knowledge shared by expert operators.
The system aims to efficiently answer queries from operators and facilitate the sharing of new knowledge.
(2024-01-10T14:53:18Z)
- Data Efficient Training of a U-Net Based Architecture for Structured Documents Localization [0.0]
We propose SDL-Net: a novel U-Net like encoder-decoder architecture for the localization of structured documents.
Our approach allows pre-training the encoder of SDL-Net on a generic dataset containing samples of various document classes.
(2023-10-02T07:05:19Z)
- Model-Driven Engineering Method to Support the Formalization of Machine Learning using SysML [0.0]
This work introduces a method supporting the collaborative definition of machine learning tasks by leveraging model-based engineering.
The method supports the identification and integration of various data sources, the required definition of semantic connections between data attributes, and the definition of data processing steps.
(2023-07-10T11:33:46Z)
- Benchmarking Automated Machine Learning Methods for Price Forecasting Applications [58.720142291102135]
We show the possibility of substituting manually created ML pipelines with automated machine learning (AutoML) solutions.
Based on the CRISP-DM process, we split the manual ML pipeline into a machine learning and non-machine learning part.
We show in a case study for the industrial use case of price forecasting that domain knowledge combined with AutoML can weaken the dependence on ML experts.
(2023-04-28T10:27:38Z)
- Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases.
REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization.
The REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
(2022-05-02T21:42:45Z)
- Embedding Knowledge for Document Summarization: A Survey [66.76415502727802]
Previous works proved that knowledge-embedded document summarizers excel at generating superior digests.
We propose novel taxonomies to recapitulate knowledge and knowledge embeddings under the document summarization view.
(2022-04-24T04:36:07Z)
- Deep Learning for Technical Document Classification [6.787004826008753]
This paper describes a novel multimodal deep learning architecture, called TechDoc, for technical document classification.
The trained model can potentially be scaled to millions of real-world technical documents with both text and figures.
(2021-06-27T16:12:47Z)
- A Survey of Deep Learning Approaches for OCR and Document Understanding [68.65995739708525]
We review different techniques for document understanding for documents written in English.
We consolidate methodologies present in literature to act as a jumping-off point for researchers exploring this area.
(2020-11-27T03:05:59Z)
- Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology [53.063411515511056]
We propose a process model for the development of machine learning applications.
The first phase combines business and data understanding as data availability oftentimes affects the feasibility of the project.
The sixth phase covers state-of-the-art approaches for monitoring and maintenance of machine learning applications.
(2020-03-11T08:25:49Z)
- Documentation of Machine Learning Software [7.154621689269006]
Machine learning software documentation differs from most of the documentation studied in software engineering research.
Our ultimate goal is automated generation and adaptation of machine learning software documents for users with different levels of expertise.
We will investigate Stack Overflow Q/As and classify the documentation-related Q/As within the machine learning domain.
(2020-01-30T00:01:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.