Natural language processing for word sense disambiguation and
information extraction
- URL: http://arxiv.org/abs/2004.02256v1
- Date: Sun, 5 Apr 2020 17:13:43 GMT
- Title: Natural language processing for word sense disambiguation and
information extraction
- Authors: K. R. Chowdhary
- Abstract summary: The thesis presents a new approach for Word Sense Disambiguation using a thesaurus.
A Document Retrieval method based on Fuzzy Logic is described and its application is illustrated.
The thesis concludes with the presentation of a novel strategy based on the Dempster-Shafer theory of evidential reasoning.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research work deals with Natural Language Processing (NLP) and
extraction of essential information in an explicit form. The most common among
the information management strategies are Document Retrieval (DR) and
Information Filtering. DR systems may work as combine harvesters, which bring
back useful material from the vast fields of raw material. With a large amount of
potentially useful information in hand, an Information Extraction (IE) system
can then transform the raw material by refining and reducing it to a germ of
the original text. A Document Retrieval system collects the relevant documents
carrying the required information, from the repository of texts. An IE system
then transforms them into information that is more readily digested and
analyzed. It isolates relevant text fragments, extracts relevant information
from the fragments, and then arranges the targeted information in a
coherent framework. The thesis presents a new approach for Word Sense
Disambiguation using a thesaurus. The illustrative examples support the
effectiveness of this approach for speedy and effective disambiguation. A
Document Retrieval method based on Fuzzy Logic is described and its
application is illustrated. A question-answering system describes the operation
of information extraction from the retrieved text documents. The process of
information extraction for answering a query is considerably simplified by
using a Structured Description Language (SDL) which is based on cardinals of
queries in the form of who, what, when, where and why. The thesis concludes
with the presentation of a novel strategy based on the Dempster-Shafer theory of
evidential reasoning, for document retrieval and information extraction. This
strategy permits relaxation of many limitations that are inherent in the Bayesian
probabilistic approach.
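The abstract names the Dempster-Shafer theory but does not show how evidence is combined. The following is a minimal sketch of Dempster's rule of combination in general, not the thesis's own retrieval strategy. The frame {relevant, irrelevant}, the two evidence sources, and every mass value are hypothetical, chosen only for illustration. Mass assigned to the whole frame (THETA) represents ignorance, which a single Bayesian probability distribution cannot express directly.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    A mass function maps frozensets (subsets of the frame of
    discernment) to masses that sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: the product supports the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Contradictory evidence: accumulate the conflict mass K.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalise by 1 - K so that the combined masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m, hypothesis):
    """Total mass of all focal sets contained in the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass of all focal sets that intersect the hypothesis."""
    return sum(w for s, w in m.items() if s & hypothesis)


# Hypothetical frame of discernment for one document.
REL = frozenset({"relevant"})
IRR = frozenset({"irrelevant"})
THETA = REL | IRR  # "don't know": mass left uncommitted

# Hypothetical evidence from two sources (values are made up).
m_keywords = {REL: 0.6, THETA: 0.4}
m_topic = {REL: 0.5, IRR: 0.2, THETA: 0.3}

m = combine(m_keywords, m_topic)
bel_rel = belief(m, REL)
pl_rel = plausibility(m, REL)
print(f"belief={bel_rel:.3f} plausibility={pl_rel:.3f}")
```

The interval between belief and plausibility bounds the support for "relevant"; its width reflects how much mass remains uncommitted after the two sources are combined.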
Related papers
- Dense X Retrieval: What Retrieval Granularity Should We Use? [59.359325855708974]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval. Propositions are defined as atomic expressions within text, each encapsulating a distinct factoid.
Our results reveal that proposition-based retrieval significantly outperforms traditional passage or sentence-based methods in dense retrieval.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain
Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Expository Text Generation: Imitate, Retrieve, Paraphrase [26.43857184008374]
We propose the task of expository text generation, which seeks to automatically generate an accurate and stylistically consistent text for a topic.
We develop IRP, a framework that overcomes the limitations of retrieval-augmented models and iteratively performs content planning, fact retrieval, and rephrasing.
We show that IRP produces factual and organized expository texts that accurately inform readers.
arXiv Detail & Related papers (2023-05-05T04:26:29Z)
- Coarse-to-Fine Knowledge Selection for Document Grounded Dialogs [11.63334863772068]
Multi-document grounded dialogue systems (DGDS) answer users' requests by finding supporting knowledge from a collection of documents.
This paper proposes Re3G, which aims to optimize both coarse-grained knowledge retrieval and fine-grained knowledge extraction in a unified framework.
arXiv Detail & Related papers (2023-02-23T08:28:29Z)
- MORTY: Structured Summarization for Targeted Information Extraction from
Scholarly Articles [0.0]
We present MORTY, an information extraction technique that creates structured summaries of text from scholarly articles.
Our approach condenses the article's full-text to property-value pairs as a segmented text snippet called structured summary.
We also present a sizable scholarly dataset combining structured summaries retrieved from a scholarly knowledge graph and corresponding publicly available scientific articles.
arXiv Detail & Related papers (2022-12-11T06:49:29Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich
Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained end-to-end, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
- Layout-Aware Information Extraction for Document-Grounded Dialogue:
Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents.
LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents.
Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z)
- TRIE: End-to-End Text Reading and Information Extraction for Document
Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network.
Multimodal visual and textual features of text reading are fused for information extraction.
Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
arXiv Detail & Related papers (2020-05-27T01:47:26Z)
- From Standard Summarization to New Tasks and Beyond: Summarization with
Manifold Information [77.89755281215079]
Text summarization is the research area aiming at creating a short and condensed version of the original document.
In real-world applications, most of the data is not in a plain text format.
This paper surveys these new summarization tasks and approaches in real-world applications.
arXiv Detail & Related papers (2020-05-10T14:59:36Z)