Natural language processing for word sense disambiguation and
information extraction
- URL: http://arxiv.org/abs/2004.02256v1
- Date: Sun, 5 Apr 2020 17:13:43 GMT
- Title: Natural language processing for word sense disambiguation and
information extraction
- Authors: K. R. Chowdhary
- Abstract summary: The thesis presents a new approach for Word Sense Disambiguation using thesaurus.
A Document Retrieval method, based on Fuzzy Logic has been described and its application is illustrated.
The strategy concludes with the presentation of a novel strategy based on Dempster-Shafer theory of evidential reasoning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research work deals with Natural Language Processing (NLP) and
extraction of essential information in an explicit form. The most common among
the information management strategies is Document Retrieval (DR) and
Information Filtering. DR systems may work as combine harvesters, which bring
back useful material from the vast fields of raw material. With large amount of
potentially useful information in hand, an Information Extraction (IE) system
can then transform the raw material by refining and reducing it to a germ of
original text. A Document Retrieval system collects the relevant documents
carrying the required information, from the repository of texts. An IE system
then transforms them into information that is more readily digested and
analyzed. It isolates relevant text fragments, extracts relevant information
from the fragments, and then arranges together the targeted information in a
coherent framework. The thesis presents a new approach for Word Sense
Disambiguation using thesaurus. The illustrative examples supports the
effectiveness of this approach for speedy and effective disambiguation. A
Document Retrieval method, based on Fuzzy Logic has been described and its
application is illustrated. A question-answering system describes the operation
of information extraction from the retrieved text documents. The process of
information extraction for answering a query is considerably simplified by
using a Structured Description Language (SDL) which is based on cardinals of
queries in the form of who, what, when, where and why. The thesis concludes
with the presentation of a novel strategy based on Dempster-Shafer theory of
evidential reasoning, for document retrieval and information extraction. This
strategy permits relaxation of many limitations, which are inherent in Bayesian
probabilistic approach.
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z) - Enhanced document retrieval with topic embeddings [0.0]
Document retrieval systems have experienced a revitalized interest with the advent of retrieval-augmented generation (RAG)
RAG architecture offers a lower hallucination rate than LLM-only applications.
We have devised a new vectorization method that takes into account the topic information of the document.
arXiv Detail & Related papers (2024-08-19T22:01:45Z) - Beyond Relevant Documents: A Knowledge-Intensive Approach for Query-Focused Summarization using Large Language Models [27.90653125902507]
We propose a knowledge-intensive approach that reframes query-focused summarization as a knowledge-intensive task setup.
The retrieval module efficiently retrieves potentially relevant documents from a large-scale knowledge corpus.
The summarization controller seamlessly integrates a powerful large language model (LLM)-based summarizer with a carefully tailored prompt.
arXiv Detail & Related papers (2024-08-19T18:54:20Z) - Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
Often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z) - DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain
Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z) - Coarse-to-Fine Knowledge Selection for Document Grounded Dialogs [11.63334863772068]
Multi-document grounded dialogue systems (DGDS) answer users' requests by finding supporting knowledge from a collection of documents.
This paper proposes Re3G, which aims to optimize both coarse-grained knowledge retrieval and fine-grained knowledge extraction in a unified framework.
arXiv Detail & Related papers (2023-02-23T08:28:29Z) - MORTY: Structured Summarization for Targeted Information Extraction from
Scholarly Articles [0.0]
We present MORTY, an information extraction technique that creates structured summaries of text from scholarly articles.
Our approach condenses the article's full-text to property-value pairs as a segmented text snippet called structured summary.
We also present a sizable scholarly dataset combining structured summaries retrieved from a scholarly knowledge graph and corresponding publicly available scientific articles.
arXiv Detail & Related papers (2022-12-11T06:49:29Z) - TRIE++: Towards End-to-End Information Extraction from Visually Rich
Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained in an end-to-end trainable manner, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z) - Layout-Aware Information Extraction for Document-Grounded Dialogue:
Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents.
LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents.
Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z) - TRIE: End-to-End Text Reading and Information Extraction for Document
Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network.
multimodal visual and textual features of text reading are fused for information extraction.
Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
arXiv Detail & Related papers (2020-05-27T01:47:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.