Natural language processing for word sense disambiguation and
information extraction
- URL: http://arxiv.org/abs/2004.02256v1
- Date: Sun, 5 Apr 2020 17:13:43 GMT
- Title: Natural language processing for word sense disambiguation and
information extraction
- Authors: K. R. Chowdhary
- Abstract summary: The thesis presents a new approach for Word Sense Disambiguation using a thesaurus.
A Document Retrieval method based on Fuzzy Logic is described and its application is illustrated.
The thesis concludes with the presentation of a novel strategy based on Dempster-Shafer theory of evidential reasoning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research work deals with Natural Language Processing (NLP) and
extraction of essential information in an explicit form. The most common among
the information management strategies are Document Retrieval (DR) and
Information Filtering. DR systems may work as combine harvesters, which bring
back useful material from the vast fields of raw material. With a large amount of
potentially useful information in hand, an Information Extraction (IE) system
can then transform the raw material by refining and reducing it to a germ of
the original text. A Document Retrieval system collects the relevant documents
carrying the required information, from the repository of texts. An IE system
then transforms them into information that is more readily digested and
analyzed. It isolates relevant text fragments, extracts relevant information
from the fragments, and then arranges the targeted information in a
coherent framework. The thesis presents a new approach for Word Sense
Disambiguation using a thesaurus. The illustrative examples support the
effectiveness of this approach for speedy and effective disambiguation. A
Document Retrieval method based on Fuzzy Logic is described and its
application is illustrated. A question-answering system describes the operation
of information extraction from the retrieved text documents. The process of
information extraction for answering a query is considerably simplified by
using a Structured Description Language (SDL) which is based on cardinals of
queries in the form of who, what, when, where and why. The thesis concludes
with the presentation of a novel strategy based on Dempster-Shafer theory of
evidential reasoning, for document retrieval and information extraction. This
strategy permits relaxation of many limitations, which are inherent in the Bayesian
probabilistic approach.
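As an illustration of the evidential reasoning mentioned at the end of the abstract, the following is a minimal sketch of Dempster's rule of combination applied to document relevance. It is not the thesis's own method: the two evidence sources (`keyword_evidence`, `title_evidence`), their mass values, and the function name `combine` are assumptions chosen for the example. The point it shows is that, unlike a Bayesian prior, a mass function can assign weight to the whole set {relevant, irrelevant}, which represents ignorance explicitly.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses in each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Disjoint sets contradict each other; track the conflicting mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize so the remaining masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


REL = frozenset({"relevant"})
IRR = frozenset({"irrelevant"})
EITHER = REL | IRR  # mass placed here expresses "don't know"

# Hypothetical evidence about one document from two independent cues.
keyword_evidence = {REL: 0.6, EITHER: 0.4}
title_evidence = {REL: 0.5, IRR: 0.2, EITHER: 0.3}

belief = combine(keyword_evidence, title_evidence)
for hypothesis, mass in belief.items():
    print(sorted(hypothesis), round(mass, 4))
```

With these assumed inputs the conflicting mass is 0.12, so the combined masses are the raw products divided by 0.88; the mass left on `EITHER` is the uncertainty that neither source resolved.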
Related papers
- Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search [65.53881294642451]
Deliberate Thinking based Dense Retriever (DEBATER)
DEBATER enhances recent dense retrievers by enabling them to learn more effective document representations through a step-by-step thinking process.
Experimental results show that DEBATER significantly outperforms existing methods across several retrieval benchmarks.
arXiv Detail & Related papers (2025-02-18T15:56:34Z)
- Conversational Text Extraction with Large Language Models Using Retrieval-Augmented Systems [0.20971479389679337]
This study introduces a system leveraging Large Language Models (LLMs) to extract text from PDF documents via a conversational interface.
The system provides informative responses to user inquiries while highlighting relevant passages within the PDF.
The proposed system gives competitive ROUGE values as compared to existing state-of-the-art techniques for text extraction and summarization.
arXiv Detail & Related papers (2025-01-16T19:12:25Z)
- GeAR: Generation Augmented Retrieval [82.20696567697016]
Document retrieval techniques form the foundation for the development of large-scale information systems.
The prevailing methodology is to construct a bi-encoder and compute the semantic similarity.
We propose a new method called Generation Augmented Retrieval (GeAR) that incorporates well-designed fusion and decoding modules.
arXiv Detail & Related papers (2025-01-06T05:29:00Z)
- Enhanced document retrieval with topic embeddings [0.0]
Document retrieval systems have experienced a revitalized interest with the advent of retrieval-augmented generation (RAG).
RAG architecture offers a lower hallucination rate than LLM-only applications.
We have devised a new vectorization method that takes into account the topic information of the document.
arXiv Detail & Related papers (2024-08-19T22:01:45Z)
- Beyond Relevant Documents: A Knowledge-Intensive Approach for Query-Focused Summarization using Large Language Models [27.90653125902507]
We propose a knowledge-intensive approach that reframes query-focused summarization as a knowledge-intensive task setup.
The retrieval module efficiently retrieves potentially relevant documents from a large-scale knowledge corpus.
The summarization controller seamlessly integrates a powerful large language model (LLM)-based summarizer with a carefully tailored prompt.
arXiv Detail & Related papers (2024-08-19T18:54:20Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain
Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Coarse-to-Fine Knowledge Selection for Document Grounded Dialogs [11.63334863772068]
Multi-document grounded dialogue systems (DGDS) answer users' requests by finding supporting knowledge from a collection of documents.
This paper proposes Re3G, which aims to optimize both coarse-grained knowledge retrieval and fine-grained knowledge extraction in a unified framework.
arXiv Detail & Related papers (2023-02-23T08:28:29Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich
Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained in an end-to-end manner, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
- Layout-Aware Information Extraction for Document-Grounded Dialogue:
Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents.
LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents.
Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z)
- TRIE: End-to-End Text Reading and Information Extraction for Document
Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network.
Multimodal visual and textual features of text reading are fused for information extraction.
Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
arXiv Detail & Related papers (2020-05-27T01:47:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.