Related papers: Natural language processing for word sense disambiguation and information extraction

Natural language processing for word sense disambiguation and information extraction

URL: http://arxiv.org/abs/2004.02256v1
Date: Sun, 5 Apr 2020 17:13:43 GMT
Title: Natural language processing for word sense disambiguation and information extraction
Authors: K. R. Chowdhary
Abstract summary: The thesis presents a new approach for Word Sense Disambiguation using thesaurus. A Document Retrieval method, based on Fuzzy Logic has been described and its application is illustrated. The strategy concludes with the presentation of a novel strategy based on Dempster-Shafer theory of evidential reasoning.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This research work deals with Natural Language Processing (NLP) and extraction of essential information in an explicit form. The most common among the information management strategies is Document Retrieval (DR) and Information Filtering. DR systems may work as combine harvesters, which bring back useful material from the vast fields of raw material. With large amount of potentially useful information in hand, an Information Extraction (IE) system can then transform the raw material by refining and reducing it to a germ of original text. A Document Retrieval system collects the relevant documents carrying the required information, from the repository of texts. An IE system then transforms them into information that is more readily digested and analyzed. It isolates relevant text fragments, extracts relevant information from the fragments, and then arranges together the targeted information in a coherent framework. The thesis presents a new approach for Word Sense Disambiguation using thesaurus. The illustrative examples supports the effectiveness of this approach for speedy and effective disambiguation. A Document Retrieval method, based on Fuzzy Logic has been described and its application is illustrated. A question-answering system describes the operation of information extraction from the retrieved text documents. The process of information extraction for answering a query is considerably simplified by using a Structured Description Language (SDL) which is based on cardinals of queries in the form of who, what, when, where and why. The thesis concludes with the presentation of a novel strategy based on Dempster-Shafer theory of evidential reasoning, for document retrieval and information extraction. This strategy permits relaxation of many limitations, which are inherent in Bayesian probabilistic approach.

Related papers

DISRetrieval: Harnessing Discourse Structure for Long Document Retrieval [51.89673002051528]
DISRetrieval is a novel hierarchical retrieval framework that leverages linguistic discourse structure to enhance long document understanding.<n>Our studies confirm that discourse structure significantly enhances retrieval effectiveness across different document lengths and query types.
arXiv Detail & Related papers (2025-05-26T14:45:12Z)
Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search [65.53881294642451]
Deliberate Thinking based Dense Retriever (DEBATER) DEBATER enhances recent dense retrievers by enabling them to learn more effective document representations through a step-by-step thinking process. Experimental results show that DEBATER significantly outperforms existing methods across several retrieval benchmarks.
arXiv Detail & Related papers (2025-02-18T15:56:34Z)
Conversational Text Extraction with Large Language Models Using Retrieval-Augmented Systems [0.20971479389679337]
This study introduces a system leveraging Large Language Models (LLMs) to extract text from PDF documents via a conversational interface. The system provides informative responses to user inquiries while highlighting relevant passages within the PDF. The proposed system gives competitive ROUGE values as compared to existing state-of-the-art techniques for text extraction and summarization.
arXiv Detail & Related papers (2025-01-16T19:12:25Z)
GeAR: Generation Augmented Retrieval [82.20696567697016]
Document retrieval techniques form the foundation for the development of large-scale information systems. The prevailing methodology is to construct a bi-encoder and compute the semantic similarity. We propose a new method called $textbfGe$neration that incorporates well-designed fusion and decoding modules.
arXiv Detail & Related papers (2025-01-06T05:29:00Z)
Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities. Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
Enhanced document retrieval with topic embeddings [0.0]
Document retrieval systems have experienced a revitalized interest with the advent of retrieval-augmented generation (RAG) RAG architecture offers a lower hallucination rate than LLM-only applications. We have devised a new vectorization method that takes into account the topic information of the document.
arXiv Detail & Related papers (2024-08-19T22:01:45Z)
Beyond Relevant Documents: A Knowledge-Intensive Approach for Query-Focused Summarization using Large Language Models [27.90653125902507]
We propose a knowledge-intensive approach that reframes query-focused summarization as a knowledge-intensive task setup. The retrieval module efficiently retrieves potentially relevant documents from a large-scale knowledge corpus. The summarization controller seamlessly integrates a powerful large language model (LLM)-based summarizer with a carefully tailored prompt.
arXiv Detail & Related papers (2024-08-19T18:54:20Z)
Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
Often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence. We introduce a novel retrieval unit, proposition, for dense retrieval. Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
Coarse-to-Fine Knowledge Selection for Document Grounded Dialogs [11.63334863772068]
Multi-document grounded dialogue systems (DGDS) answer users' requests by finding supporting knowledge from a collection of documents. This paper proposes Re3G, which aims to optimize both coarse-grained knowledge retrieval and fine-grained knowledge extraction in a unified framework.
arXiv Detail & Related papers (2023-02-23T08:28:29Z)
MORTY: Structured Summarization for Targeted Information Extraction from Scholarly Articles [0.0]
We present MORTY, an information extraction technique that creates structured summaries of text from scholarly articles. Our approach condenses the article's full-text to property-value pairs as a segmented text snippet called structured summary. We also present a sizable scholarly dataset combining structured summaries retrieved from a scholarly knowledge graph and corresponding publicly available scientific articles.
arXiv Detail & Related papers (2022-12-11T06:49:29Z)
TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents. Text reading and information extraction can reinforce each other via a well-designed multi-modal context block. The framework can be trained in an end-to-end trainable manner, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents. LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents. Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z)
TRIE: End-to-End Text Reading and Information Extraction for Document Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network. multimodal visual and textual features of text reading are fused for information extraction. Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
arXiv Detail & Related papers (2020-05-27T01:47:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.