Natural language technology and query expansion: issues,
state-of-the-art and perspectives
- URL: http://arxiv.org/abs/2004.11093v1
- Date: Thu, 23 Apr 2020 11:39:07 GMT
- Title: Natural language technology and query expansion: issues,
state-of-the-art and perspectives
- Authors: Bhawani Selvaretnam, Mohammed Belkhatir
- Abstract summary: Linguistic characteristics that cause ambiguity and misinterpretation of queries, as well as additional factors, affect the users' ability to accurately represent their information needs.
We lay down the anatomy of a generic linguistic-based query expansion framework and propose its module-based decomposition.
For each of the modules, we review state-of-the-art solutions in the literature, categorized in light of the techniques used.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The availability of an abundance of knowledge sources has spurred a large
amount of effort in the development and enhancement of Information Retrieval
techniques. Users' information needs are expressed in natural language, and
successful retrieval is very much dependent on the effective communication of
the intended purpose. Natural language queries consist of multiple linguistic
features which serve to represent the intended search goal. Linguistic
characteristics that cause semantic ambiguity and misinterpretation of queries
as well as additional factors such as the lack of familiarity with the search
environment affect the users' ability to accurately represent their information
needs, a phenomenon captured by the concept of the intention gap. The latter
directly affects the relevance of the returned search results, which may not be
to the users' satisfaction, and is therefore a major issue impacting the effectiveness of
information retrieval systems. Central to our discussion is the identification
of the significant constituents that characterize the query intent and their
enrichment through the addition of meaningful terms, phrases or even latent
representations, either manually or automatically, to capture their intended
meaning. Specifically, we discuss techniques to achieve the enrichment and in
particular those utilizing the information gathered from statistical processing
of term dependencies within a document corpus or from external knowledge
sources such as ontologies. We lay down the anatomy of a generic
linguistic-based query expansion framework and propose its module-based decomposition,
covering topical issues from query processing, information retrieval,
computational linguistics and ontology engineering. For each of the modules, we
review state-of-the-art solutions in the literature, categorized and analyzed
in light of the techniques used.
Related papers
- Capturing Pertinent Symbolic Features for Enhanced Content-Based Misinformation Detection [0.0]
The detection of misleading content presents a significant hurdle due to its extreme linguistic and domain variability.
This paper analyzes the linguistic attributes that characterize this phenomenon and how representative of such features some of the most popular misinformation datasets are.
We demonstrate that the appropriate use of pertinent symbolic knowledge in combination with neural language models is helpful in detecting misleading content.
arXiv Detail & Related papers (2024-01-29T16:42:34Z)
- Knowledge Graphs and Pre-trained Language Models enhanced Representation Learning for Conversational Recommender Systems [58.561904356651276]
We introduce the Knowledge-Enhanced Entity Representation Learning (KERL) framework to improve the semantic understanding of entities for conversational recommender systems.
KERL uses a knowledge graph and a pre-trained language model to improve the semantic understanding of entities.
KERL achieves state-of-the-art results in both recommendation and response generation tasks.
arXiv Detail & Related papers (2023-12-18T06:41:23Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to its integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z)
- Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicitly syntactic and contextual information.
KGAN captures the sentiment feature representations from multiple perspectives, i.e., context-, syntax- and knowledge-based.
Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
arXiv Detail & Related papers (2022-01-13T08:25:53Z)
- Ontology-based Feature Selection: A Survey [0.6767885381740952]
This survey aims to provide insight into key aspects of knowledge extraction from text, images, databases and expertise.
The presented examples span diverse application domains, e.g., medicine, tourism, mechanical and civil engineering.
arXiv Detail & Related papers (2021-04-15T19:03:31Z)
- Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge [62.46091695615262]
We aim to extract commonsense knowledge to improve machine reading comprehension.
We propose to represent relations implicitly by situating structured knowledge in a context.
We employ a teacher-student paradigm to inject multiple types of contextualized knowledge into a student machine reader.
arXiv Detail & Related papers (2020-09-12T17:20:01Z)
- On the Combined Use of Extrinsic Semantic Resources for Medical Information Search [0.0]
We develop a framework to highlight and expand head medical concepts in verbose medical queries.
We also build semantically enhanced inverted index documents.
To demonstrate the effectiveness of the proposed approach, we conducted several experiments over the CLEF 2014 dataset.
arXiv Detail & Related papers (2020-05-17T14:18:04Z)
- Coupled intrinsic and extrinsic human language resource-based query expansion [0.0]
We present here a query expansion framework which capitalizes on both linguistic characteristics for query constituent encoding, expansion concept extraction and concept weighting.
A thorough empirical evaluation on real-world datasets validates our approach against unigram language model, relevance model and a sequential dependence based technique.
arXiv Detail & Related papers (2020-04-23T11:22:38Z)
- Leveraging Cognitive Search Patterns to Enhance Automated Natural Language Retrieval Performance [0.0]
We highlight cognitive reformulation patterns that mimic user search behaviour.
We formalize the application of these patterns by considering a query conceptual representation.
A genetic algorithm-based weighting process allows placing emphasis on terms according to their conceptual role-type.
arXiv Detail & Related papers (2020-04-21T14:13:33Z)
- Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE).
We use them to study representations learned by more than 40 different encoder architecture and linguistic feature combinations trained on two datasets.
We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
arXiv Detail & Related papers (2020-04-17T09:17:40Z)