Towards Proactive Information Retrieval in Noisy Text with Wikipedia Concepts
- URL: http://arxiv.org/abs/2210.09877v1
- Date: Tue, 18 Oct 2022 14:12:06 GMT
- Title: Towards Proactive Information Retrieval in Noisy Text with Wikipedia Concepts
- Authors: Tabish Ahmed and Sahan Bulathwela
- Abstract summary: This work explores how exploiting the context of a query using Wikipedia concepts can improve proactive information retrieval on noisy text.
Our experiments around a podcast segment retrieval task demonstrate that there is a clear signal of relevance in Wikipedia concepts.
We also find Wikifying the background context of a query can help disambiguate the meaning of the query, further helping proactive information retrieval.
- Score: 6.744385328015561
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Extracting useful information from the user history to clearly understand
informational needs is a crucial feature of a proactive information retrieval
system. Regarding understanding information and relevance, Wikipedia can
provide the background knowledge that an intelligent system needs. This work
explores how exploiting the context of a query using Wikipedia concepts can
improve proactive information retrieval on noisy text. We formulate two models
that use entity linking to associate Wikipedia topics with the relevance model.
Our experiments around a podcast segment retrieval task demonstrate that there
is a clear signal of relevance in Wikipedia concepts while a ranking model can
improve precision by incorporating them. We also find Wikifying the background
context of a query can help disambiguate the meaning of the query, further
helping proactive information retrieval.
Related papers
- EchoSight: Advancing Visual-Language Models with Wiki Knowledge [39.02148880719576]
We introduce EchoSight, a novel framework for knowledge-based Visual Question Answering.
To strive for high-performing retrieval, EchoSight first searches wiki articles by using visual-only information.
Our experimental results on the Encyclopedic VQA and InfoSeek datasets demonstrate that EchoSight establishes new state-of-the-art results in knowledge-based VQA.
arXiv Detail & Related papers (2024-07-17T16:55:42Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Curious Rhythms: Temporal Regularities of Wikipedia Consumption [15.686850035802667]
We show that even after removing the global pattern of day-night alternation, the consumption habits of individual articles maintain strong diurnal regularities.
We investigate topical and contextual correlates of Wikipedia articles' access rhythms, finding that article topic, reader country, and access device (mobile vs. desktop) are all important predictors of daily attention patterns.
arXiv Detail & Related papers (2023-05-16T14:48:08Z)
- KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation [61.08389704326803]
Vision-and-language navigation (VLN) is the task of enabling an embodied agent to navigate to a remote location by following natural language instructions in real scenes.
Most of the previous approaches utilize the entire features or object-centric features to represent navigable candidates.
We propose a Knowledge Enhanced Reasoning Model (KERM) to leverage knowledge to improve agent navigation ability.
arXiv Detail & Related papers (2023-03-28T08:00:46Z)
- Data-Efficient Autoregressive Document Retrieval for Fact Verification [7.935530801269922]
This paper introduces a distant-supervision method that does not require any annotation to train autoregressive retrievers.
We show that with task-specific supervised fine-tuning, autoregressive retrieval performance for two Wikipedia-based fact verification tasks can approach or even exceed full supervision.
arXiv Detail & Related papers (2022-11-17T07:27:50Z)
- WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs [66.88232442007062]
We introduce WikiDes, a dataset to generate short descriptions of Wikipedia articles.
The dataset consists of over 80k English samples on 6987 topics.
Our paper shows a practical impact on Wikipedia and Wikidata since there are thousands of missing descriptions.
arXiv Detail & Related papers (2022-09-27T01:28:02Z)
- Surfer100: Generating Surveys From Web Resources on Wikipedia-style [49.23675182917996]
We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation.
We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys.
arXiv Detail & Related papers (2021-12-13T02:18:01Z)
- A Bayesian Framework for Information-Theoretic Probing [51.98576673620385]
We argue that probing should be seen as approximating a mutual information.
This led to the rather unintuitive conclusion that representations encode exactly the same information about a target task as the original sentences.
This paper proposes a new framework to measure what we term Bayesian mutual information.
arXiv Detail & Related papers (2021-09-08T18:08:36Z)
- Supporting search engines with knowledge and context [1.0152838128195467]
In the first part of this thesis, we study how to make structured knowledge more accessible to the user.
In the second part of this thesis, we study how to improve interactive knowledge gathering.
In the final part of this thesis, we focus on search engine support for professional writers in the news domain.
arXiv Detail & Related papers (2021-02-12T20:28:25Z)
- Query Understanding via Intent Description Generation [75.64800976586771]
We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
arXiv Detail & Related papers (2020-08-25T08:56:40Z)
- Natural language processing for word sense disambiguation and information extraction [0.0]
The thesis presents a new approach for Word Sense Disambiguation using a thesaurus.
A document retrieval method based on fuzzy logic is described, and its application is illustrated.
The thesis concludes with the presentation of a novel strategy based on the Dempster-Shafer theory of evidential reasoning.
arXiv Detail & Related papers (2020-04-05T17:13:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.