A Proposed Conceptual Framework for a Representational Approach to
Information Retrieval
- URL: http://arxiv.org/abs/2110.01529v1
- Date: Mon, 4 Oct 2021 15:57:02 GMT
- Title: A Proposed Conceptual Framework for a Representational Approach to
Information Retrieval
- Authors: Jimmy Lin
- Abstract summary: This paper outlines a conceptual framework for understanding recent developments in information retrieval and natural language processing.
I propose a representational approach that breaks the core text retrieval problem into a logical scoring model and a physical retrieval model.
- Score: 42.67826268399347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper outlines a conceptual framework for understanding recent
developments in information retrieval and natural language processing that
attempts to integrate dense and sparse retrieval methods. I propose a
representational approach that breaks the core text retrieval problem into a
logical scoring model and a physical retrieval model. The scoring model is
defined in terms of encoders, which map queries and documents into a
representational space, and a comparison function that computes query-document
scores. The physical retrieval model defines how a system produces the top-k
scoring documents from an arbitrarily large corpus with respect to a query. The
scoring model can be further analyzed along two dimensions: dense vs. sparse
representations and supervised (learned) vs. unsupervised approaches. I show
that many recently proposed retrieval methods, including multi-stage ranking
designs, can be seen as different parameterizations in this framework, and that
a unified view suggests a number of open research questions, providing a
roadmap for future work. As a bonus, this conceptual framework establishes
connections to sentence similarity tasks in natural language processing and
information access "technologies" prior to the dawn of computing.
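The logical/physical decomposition described above can be sketched as a toy example (not from the paper; the token-hashing encoder and three-document corpus below are hypothetical, with an inner product standing in for the comparison function):

```python
import numpy as np

def encode(text: str, dim: int = 8) -> np.ndarray:
    """Toy encoder: hash whitespace tokens into a fixed-size vector.
    Stands in for any dense (learned) or sparse (e.g. bag-of-words)
    encoder in the logical scoring model."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[sum(map(ord, token)) % dim] += 1.0
    return vec

def compare(q_vec: np.ndarray, d_vec: np.ndarray) -> float:
    """Comparison function: here a simple inner product."""
    return float(np.dot(q_vec, d_vec))

def top_k(query: str, corpus: list[str], k: int = 2) -> list[tuple[str, float]]:
    """Physical retrieval model: a brute-force scan over the corpus.
    Real systems replace this scan with inverted indexes (sparse
    representations) or approximate nearest-neighbour search (dense)."""
    q_vec = encode(query)
    scored = [(doc, compare(q_vec, encode(doc))) for doc in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

corpus = [
    "dense retrieval with transformers",
    "sparse retrieval with inverted indexes",
    "cooking pasta at home",
]
results = top_k("dense retrieval", corpus)
```

The point of the split is that `encode` and `compare` fix *what* scores a query-document pair receives, while `top_k` only fixes *how* the best-scoring documents are produced; the two can be varied independently.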
Related papers
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graphs (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
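The dual-representing idea can be sketched roughly as follows (an entirely hypothetical stand-in: a tiny fixed vocabulary and token-hashing in place of UnifieR's shared PLM encoder, with a weighted sum combining the two scores):

```python
import numpy as np

# Hypothetical toy lexicon; a real system scores over the full vocabulary.
VOCAB = ["dense", "sparse", "retrieval", "index", "pasta"]

def dual_encode(text: str, dense_dim: int = 4) -> tuple[np.ndarray, np.ndarray]:
    """Toy stand-in for a dual-representation encoder: one pass over the
    text yields both a dense vector and lexicon-based term weights."""
    tokens = text.lower().split()
    lexical = np.array([float(tokens.count(word)) for word in VOCAB])
    dense = np.zeros(dense_dim)
    for token in tokens:
        dense[sum(map(ord, token)) % dense_dim] += 1.0
    return dense, lexical

def dual_score(query: str, doc: str, alpha: float = 0.5) -> float:
    """Combine the dense and lexicon-based similarities into one score."""
    q_dense, q_lex = dual_encode(query)
    d_dense, d_lex = dual_encode(doc)
    return alpha * float(q_dense @ d_dense) + (1 - alpha) * float(q_lex @ d_lex)

on_topic = dual_score("dense retrieval", "dense retrieval with transformers")
off_topic = dual_score("dense retrieval", "cooking pasta at home")
```

The design point is that both representations come from a single encoding pass, so one model serves either retrieval paradigm (or a fusion of the two).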
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that does not force any structure on the search space: using all n-grams in a passage as its possible identifiers.
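The identifier space this describes can be illustrated with a small sketch (word-level tokenization assumed for simplicity; this shows only the identifier enumeration, not the paper's constrained autoregressive decoding):

```python
def ngram_identifiers(passage: str, n: int = 3) -> set[str]:
    """Enumerate every word n-gram of a passage; each n-gram can
    serve as an identifier pointing back to the passage."""
    tokens = passage.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

ids = ngram_identifiers("dense and sparse retrieval methods", n=3)
```

A generated n-gram can then be mapped back to every passage containing it, so no hierarchical partition of the search space is needed.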
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking [11.635294568328625]
We present a framework for improving the performance of a wide class of retrieval models at minimal computational cost.
It utilizes precomputed document representations extracted by a base dense retrieval method.
It incurs a negligible computational overhead on top of any first-stage method at run time, allowing it to be easily combined with any state-of-the-art dense retrieval method.
arXiv Detail & Related papers (2021-12-16T10:25:26Z)
- Value Retrieval with Arbitrary Queries for Form-like Documents [50.5532781148902]
We propose value retrieval with arbitrary queries for form-like documents.
Our method predicts target value for an arbitrary query based on the understanding of layout and semantics of a form.
We propose a simple document language modeling (simpleDLM) strategy to improve document understanding on large-scale model pre-training.
arXiv Detail & Related papers (2021-12-15T01:12:02Z)
- Leveraging Cognitive Search Patterns to Enhance Automated Natural Language Retrieval Performance [0.0]
We highlight cognitive reformulation patterns that mimic user search behaviour.
We formalize the application of these patterns by considering a query conceptual representation.
A genetic algorithm-based weighting process allows placing emphasis on terms according to their conceptual role-type.
arXiv Detail & Related papers (2020-04-21T14:13:33Z)
- Message Passing Query Embedding [4.035753155957698]
We propose a graph neural network to encode a graph representation of a query.
We show that the model learns entity embeddings that capture the notion of entity type without explicit supervision.
arXiv Detail & Related papers (2020-02-06T17:40:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.