BoolQuestions: Does Dense Retrieval Understand Boolean Logic in Language?
- URL: http://arxiv.org/abs/2411.12235v1
- Date: Tue, 19 Nov 2024 05:19:53 GMT
- Title: BoolQuestions: Does Dense Retrieval Understand Boolean Logic in Language?
- Authors: Zongmeng Zhang, Jinhua Zhu, Wengang Zhou, Xiang Qi, Peng Zhang, Houqiang Li
- Abstract summary: We first investigate whether current retrieval systems can comprehend the Boolean logic implied in language.
Through extensive experiments, we conclude that current dense retrieval systems do not fully understand Boolean logic in language.
We propose a contrastive continual training method that serves as a strong baseline for the research community.
- Score: 88.29075896295357
- Abstract: Dense retrieval, which aims to encode the semantic information of arbitrary text into dense vector representations or embeddings, has emerged as an effective and efficient paradigm for text retrieval, and has consequently become an essential component in various natural language processing systems. These systems typically optimize the embedding space by attending to the relevance of text pairs, while overlooking the Boolean logic inherent in language, which may not be captured by current training objectives. In this work, we first investigate whether current retrieval systems can comprehend the Boolean logic implied in language. To answer this question, we formulate the task of Boolean Dense Retrieval and collect a benchmark dataset, BoolQuestions, which covers complex queries containing basic Boolean logic and corresponding annotated passages. Through extensive experiments on the proposed task and benchmark dataset, we conclude that current dense retrieval systems do not fully understand Boolean logic in language, and there is still a long way to go in improving them. Furthermore, to promote research on enhancing language models' understanding of Boolean logic, we explore Boolean operations on decomposed queries and propose a contrastive continual training method that serves as a strong baseline for the research community.
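As a concrete illustration of Boolean operations on decomposed queries, here is a minimal sketch in which each sub-query is scored independently by an off-the-shelf dual encoder and the Boolean structure is applied at the score level (AND as min, OR as max, NOT as complement). The model name and the aggregation rules are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: score-level Boolean aggregation over decomposed sub-queries.
# The encoder choice and the AND=min / OR=max / NOT=complement rules are
# assumptions for illustration, not the paper's method.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any dual encoder works here

passages = [
    "The film stars an actor who is also a famous singer.",
    "The film stars an actor but contains no music at all.",
]
p_emb = model.encode(passages, normalize_embeddings=True)

def score(sub_query: str) -> np.ndarray:
    """Cosine similarity of one sub-query against all passages, mapped to [0, 1]."""
    q_emb = model.encode([sub_query], normalize_embeddings=True)
    return (p_emb @ q_emb[0] + 1.0) / 2.0

# Boolean query: "a film starring an actor AND NOT containing music"
s = np.minimum(score("a film starring an actor"), 1.0 - score("a film containing music"))
print(passages[int(np.argmax(s))])  # expected: the passage with no music
```

A real system would also need a query decomposer and score calibration; the point here is only that Boolean structure can be enforced outside the embedding space.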
Related papers
- Likelihood as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that likelihoods serve as an effective gauge for language model performance.
We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance (see the sketch after this entry).
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
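The question-likelihood gauge can be sketched in a few lines: score each candidate context by the average log-probability the language model assigns to the question, and keep the highest-scoring one. The model choice (gpt2) and the exact scoring recipe below are assumptions for illustration, not the paper's setup.

```python
# Hypothetical sketch: rank candidate contexts by the likelihood a causal LM
# assigns to the question. Model choice and scoring details are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def question_loglik(context: str, question: str) -> float:
    """Mean log-probability of the question tokens, conditioned on the context."""
    ctx_ids = tok(context, return_tensors="pt").input_ids
    q_ids = tok(" " + question, return_tensors="pt").input_ids
    ids = torch.cat([ctx_ids, q_ids], dim=1)
    with torch.no_grad():
        logits = lm(ids).logits
    # Logits at position i predict token i + 1, so the question tokens are
    # predicted by positions ctx_len - 1 ... seq_len - 2.
    logp = torch.log_softmax(logits[0, ctx_ids.size(1) - 1 : -1], dim=-1)
    return logp.gather(1, q_ids[0].unsqueeze(1)).mean().item()

contexts = ["Paris is the capital of France.", "Bananas are yellow fruits."]
question = "What is the capital of France?"
print(max(contexts, key=lambda c: question_loglik(c, question)))
```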
- DISCERN: Decoding Systematic Errors in Natural Language for Text Classifiers [18.279429202248632]
We introduce DISCERN, a framework for interpreting systematic biases in text classifiers using language explanations.
DISCERN iteratively generates precise natural language descriptions of systematic errors by employing an interactive loop between two large language models.
We show that users can interpret systematic biases more effectively (a relative improvement of over 25%) and more efficiently when the biases are described through language explanations rather than cluster exemplars.
arXiv Detail & Related papers (2024-10-29T17:04:55Z)
- BOLD: Boolean Logic Deep Learning [1.4272256806865107]
We introduce the notion of Boolean variation, so that neurons with Boolean weights and inputs can be trained efficiently in the Boolean domain using Boolean logic instead of gradient descent and real arithmetic (a toy illustration follows this entry).
Our approach achieves baseline full-precision accuracy in ImageNet classification and surpasses state-of-the-art results in semantic segmentation.
It significantly reduces energy consumption during both training and inference.
arXiv Detail & Related papers (2024-05-25T19:50:23Z)
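As a rough picture of what computation with Boolean weights looks like, the sketch below implements the forward pass of a single Boolean neuron (XNOR agreement plus a majority vote). This is an illustration of the setting only; the paper's Boolean-variation training rule is not reproduced here.

```python
# Illustrative sketch: forward pass of a neuron whose weights and inputs are
# Boolean. Uses XNOR to measure agreement and a majority vote to fire.
# This is an assumption-laden toy, not the paper's training method.
import numpy as np

def bool_neuron(x: np.ndarray, w: np.ndarray) -> bool:
    """Fire iff a majority of input/weight pairs agree (XNOR, then popcount)."""
    agree = ~(x ^ w)                 # XNOR: True where input matches weight
    return int(agree.sum()) * 2 > x.size

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=16).astype(bool)  # Boolean inputs
w = rng.integers(0, 2, size=16).astype(bool)  # Boolean weights
print(bool_neuron(x, w))
```

Everything here reduces to bitwise operations and a popcount, which is where the claimed energy savings over real-valued arithmetic would come from.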
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, the proposition, for dense retrieval (a miniature comparison sketch follows this entry).
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
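The effect of retrieval granularity is easy to reproduce in miniature: index the same text once as a whole passage and once as finer-grained units, then see which unit a query lands on. Proposition extraction is approximated below by naive sentence splitting, which is an assumption; the paper uses a dedicated propositionizer.

```python
# Hypothetical sketch: compare retrieval at passage vs. finer granularity.
# Sentence splitting stands in for proposition extraction (an assumption).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
doc = "Marie Curie won two Nobel Prizes. She was born in Warsaw."

units = {"passage": [doc], "sentence": doc.split(". ")}
query_emb = model.encode("Where was Marie Curie born?", convert_to_tensor=True)

for granularity, corpus in units.items():
    corpus_emb = model.encode(corpus, convert_to_tensor=True)
    hit = util.semantic_search(query_emb, corpus_emb, top_k=1)[0][0]
    print(f"{granularity}: {corpus[hit['corpus_id']]!r} (score {hit['score']:.3f})")
```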
- Empower Nested Boolean Logic via Self-Supervised Curriculum Learning [67.46052028752327]
We find that pre-trained language models, even large ones, behave like random selectors in the face of multi-nested Boolean logic.
To empower language models with this fundamental capability, this paper proposes a new self-supervised learning method, Curriculum Logical Reasoning (Clr); a toy curriculum generator follows this entry.
arXiv Detail & Related papers (2023-10-09T06:54:02Z)
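A self-supervised curriculum over Boolean nesting depth can be generated programmatically, as in the sketch below: statements of depth 1 are seen before depth 2, and so on. The generation scheme is an illustration of the curriculum idea, not the paper's Clr recipe.

```python
# Hypothetical sketch: generate (statement, truth value) pairs of increasing
# Boolean nesting depth for curriculum training. Not the paper's Clr recipe.
import random

def nested_expr(depth: int) -> tuple[str, bool]:
    """Return a Boolean statement in words together with its truth value."""
    if depth == 0:
        v = random.choice([True, False])
        return ("true" if v else "false"), v
    op = random.choice(["and", "or", "not"])
    if op == "not":
        s, v = nested_expr(depth - 1)
        return f"not ({s})", not v
    (s1, v1), (s2, v2) = nested_expr(depth - 1), nested_expr(depth - 1)
    return f"({s1}) {op} ({s2})", (v1 and v2) if op == "and" else (v1 or v2)

random.seed(0)
for depth in (1, 2, 3):  # easy -> hard
    stmt, label = nested_expr(depth)
    print(depth, label, ":", stmt)
```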
- Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text [65.24325614642223]
We propose to interpret the logical symbols and expressions in the text in order to arrive at the answer.
Based on such logical information, we put forward a context extension framework and a data augmentation algorithm.
Our method achieves state-of-the-art performance, and both the logic-driven context extension framework and the data augmentation algorithm help improve accuracy.
arXiv Detail & Related papers (2021-05-08T10:09:36Z)
- LogicalFactChecker: Leveraging Logical Operations for Fact Checking with Graph Module Network [111.24773949467567]
We propose LogicalFactChecker, a neural network approach capable of leveraging logical operations for fact checking.
It achieves state-of-the-art performance on TABFACT, a large-scale benchmark dataset.
arXiv Detail & Related papers (2020-04-28T17:04:19Z)