Self-Supervised Contrastive BERT Fine-tuning for Fusion-based
Reviewed-Item Retrieval
- URL: http://arxiv.org/abs/2308.00762v1
- Date: Tue, 1 Aug 2023 18:01:21 GMT
- Title: Self-Supervised Contrastive BERT Fine-tuning for Fusion-based
Reviewed-Item Retrieval
- Authors: Mohammad Mahdi Abdollah Pour, Parsa Farinneya, Armin Toroghi, Anton
Korikov, Ali Pesaranghader, Touqir Sajed, Manasa Bharadwaj, Borislav Mavrin,
and Scott Sanner
- Abstract summary: We extend Neural Information Retrieval (IR) methods for matching queries to documents to the task of Reviewed-Item Retrieval (RIR).
We use self-supervised methods for contrastive learning of BERT embeddings for both queries and reviews.
For contrastive learning in a Late Fusion scenario, we investigate the use of positive review samples from the same item and/or with the same rating.
For a more end-to-end Early Fusion approach, we introduce contrastive item embedding learning to fuse reviews into single item embeddings.
- Score: 12.850360384298712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As natural language interfaces enable users to express increasingly complex
natural language queries, there is a parallel explosion of user review content
that can allow users to better find items such as restaurants, books, or movies
that match these expressive queries. While Neural Information Retrieval (IR)
methods have provided state-of-the-art results for matching queries to
documents, they have not been extended to the task of Reviewed-Item Retrieval
(RIR), where query-review scores must be aggregated (or fused) into item-level
scores for ranking. In the absence of labeled RIR datasets, we extend Neural IR
methodology to RIR by leveraging self-supervised methods for contrastive
learning of BERT embeddings for both queries and reviews. Specifically,
contrastive learning requires a choice of positive and negative samples, where
the unique two-level structure of our item-review data combined with meta-data
affords us a rich structure for the selection of these samples. For contrastive
learning in a Late Fusion scenario, we investigate the use of positive review
samples from the same item and/or with the same rating, selection of hard
positive samples by choosing the least similar reviews from the same anchor
item, and selection of hard negative samples by choosing the most similar
reviews from different items. We also explore anchor sub-sampling and
augmenting with meta-data. For a more end-to-end Early Fusion approach, we
introduce contrastive item embedding learning to fuse reviews into single item
embeddings. Experimental results show that Late Fusion contrastive learning for
Neural RIR outperforms all other contrastive IR configurations, Neural IR, and
sparse retrieval baselines, thus demonstrating the power of exploiting the
two-level structure in Neural RIR approaches as well as the importance of
preserving the nuance of individual review content via Late Fusion methods.
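To make the two-level sampling scheme concrete, the sketch below selects a hard positive (the least similar review from the same anchor item) and a hard negative (the most similar review from a different item) given precomputed review embeddings. This is a minimal illustration of the sampling described in the abstract, not the paper's implementation; the function name, the numpy-based selection, and the random stand-in embeddings are assumptions.

```python
import numpy as np

def hard_pair_for_anchor(review_embs, item_ids, anchor):
    """Pick one hard positive and one hard negative for an anchor review
    using the two-level item-review structure (illustrative sketch).

    review_embs : (N, d) L2-normalized review embeddings (e.g., from BERT)
    item_ids    : (N,) item id of each review
    anchor      : index of the anchor review
    """
    sims = review_embs @ review_embs[anchor]   # cosine similarities to anchor
    same = item_ids == item_ids[anchor]
    same[anchor] = False                       # the anchor is not its own positive
    diff = item_ids != item_ids[anchor]
    # Hard positive: the LEAST similar review from the same item.
    same_idx = np.flatnonzero(same)
    pos = same_idx[np.argmin(sims[same_idx])]
    # Hard negative: the MOST similar review from a different item.
    diff_idx = np.flatnonzero(diff)
    neg = diff_idx[np.argmax(sims[diff_idx])]
    return pos, neg

# Toy usage: six reviews over two items, random stand-in embeddings.
rng = np.random.default_rng(0)
embs = rng.normal(size=(6, 8))
embs /= np.linalg.norm(embs, axis=1, keepdims=True)
pos, neg = hard_pair_for_anchor(embs, np.array([0, 0, 0, 1, 1, 1]), anchor=0)
```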
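At retrieval time, Late Fusion keeps individual reviews as the scoring unit and aggregates only afterwards, which is what lets it preserve review-level nuance. A hedged sketch of query-review scoring followed by item-level fusion follows; the mean/max aggregation options are illustrative assumptions, since the abstract does not fix a particular fusion function.

```python
import numpy as np

def late_fusion_rank(query_emb, review_embs, item_ids, agg="max"):
    """Score each review against the query, then fuse review-level
    scores into item-level scores for ranking (Late Fusion sketch;
    the aggregation function is an assumed choice)."""
    sims = review_embs @ query_emb                  # query-review scores
    fuse = {"max": np.max, "mean": np.mean}[agg]
    item_scores = {i: float(fuse(sims[item_ids == i]))
                   for i in np.unique(item_ids)}
    # Rank items by fused score, best first.
    return sorted(item_scores, key=item_scores.get, reverse=True)
```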
Related papers
- Data Fusion of Synthetic Query Variants With Generative Large Language Models [1.864807003137943]
This work explores the feasibility of using synthetic query variants generated by instruction-tuned Large Language Models in data fusion experiments.
We introduce a lightweight, unsupervised, and cost-efficient approach that exploits principled prompting and data fusion techniques.
Our analysis shows that data fusion based on synthetic query variants is significantly better than baselines with single queries and also outperforms pseudo-relevance feedback methods.
arXiv Detail & Related papers (2024-11-06T12:54:27Z)
- Multi-Aspect Reviewed-Item Retrieval via LLM Query Decomposition and Aspect Fusion [15.630734768499826]
We propose several novel aspect fusion (AF) strategies to address natural language product queries.
For imbalanced review corpora, AF improves over late fusion (LF), raising MAP@10 from 0.36 to 0.52, while matching LF performance on balanced review corpora.
arXiv Detail & Related papers (2024-08-01T19:04:10Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation [57.8363998797433]
We propose AMRFact, a framework that generates perturbed summaries using Abstract Meaning Representations (AMRs)
Our approach parses factually consistent summaries into AMR graphs and injects controlled factual inconsistencies to create negative examples, allowing coherent yet factually inconsistent summaries to be generated with high error-type coverage.
arXiv Detail & Related papers (2023-11-16T02:56:29Z)
- Topology-aware Debiased Self-supervised Graph Learning for Recommendation [6.893289671937124]
We propose Topology-aware Debiased Self-supervised Graph Learning (TDSGL) for recommendation.
TDSGL constructs contrastive pairs according to the semantic similarity between users (items).
Our results show that the proposed model outperforms the state-of-the-art models significantly on three public datasets.
arXiv Detail & Related papers (2023-10-24T14:16:19Z)
- Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant.
To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- SIFN: A Sentiment-aware Interactive Fusion Network for Review-based Item Recommendation [48.1799451277808]
We propose a Sentiment-aware Interactive Fusion Network (SIFN) for review-based item recommendation.
We first encode user/item reviews via BERT and propose a lightweight sentiment learner to extract semantic features of each review.
Then, we propose a sentiment prediction task that guides the sentiment learner to extract sentiment-aware features via explicit sentiment labels.
arXiv Detail & Related papers (2021-08-18T08:04:38Z)
- Generation-Augmented Retrieval for Open-domain Question Answering [134.27768711201202]
We propose Generation-Augmented Retrieval (GAR) for answering open-domain questions.
We show that generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy.
GAR achieves state-of-the-art performance on Natural Questions and TriviaQA datasets under the extractive QA setup when equipped with an extractive reader.
arXiv Detail & Related papers (2020-09-17T23:08:01Z)