Self-Supervised Contrastive BERT Fine-tuning for Fusion-based
Reviewed-Item Retrieval
- URL: http://arxiv.org/abs/2308.00762v1
- Date: Tue, 1 Aug 2023 18:01:21 GMT
- Title: Self-Supervised Contrastive BERT Fine-tuning for Fusion-based
Reviewed-Item Retrieval
- Authors: Mohammad Mahdi Abdollah Pour, Parsa Farinneya, Armin Toroghi, Anton
Korikov, Ali Pesaranghader, Touqir Sajed, Manasa Bharadwaj, Borislav Mavrin,
and Scott Sanner
- Abstract summary: We extend Neural Information Retrieval (IR) methods for matching queries to documents to the task of Reviewed-Item Retrieval (RIR).
We use self-supervised methods for contrastive learning of BERT embeddings for both queries and reviews.
For contrastive learning in a Late Fusion scenario, we investigate the use of positive review samples from the same item and/or with the same rating.
For a more end-to-end Early Fusion approach, we introduce contrastive item embedding learning to fuse reviews into single item embeddings.
- Score: 12.850360384298712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As natural language interfaces enable users to express increasingly complex
natural language queries, there is a parallel explosion of user review content
that can allow users to better find items such as restaurants, books, or movies
that match these expressive queries. While Neural Information Retrieval (IR)
methods have provided state-of-the-art results for matching queries to
documents, they have not been extended to the task of Reviewed-Item Retrieval
(RIR), where query-review scores must be aggregated (or fused) into item-level
scores for ranking. In the absence of labeled RIR datasets, we extend Neural IR
methodology to RIR by leveraging self-supervised methods for contrastive
learning of BERT embeddings for both queries and reviews. Specifically,
contrastive learning requires a choice of positive and negative samples, where
the unique two-level structure of our item-review data combined with meta-data
affords us a rich structure for the selection of these samples. For contrastive
learning in a Late Fusion scenario, we investigate the use of positive review
samples from the same item and/or with the same rating, selection of hard
positive samples by choosing the least similar reviews from the same anchor
item, and selection of hard negative samples by choosing the most similar
reviews from different items. We also explore anchor sub-sampling and
augmenting with meta-data. For a more end-to-end Early Fusion approach, we
introduce contrastive item embedding learning to fuse reviews into single item
embeddings. Experimental results show that Late Fusion contrastive learning for
Neural RIR outperforms all other contrastive IR configurations, Neural IR, and
sparse retrieval baselines, thus demonstrating the power of exploiting the
two-level structure in Neural RIR approaches as well as the importance of
preserving the nuance of individual review content via Late Fusion methods.
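To make the two-level sampling scheme concrete, the sketch below selects a hard positive (the least similar review from the same anchor item) and a hard negative (the most similar review from a different item) given precomputed review embeddings. This is a minimal illustration of the sampling described in the abstract, not the paper's implementation; the function name, the numpy-based selection, and the random stand-in embeddings are assumptions.

```python
import numpy as np

def hard_pair_for_anchor(review_embs, item_ids, anchor):
    """Pick one hard positive and one hard negative for an anchor review
    using the two-level item-review structure (illustrative sketch).

    review_embs : (N, d) L2-normalized review embeddings (e.g., from BERT)
    item_ids    : (N,) item id of each review
    anchor      : index of the anchor review
    """
    sims = review_embs @ review_embs[anchor]   # cosine similarities to anchor
    same = item_ids == item_ids[anchor]
    same[anchor] = False                       # the anchor is not its own positive
    diff = item_ids != item_ids[anchor]
    # Hard positive: the LEAST similar review from the same item.
    same_idx = np.flatnonzero(same)
    pos = same_idx[np.argmin(sims[same_idx])]
    # Hard negative: the MOST similar review from a different item.
    diff_idx = np.flatnonzero(diff)
    neg = diff_idx[np.argmax(sims[diff_idx])]
    return pos, neg

# Toy usage: six reviews over two items, random stand-in embeddings.
rng = np.random.default_rng(0)
embs = rng.normal(size=(6, 8))
embs /= np.linalg.norm(embs, axis=1, keepdims=True)
pos, neg = hard_pair_for_anchor(embs, np.array([0, 0, 0, 1, 1, 1]), anchor=0)
```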
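At retrieval time, Late Fusion keeps individual reviews as the scoring unit and aggregates only afterwards, which is what lets it preserve review-level nuance. A hedged sketch of query-review scoring followed by item-level fusion follows; the mean/max aggregation options are illustrative assumptions, since the abstract does not fix a particular fusion function.

```python
import numpy as np

def late_fusion_rank(query_emb, review_embs, item_ids, agg="max"):
    """Score each review against the query, then fuse review-level
    scores into item-level scores for ranking (Late Fusion sketch;
    the aggregation function is an assumed choice)."""
    sims = review_embs @ query_emb                  # query-review scores
    fuse = {"max": np.max, "mean": np.mean}[agg]
    item_scores = {i: float(fuse(sims[item_ids == i]))
                   for i in np.unique(item_ids)}
    # Rank items by fused score, best first.
    return sorted(item_scores, key=item_scores.get, reverse=True)
```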
Related papers
- Data Fusion of Synthetic Query Variants With Generative Large Language Models [1.864807003137943]
This work explores the feasibility of using synthetic query variants generated by instruction-tuned Large Language Models in data fusion experiments.
We introduce a lightweight, unsupervised, and cost-efficient approach that exploits principled prompting and data fusion techniques.
Our analysis shows that data fusion based on synthetic query variants is significantly better than baselines with single queries and also outperforms pseudo-relevance feedback methods.
arXiv Detail & Related papers (2024-11-06T12:54:27Z)
- Multi-Aspect Reviewed-Item Retrieval via LLM Query Decomposition and Aspect Fusion [15.630734768499826]
We propose several novel aspect fusion (AF) strategies to address natural language product queries.
For imbalanced review corpora, AF improves over late fusion (LF), raising MAP@10 from 0.36 to 0.52, while matching LF performance on balanced review corpora.
arXiv Detail & Related papers (2024-08-01T19:04:10Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation [57.8363998797433]
We propose AMRFact, a framework that generates perturbed summaries using Abstract Meaning Representations (AMRs)
Our approach parses factually consistent summaries into AMR graphs and injects controlled factual inconsistencies to create negative examples, allowing coherent yet factually inconsistent summaries to be generated with high error-type coverage.
arXiv Detail & Related papers (2023-11-16T02:56:29Z)
- Topology-aware Debiased Self-supervised Graph Learning for Recommendation [6.893289671937124]
We propose Topology-aware Debiased Self-supervised Graph Learning (TDSGL) for recommendation.
TDSGL constructs contrastive pairs according to the semantic similarity between users (items).
Our results show that the proposed model outperforms the state-of-the-art models significantly on three public datasets.
arXiv Detail & Related papers (2023-10-24T14:16:19Z)
- Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant.
To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- SIFN: A Sentiment-aware Interactive Fusion Network for Review-based Item Recommendation [48.1799451277808]
We propose a Sentiment-aware Interactive Fusion Network (SIFN) for review-based item recommendation.
We first encode user/item reviews via BERT and propose a lightweight sentiment learner to extract semantic features of each review.
Then, we propose a sentiment prediction task that guides the sentiment learner to extract sentiment-aware features via explicit sentiment labels.
arXiv Detail & Related papers (2021-08-18T08:04:38Z)
- Generation-Augmented Retrieval for Open-domain Question Answering [134.27768711201202]
We propose Generation-Augmented Retrieval (GAR) for answering open-domain questions.
We show that generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy.
GAR achieves state-of-the-art performance on Natural Questions and TriviaQA datasets under the extractive QA setup when equipped with an extractive reader.
arXiv Detail & Related papers (2020-09-17T23:08:01Z)