Towards Understanding Domain Adapted Sentence Embeddings for Document Retrieval
- URL: http://arxiv.org/abs/2406.12336v2
- Date: Mon, 02 Dec 2024 04:08:49 GMT
- Title: Towards Understanding Domain Adapted Sentence Embeddings for Document Retrieval
- Authors: Sujoy Roychowdhury, Sumit Soman, H. G. Ranjani, Vansh Chhabra, Neeraj Gunda, Shashank Gautam, Subhadip Bandyopadhyay, Sai Krishna Bala
- Abstract summary: We domain adapt embeddings using telecom, health and science datasets for question answering.
We establish a systematic method to obtain thresholds for similarity scores for different embeddings.
We show that embeddings for domain-specific sentences have little overlap with those for domain-agnostic ones.
- Score: 11.695672855244744
- Abstract: A plethora of sentence embedding models makes it challenging to choose one, especially for technical domains rich with specialized vocabulary. In this work, we domain adapt embeddings using telecom, health and science datasets for question answering. We evaluate embeddings obtained from publicly available models and their domain-adapted variants, on both point retrieval accuracies and their (95\%) confidence intervals. We establish a systematic method to obtain thresholds for similarity scores for different embeddings. As expected, we observe that fine-tuning improves mean bootstrapped accuracies. We also observe that it results in tighter confidence intervals, which further improve when pre-training is preceded by fine-tuning. We introduce metrics which measure the distributional overlaps of top-$K$, correct and random document similarities with the question. Further, we show that these metrics are correlated with retrieval accuracy and similarity thresholds. Recent literature shows conflicting effects of isotropy on retrieval accuracies. Our experiments establish that the isotropy of embeddings (as measured by two independent state-of-the-art isotropy metric definitions) is poorly correlated with retrieval performance. We show that embeddings for domain-specific sentences have little overlap with those for domain-agnostic ones, and fine-tuning moves them further apart. Based on our results, we provide recommendations for use of our methodology and metrics by researchers and practitioners.
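The bootstrapped-accuracy evaluation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the embeddings and gold labels are synthetic random data, and the function names (`cosine_sim`, `bootstrap_accuracy`) are hypothetical. It shows how top-1 retrieval hits over a cosine-similarity matrix can be resampled to obtain a mean accuracy with a 95% percentile confidence interval.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def bootstrap_accuracy(hits, n_boot=1000, alpha=0.05, seed=0):
    """Mean accuracy and a (1 - alpha) percentile bootstrap CI over per-question hits."""
    rng = np.random.default_rng(seed)
    hits = np.asarray(hits, dtype=float)
    resampled = rng.choice(hits, size=(n_boot, len(hits)), replace=True).mean(axis=1)
    lo, hi = np.percentile(resampled, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return hits.mean(), (lo, hi)

# Synthetic stand-ins for question/document embeddings (16-dim, random).
rng = np.random.default_rng(42)
q = rng.normal(size=(100, 16))             # 100 question embeddings
d = rng.normal(size=(50, 16))              # 50 document embeddings
correct = rng.integers(0, 50, size=100)    # gold document index per question

sims = cosine_sim(q, d)                    # (100, 50) similarity matrix
hits = sims.argmax(axis=1) == correct      # top-1 retrieval hit per question
mean_acc, (ci_lo, ci_hi) = bootstrap_accuracy(hits)
```

The width of `(ci_lo, ci_hi)` is what the abstract refers to as the tightness of the confidence interval; the paper's similarity thresholds would be derived from the distributions of correct versus random entries of a matrix like `sims`.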
Related papers
- Reliable Evaluation of Attribution Maps in CNNs: A Perturbation-Based Approach [7.1606014219358425]
We present an approach for evaluating attribution maps, which play a central role in interpreting predictions of convolutional neural networks (CNNs).
We show that the widely used insertion/deletion metrics are susceptible to distribution shifts that affect the reliability of the ranking.
Our method proposes to replace pixel modifications with adversarial perturbations, which provides a more robust evaluation framework.
arXiv Detail & Related papers (2024-11-22T13:57:56Z) - Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study if there are any deficiencies in reference-free metrics.
We employ GPT-4V as an evaluative tool to assess generated sentences and the result reveals that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z) - Training BERT Models to Carry Over a Coding System Developed on One Corpus to Another [0.0]
This paper describes how we train BERT models to carry over a coding system developed on the paragraphs of a Hungarian literary journal to another.
The aim of the coding system is to track trends in the perception of literary translation around the political transformation in 1989 in Hungary.
arXiv Detail & Related papers (2023-08-07T17:46:49Z) - PromptORE -- A Novel Approach Towards Fully Unsupervised Relation Extraction [0.0]
Unsupervised Relation Extraction (RE) aims to identify relations between entities in text, without having access to labeled data during training.
We propose PromptORE, a ''Prompt-based Open Relation Extraction'' model.
We adapt the novel prompt-tuning paradigm to work in an unsupervised setting, and use it to embed sentences expressing a relation.
We show that PromptORE consistently outperforms state-of-the-art models with a relative gain of more than 40% in B$^3$, V-measure and ARI.
arXiv Detail & Related papers (2023-03-24T12:55:35Z) - Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm to further explore the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z) - Birds of a Feather Trust Together: Knowing When to Trust a Classifier via Adaptive Neighborhood Aggregation [30.34223543030105]
We show how NeighborAgg can leverage the two essential types of information via adaptive neighborhood aggregation.
We also extend our approach to the closely related task of mislabel detection and provide a theoretical coverage guarantee to bound the false negative rate.
arXiv Detail & Related papers (2022-11-29T18:43:15Z) - Assaying Out-Of-Distribution Generalization in Transfer Learning [103.57862972967273]
We take a unified view of previous work, highlighting message discrepancies that we address empirically.
We fine-tune over 31k networks, from nine different architectures in the many- and few-shot setting.
arXiv Detail & Related papers (2022-07-19T12:52:33Z) - Are Missing Links Predictable? An Inferential Benchmark for Knowledge Graph Completion [79.07695173192472]
InferWiki improves upon existing benchmarks in inferential ability, assumptions, and patterns.
Each testing sample is predictable with supportive data in the training set.
In experiments, we curate two settings of InferWiki varying in sizes and structures, and apply the construction process on CoDEx as comparative datasets.
arXiv Detail & Related papers (2021-08-03T09:51:15Z) - A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences.