Performance Evaluation in Multimedia Retrieval
- URL: http://arxiv.org/abs/2410.06654v1
- Date: Wed, 09 Oct 2024 08:06:15 GMT
- Title: Performance Evaluation in Multimedia Retrieval
- Authors: Loris Sauter, Ralph Gasser, Heiko Schuldt, Abraham Bernstein, Luca Rossetto,
- Abstract summary: Performance evaluation in multimedia retrieval relies heavily on retrieval experiments.
These can involve human-in-the-loop and machine-only settings for the retrieval process itself and the subsequent verification of results.
We present a formal model to express all relevant aspects of such retrieval experiments, as well as a flexible open-source evaluation infrastructure.
- Score: 7.801919915773585
- License:
- Abstract: Performance evaluation in multimedia retrieval, as in the information retrieval domain at large, relies heavily on retrieval experiments, employing a broad range of techniques and metrics. These can involve human-in-the-loop and machine-only settings for the retrieval process itself and the subsequent verification of results. Such experiments can be elaborate and use-case-specific, which can make them difficult to compare or replicate. In this paper, we present a formal model to express all relevant aspects of such retrieval experiments, as well as a flexible open-source evaluation infrastructure that implements the model. These contributions intend to make a step towards lowering the hurdles for conducting retrieval experiments and improving their reproducibility.
Related papers
- Likelihood as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that likelihoods serve as an effective gauge for language model performance.
We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance.
arXiv Detail & Related papers (2024-11-12T13:14:09Z) - Exploring Information Retrieval Landscapes: An Investigation of a Novel Evaluation Techniques and Comparative Document Splitting Methods [0.0]
In this study, the structured nature of textbooks, the conciseness of articles, and the narrative complexity of novels are shown to require distinct retrieval strategies.
A novel evaluation technique is introduced, utilizing an open-source model to generate a comprehensive dataset of question-and-answer pairs.
The evaluation employs weighted scoring metrics, including SequenceMatcher, BLEU, METEOR, and BERT Score, to assess the system's accuracy and relevance.
arXiv Detail & Related papers (2024-09-13T02:08:47Z) - Experimental Analysis of Large-scale Learnable Vector Storage
Compression [42.52474894105165]
Learnable embedding vector is one of the most important applications in machine learning.
The high dimensionality of sparse data in recommendation tasks and the huge volume of corpus in retrieval-related tasks lead to a large memory consumption of the embedding table.
Recent research has proposed various methods to compress the embeddings at the cost of a slight decrease in model quality or the introduction of other overheads.
arXiv Detail & Related papers (2023-11-27T07:11:47Z) - Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in
Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives.
arXiv Detail & Related papers (2023-11-16T10:42:58Z) - Incorporating Relevance Feedback for Information-Seeking Retrieval using
Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant.
To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z) - Multi-Domain Joint Training for Person Re-Identification [51.73921349603597]
Deep learning-based person Re-IDentification (ReID) often requires a large amount of training data to achieve good performance.
It appears that collecting more training data from diverse environments tends to improve the ReID performance.
We propose an approach called Domain-Camera-Sample Dynamic network (DCSD) whose parameters can be adaptive to various factors.
arXiv Detail & Related papers (2022-01-06T09:20:59Z) - Scaling up Search Engine Audits: Practical Insights for Algorithm
Auditing [68.8204255655161]
We set up experiments for eight search engines with hundreds of virtual agents placed in different regions.
We demonstrate the successful performance of our research infrastructure across multiple data collections.
We conclude that virtual agents are a promising venue for monitoring the performance of algorithms across long periods of time.
arXiv Detail & Related papers (2021-06-10T15:49:58Z) - Reproducibility Companion Paper: Knowledge Enhanced Neural Fashion Trend
Forecasting [78.046352507802]
We provide an artifact that allows the replication of the experiments using a Python implementation.
We reproduce the experiments conducted in the original paper and obtain similar performance as previously reported.
arXiv Detail & Related papers (2021-05-25T10:53:11Z) - Effective Distributed Representations for Academic Expert Search [1.9815631757151737]
We study how different distributed representations of academic papers (i.e. embeddings) impact academic expert retrieval.
In particular, we explore the impact of the use of contextualized embeddings on search performance.
We observe that using contextual embeddings produced by a transformer model trained for sentence similarity tasks produces the most effective paper representations.
arXiv Detail & Related papers (2020-10-16T09:43:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.