MUSS: Multilevel Subset Selection for Relevance and Diversity
- URL: http://arxiv.org/abs/2503.11126v1
- Date: Fri, 14 Mar 2025 06:37:17 GMT
- Title: MUSS: Multilevel Subset Selection for Relevance and Diversity
- Authors: Vu Nguyen, Andrey Kan,
- Abstract summary: In recommender systems, one is interested in selecting relevant items, while providing a diversified recommendation.<n>We propose MUSS, a novel method that uses a multilevel approach to relevant and diverse selection.<n>Our method can achieve the same level of performance as baselines, but 4.5 to 20 times faster.
- Score: 4.8254343133177295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The problem of relevant and diverse subset selection has a wide range of applications, including recommender systems and retrieval-augmented generation (RAG). For example, in recommender systems, one is interested in selecting relevant items, while providing a diversified recommendation. Constrained subset selection problem is NP-hard, and popular approaches such as Maximum Marginal Relevance (MMR) are based on greedy selection. Many real-world applications involve large data, but the original MMR work did not consider distributed selection. This limitation was later addressed by a method called DGDS which allows for a distributed setting using random data partitioning. Here, we exploit structure in the data to further improve both scalability and performance on the target application. We propose MUSS, a novel method that uses a multilevel approach to relevant and diverse selection. We provide a rigorous theoretical analysis and show that our method achieves a constant factor approximation of the optimal objective. In a recommender system application, our method can achieve the same level of performance as baselines, but 4.5 to 20 times faster. Our method is also capable of outperforming baselines by up to 6 percent points of RAG-based question answering accuracy.
Related papers
- SMART-RAG: Selection using Determinantal Matrices for Augmented Retrieval [40.17823569905232]
Retrieval-Augmented Generation (RAG) has greatly improved large language models (LLMs) by enabling them to generate accurate, contextually grounded responses.
RAG approaches, which prioritize top-ranked documents based solely on query-context relevance, often introduce redundancy and conflicting information.
We propose Selection using Matrices for Augmented Retrieval (RAG) in question answering tasks, a fully unsupervised and training-free framework designed to optimize context selection in RAG.
arXiv Detail & Related papers (2024-09-21T03:03:09Z) - An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition.
Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance in 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z) - Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection [50.30291173608449]
We propose a novel multi-objective meta-learning network (M$3$BS) for zero-shot hyperspectral band selection.
In M$3$BS, a generalizable graph convolution network (GCN) is constructed to generate dataset-agnostic base.
The acquired meta-knowledge can be directly transferred to unseen datasets without any retraining or fine-tuning.
arXiv Detail & Related papers (2024-06-12T07:13:31Z) - Embedded Hyperspectral Band Selection with Adaptive Optimization for Image Semantic Segmentation [0.0]
This paper introduces a pioneering approach for hyperspectral band selection that offers an embedded solution.<n>Our proposed method, embedded hyperspectral band selection (EHBS), excels in selecting the best bands without needing prior processing.<n>We conduct experiments on two distinct semantic-segmentation hyperspectral benchmark datasets, demonstrating their superiority in terms of accuracy and ease of use.
arXiv Detail & Related papers (2024-01-21T07:48:39Z) - BOtied: Multi-objective Bayesian optimization with tied multivariate ranks [33.414682601242006]
In this paper, we show a natural connection between non-dominated solutions and the extreme quantile of the joint cumulative distribution function.
Motivated by this link, we propose the Pareto-compliant CDF indicator and the associated acquisition function, BOtied.
Our experiments on a variety of synthetic and real-world problems demonstrate that BOtied outperforms state-of-the-art MOBO acquisition functions.
arXiv Detail & Related papers (2023-06-01T04:50:06Z) - Multi-Task Learning for Sparsity Pattern Heterogeneity: Statistical and Computational Perspectives [10.514866749547558]
We consider a problem in Multi-Task Learning (MTL) where multiple linear models are jointly trained on a collection of datasets.
A key novelty of our framework is that it allows the sparsity pattern of regression coefficients and the values of non-zero coefficients to differ across tasks.
Our methods encourage models to share information across tasks through separately encouraging 1) coefficient supports, and/or 2) nonzero coefficient values to be similar.
This allows models to borrow strength during variable selection even when non-zero coefficient values differ across tasks.
arXiv Detail & Related papers (2022-12-16T19:52:25Z) - Optimal Data Selection: An Online Distributed View [61.31708750038692]
We develop algorithms for the online and distributed version of the problem.
We show that our selection methods outperform random selection by $5-20%$.
In learning tasks on ImageNet and MNIST, we show that our selection methods outperform random selection by $5-20%$.
arXiv Detail & Related papers (2022-01-25T18:56:16Z) - Filter Methods for Feature Selection in Supervised Machine Learning
Applications -- Review and Benchmark [0.0]
This review synthesizes the literature on feature selection benchmarking and evaluates the performance of 58 methods in the widely used R environment.
We consider four typical dataset scenarios that are challenging for ML models.
arXiv Detail & Related papers (2021-11-23T20:20:24Z) - Max-Utility Based Arm Selection Strategy For Sequential Query
Recommendations [16.986870945319293]
We consider the query recommendation problem in closed loop interactive learning settings like online information gathering and exploratory analytics.
The problem can be naturally modelled using the Multi-Armed Bandits (MAB) framework with countably many arms.
We show that such a selection strategy often results in higher cumulative regret and to this end, we propose a selection strategy based on the maximum utility of the arms.
arXiv Detail & Related papers (2021-08-31T13:03:30Z) - Exploration in two-stage recommender systems [79.50534282841618]
Two-stage recommender systems are widely adopted in industry due to their scalability and maintainability.
A key challenge of this setup is that optimal performance of each stage in isolation does not imply optimal global performance.
We propose a method of synchronising the exploration strategies between the ranker and the nominators.
arXiv Detail & Related papers (2020-09-01T16:52:51Z) - Sample-Rank: Weak Multi-Objective Recommendations Using Rejection
Sampling [0.5156484100374059]
We introduce a method involving multi-goal sampling followed by ranking for user-relevance (Sample-Rank) to nudge recommendations towards multi-objective goals of the marketplace.
The proposed method's novelty is that it reduces the MO recommendation problem to sampling from a desired multi-goal distribution then using it to build a production-friendly learning-to-rank model.
arXiv Detail & Related papers (2020-08-24T09:17:18Z) - Supervised Hyperalignment for multi-subject fMRI data alignment [81.8694682249097]
This paper proposes a Supervised Hyperalignment (SHA) method to ensure better functional alignment for MVP analysis.
Experiments on multi-subject datasets demonstrate that SHA method achieves up to 19% better performance for multi-class problems.
arXiv Detail & Related papers (2020-01-09T09:17:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.