Related papers: MUSS: Multilevel Subset Selection for Relevance and Diversity

MUSS: Multilevel Subset Selection for Relevance and Diversity

URL: http://arxiv.org/abs/2503.11126v2
Date: Tue, 20 May 2025 07:43:34 GMT
Title: MUSS: Multilevel Subset Selection for Relevance and Diversity
Authors: Vu Nguyen, Andrey Kan,
Abstract summary: In recommender systems, one is interested in selecting relevant items, while providing a diversified recommendation.<n>We present a novel theoretical approach for analyzing this type of problems, and show that our method achieves a constant factor approximation of the optimal objective.
Score: 4.8254343133177295
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The problem of relevant and diverse subset selection has a wide range of applications, including recommender systems and retrieval-augmented generation (RAG). For example, in recommender systems, one is interested in selecting relevant items, while providing a diversified recommendation. Constrained subset selection problem is NP-hard, and popular approaches such as Maximum Marginal Relevance (MMR) are based on greedy selection. Many real-world applications involve large data, but the original MMR work did not consider distributed selection. This limitation was later addressed by a method called DGDS which allows for a distributed setting using random data partitioning. Here, we exploit structure in the data to further improve both scalability and performance on the target application. We propose MUSS, a novel method that uses a multilevel approach to relevant and diverse selection. In a recommender system application, our method can not only improve the performance up to $4$ percent points in precision, but is also $20$ to $80$ times faster. Our method is also capable of outperforming baselines on RAG-based question answering accuracy. We present a novel theoretical approach for analyzing this type of problems, and show that our method achieves a constant factor approximation of the optimal objective. Moreover, our analysis also resulted in a $\times 2$ tighter bound for DGDS compared to previously known bound.

Related papers

The Multi-Query Paradox in Zeroth-Order Optimization [6.777975824808536]
Zeroth-order Alignment (ZO) provides a powerful framework for problems where explicit gradients are unavailable and have to be approximated using only queries to function.
arXiv Detail & Related papers (2025-09-19T03:10:45Z)
Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation [77.07879255360342]
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating retrieved information.<n>In RAG, the emphasis has shifted to utility, which considers the usefulness of passages for generating accurate answers.<n>Our approach focuses on utility-based selection rather than ranking, enabling dynamic passage selection tailored to specific queries without the need for fixed thresholds.<n>Our experiments demonstrate that utility-based selection provides a flexible and cost-effective solution for RAG, significantly reducing computational costs while improving answer quality.
arXiv Detail & Related papers (2025-07-25T09:32:29Z)
SMART-RAG: Selection using Determinantal Matrices for Augmented Retrieval [40.17823569905232]
Retrieval-Augmented Generation (RAG) has greatly improved large language models (LLMs) by enabling them to generate accurate, contextually grounded responses. RAG approaches, which prioritize top-ranked documents based solely on query-context relevance, often introduce redundancy and conflicting information. We propose Selection using Matrices for Augmented Retrieval (RAG) in question answering tasks, a fully unsupervised and training-free framework designed to optimize context selection in RAG.
arXiv Detail & Related papers (2024-09-21T03:03:09Z)
An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences. We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration. Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition. Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance in 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z)
Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection [50.30291173608449]
We propose a novel multi-objective meta-learning network (M$3$BS) for zero-shot hyperspectral band selection. In M$3$BS, a generalizable graph convolution network (GCN) is constructed to generate dataset-agnostic base. The acquired meta-knowledge can be directly transferred to unseen datasets without any retraining or fine-tuning.
arXiv Detail & Related papers (2024-06-12T07:13:31Z)
Embedded Hyperspectral Band Selection with Adaptive Optimization for Image Semantic Segmentation [0.0]
This paper introduces a pioneering approach for hyperspectral band selection that offers an embedded solution.<n>Our proposed method, embedded hyperspectral band selection (EHBS), excels in selecting the best bands without needing prior processing.<n>We conduct experiments on two distinct semantic-segmentation hyperspectral benchmark datasets, demonstrating their superiority in terms of accuracy and ease of use.
arXiv Detail & Related papers (2024-01-21T07:48:39Z)
RIGA: A Regret-Based Interactive Genetic Algorithm [14.388696798649658]
We propose an interactive genetic algorithm for solving multi-objective optimization problems under preference imprecision. Our algorithm, called RIGA, can be applied to any multi-objective optimization problem provided that the aggregation function is linear in its parameters. For several performance indicators (computation times, gap to optimality and number of queries), RIGA obtains better results than state-of-the-art algorithms.
arXiv Detail & Related papers (2023-11-10T13:56:15Z)
Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods. Our main contribution holds for a broad class of problems including Max-and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and Traveling Salesman Problem. As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
BOtied: Multi-objective Bayesian optimization with tied multivariate ranks [33.414682601242006]
In this paper, we show a natural connection between non-dominated solutions and the extreme quantile of the joint cumulative distribution function. Motivated by this link, we propose the Pareto-compliant CDF indicator and the associated acquisition function, BOtied. Our experiments on a variety of synthetic and real-world problems demonstrate that BOtied outperforms state-of-the-art MOBO acquisition functions.
arXiv Detail & Related papers (2023-06-01T04:50:06Z)
Multi-Task Learning for Sparsity Pattern Heterogeneity: Statistical and Computational Perspectives [10.514866749547558]
We consider a problem in Multi-Task Learning (MTL) where multiple linear models are jointly trained on a collection of datasets. A key novelty of our framework is that it allows the sparsity pattern of regression coefficients and the values of non-zero coefficients to differ across tasks. Our methods encourage models to share information across tasks through separately encouraging 1) coefficient supports, and/or 2) nonzero coefficient values to be similar. This allows models to borrow strength during variable selection even when non-zero coefficient values differ across tasks.
arXiv Detail & Related papers (2022-12-16T19:52:25Z)
Optimal Data Selection: An Online Distributed View [61.31708750038692]
We develop algorithms for the online and distributed version of the problem. We show that our selection methods outperform random selection by $5-20%$. In learning tasks on ImageNet and MNIST, we show that our selection methods outperform random selection by $5-20%$.
arXiv Detail & Related papers (2022-01-25T18:56:16Z)
Filter Methods for Feature Selection in Supervised Machine Learning Applications -- Review and Benchmark [0.0]
This review synthesizes the literature on feature selection benchmarking and evaluates the performance of 58 methods in the widely used R environment. We consider four typical dataset scenarios that are challenging for ML models.
arXiv Detail & Related papers (2021-11-23T20:20:24Z)
Max-Utility Based Arm Selection Strategy For Sequential Query Recommendations [16.986870945319293]
We consider the query recommendation problem in closed loop interactive learning settings like online information gathering and exploratory analytics. The problem can be naturally modelled using the Multi-Armed Bandits (MAB) framework with countably many arms. We show that such a selection strategy often results in higher cumulative regret and to this end, we propose a selection strategy based on the maximum utility of the arms.
arXiv Detail & Related papers (2021-08-31T13:03:30Z)
Automatic selection of basis-adaptive sparse polynomial chaos expansions for engineering applications [0.0]
We describe three state-of-the-art basis-adaptive approaches for sparse chaos expansions. We conduct an extensive benchmark in terms of global approximation accuracy on a large set of computational models. We introduce a novel solver and basis adaptivity selection scheme guided by cross-validation error.
arXiv Detail & Related papers (2020-09-10T12:13:57Z)
Exploration in two-stage recommender systems [79.50534282841618]
Two-stage recommender systems are widely adopted in industry due to their scalability and maintainability. A key challenge of this setup is that optimal performance of each stage in isolation does not imply optimal global performance. We propose a method of synchronising the exploration strategies between the ranker and the nominators.
arXiv Detail & Related papers (2020-09-01T16:52:51Z)
Sample-Rank: Weak Multi-Objective Recommendations Using Rejection Sampling [0.5156484100374059]
We introduce a method involving multi-goal sampling followed by ranking for user-relevance (Sample-Rank) to nudge recommendations towards multi-objective goals of the marketplace. The proposed method's novelty is that it reduces the MO recommendation problem to sampling from a desired multi-goal distribution then using it to build a production-friendly learning-to-rank model.
arXiv Detail & Related papers (2020-08-24T09:17:18Z)
Feature Selection Methods for Cost-Constrained Classification in Random Forests [3.4806267677524896]
Cost-sensitive feature selection describes a feature selection problem, where features raise individual costs for inclusion in a model. Random Forests define a particularly challenging problem for feature selection, as features are generally entangled in an ensemble of multiple trees. We propose Shallow Tree Selection, a novel fast and multivariate feature selection method that selects features from small tree structures.
arXiv Detail & Related papers (2020-08-14T11:39:52Z)
Supervised Hyperalignment for multi-subject fMRI data alignment [81.8694682249097]
This paper proposes a Supervised Hyperalignment (SHA) method to ensure better functional alignment for MVP analysis. Experiments on multi-subject datasets demonstrate that SHA method achieves up to 19% better performance for multi-class problems.
arXiv Detail & Related papers (2020-01-09T09:17:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.