Enhancing Neural Subset Selection: Integrating Background Information into Set Representations
- URL: http://arxiv.org/abs/2402.03139v2
- Date: Sun, 9 Jun 2024 07:34:45 GMT
- Authors: Binghui Xie, Yatao Bian, Kaiwen Zhou, Yongqiang Chen, Peilin Zhao, Bo Han, Wei Meng, James Cheng
- Abstract summary: We show that when the target value is conditioned on both the input set and subset, it is essential to incorporate an invariant sufficient statistic of the superset into the subset of interest.
This ensures that the output value remains invariant to permutations of the subset and its corresponding superset, enabling identification of the specific superset from which the subset originated.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning neural subset selection tasks, such as compound selection in AI-aided drug discovery, have become increasingly pivotal across diverse applications. The existing methodologies in the field primarily concentrate on constructing models that capture the relationship between utility function values and subsets within their respective supersets. However, these approaches tend to overlook the valuable information contained within the superset when utilizing neural networks to model set functions. In this work, we address this oversight by adopting a probabilistic perspective. Our theoretical findings demonstrate that when the target value is conditioned on both the input set and subset, it is essential to incorporate an invariant sufficient statistic of the superset into the subset of interest for effective learning. This ensures that the output value remains invariant to permutations of the subset and its corresponding superset, enabling identification of the specific superset from which the subset originated. Motivated by these insights, we propose a simple yet effective information aggregation module designed to merge the representations of subsets and supersets from a permutation invariance perspective. Comprehensive empirical evaluations across diverse tasks and datasets validate the enhanced efficacy of our approach over conventional methods, underscoring the practicality and potency of our proposed strategies in real-world contexts.
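The abstract's core idea, conditioning a subset's representation on a permutation-invariant summary of the superset it came from, can be illustrated with a small DeepSets-style sketch. This is not the authors' information aggregation module. The per-element encoder `encode`, mean pooling as a stand-in for the invariant statistic, concatenation as the merge step, and the linear `readout` are all hypothetical choices made for illustration.

```python
import math
import random
from typing import List

Vector = List[float]


def encode(x: Vector, weights: List[Vector]) -> Vector:
    """Hypothetical element encoder: one linear layer followed by tanh."""
    return [math.tanh(sum(w_k * x_k for w_k, x_k in zip(row, x))) for row in weights]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Permutation-invariant pooling: the result does not depend on input order."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def subset_with_background(subset: List[Vector], superset: List[Vector],
                           weights: List[Vector]) -> Vector:
    """Concatenate a pooled subset representation with a pooled superset statistic."""
    subset_repr = mean_pool([encode(x, weights) for x in subset])
    # Assumption: mean pooling stands in for the invariant statistic of the superset.
    superset_stat = mean_pool([encode(x, weights) for x in superset])
    return subset_repr + superset_stat  # list concatenation, length 2 * hidden size


def readout(z: Vector, v: Vector) -> float:
    """Hypothetical linear readout mapping the merged representation to a utility score."""
    return sum(a * b for a, b in zip(z, v))


if __name__ == "__main__":
    rng = random.Random(0)
    dim_in, dim_hidden = 3, 4
    weights = [[rng.uniform(-1, 1) for _ in range(dim_in)] for _ in range(dim_hidden)]
    v = [rng.uniform(-1, 1) for _ in range(2 * dim_hidden)]
    superset = [[rng.uniform(-1, 1) for _ in range(dim_in)] for _ in range(6)]
    subset = superset[:2]

    score = readout(subset_with_background(subset, superset, weights), v)
    # Reverse both sets: the score should be unchanged up to floating-point rounding.
    score_reordered = readout(
        subset_with_background(subset[::-1], superset[::-1], weights), v)
    print(math.isclose(score, score_reordered, abs_tol=1e-12))  # True
```

Reordering the subset and superset leaves the output unchanged, while drawing the same subset from a different superset changes the second half of the representation. That second property is what lets a downstream model tell which superset a subset originated from, as the abstract describes.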
Related papers
- Neuro-Symbolic Embedding for Short and Effective Feature Selection via Autoregressive Generation
We reformulate feature selection through a neuro-symbolic lens and introduce a novel generative framework aimed at identifying short and effective feature subsets.
In this framework, we first create a data collector to automatically collect numerous feature selection samples consisting of feature ID tokens, model performance, and the measurement of feature subset redundancy.
Building on the collected data, an encoder-decoder-evaluator learning paradigm is developed to encode the knowledge gained from feature selection into a continuous embedding space for efficient search.
(arXiv, 2024-04-26)
- Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble
Multi-Comprehension (MC) Ensemble is proposed as a strategy to augment the Out-of-Distribution (OOD) feature representation field.
Our experimental results demonstrate the superior performance of the MC Ensemble strategy in OOD detection.
This underscores the effectiveness of our proposed approach in enhancing the model's capability to detect instances outside its training distribution.
(arXiv, 2024-03-24)
- Leveraging sparse and shared feature activations for disentangled representation learning
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation.
We validate our approach on six real-world distribution shift benchmarks and across different data modalities.
(arXiv, 2023-04-17)
- Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
(arXiv, 2022-11-03)
- Feature Selection via the Intervened Interpolative Decomposition and its Application in Diversifying Quantitative Strategies
We propose a probabilistic model for computing an interpolative decomposition (ID) in which each column of the observed matrix has its own priority or importance.
We evaluate the proposed models on real-world datasets, including ten Chinese A-share stocks.
(arXiv, 2022-09-29)
- Low-rank Dictionary Learning for Unsupervised Feature Selection
We introduce a novel unsupervised feature selection approach by applying dictionary learning ideas in a low-rank representation.
A unified objective function for unsupervised feature selection is proposed, with sparsity enforced through $\ell_{2,1}$-norm regularization.
Our experimental findings reveal that the proposed method outperforms the state-of-the-art algorithm.
(arXiv, 2021-06-21)
- Exploring Multi-dimensional Data via Subset Embedding
We propose a visual analytics approach to exploring subset patterns.
The core of the approach is a subset embedding network (SEN) that represents a group of subsets as uniformly-formatted embeddings.
The design makes it possible to handle arbitrary subsets and to capture the similarity of subsets on single features.
(arXiv, 2021-04-24)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
(arXiv, 2020-06-22)
- Causal Feature Selection for Algorithmic Fairness
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
(arXiv, 2020-06-10)
- Learn to Predict Sets Using Feed-Forward Neural Networks
This paper addresses the task of set prediction using deep feed-forward neural networks.
We present a novel approach for learning to predict sets with unknown permutation and cardinality.
We demonstrate the validity of our set formulations on relevant vision problems.
(arXiv, 2020-01-30)
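The related entry on a trainable optimal transport embedding describes pooling a variable-size input set into a fixed number of vectors, weighted by the transport plan between the set and a reference. The sketch below illustrates only that aggregation step and is not the cited method. It assumes a squared Euclidean cost, uniform marginals, a fixed number of entropic Sinkhorn iterations, and a fixed (untrained) reference. `sinkhorn_plan` and `ot_aggregate` are hypothetical names. Very large costs relative to `eps` can underflow the kernel to zero, so inputs are assumed to be small in scale.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn_plan(cost: Matrix, eps: float = 0.5, n_iters: int = 200) -> Matrix:
    """Entropic OT plan between uniform marginals over rows (n) and columns (p)."""
    n, p = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * p
    for _ in range(n_iters):
        # Alternate scalings so row sums approach 1/n and column sums equal 1/p.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(p)) for i in range(n)]
        v = [(1.0 / p) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(p)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(p)] for i in range(n)]


def ot_aggregate(x: Matrix, reference: Matrix, eps: float = 0.5) -> Matrix:
    """Pool a set x (n x d) into p vectors, one per reference element."""
    cost = [[sum((a - b) ** 2 for a, b in zip(xi, zj)) for zj in reference] for xi in x]
    plan = sinkhorn_plan(cost, eps)
    p, d = len(reference), len(x[0])
    # After the final v-update each column of the plan sums to 1/p, so scaling by p
    # turns every output row into a convex combination of the input elements.
    return [[p * sum(plan[i][j] * x[i][k] for i in range(len(x))) for k in range(d)]
            for j in range(p)]


if __name__ == "__main__":
    x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    reference = [[0.0, 0.0], [1.0, 1.0]]
    pooled = ot_aggregate(x, reference)
    print(len(pooled), len(pooled[0]))  # 2 2
```

The output always has one row per reference element regardless of how many elements the input set has. Because every input element is treated identically apart from its position, reordering `x` leaves the pooled result unchanged up to rounding.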